CN112418303A - Training method and device for recognizing state model and computer equipment

Training method and device for recognizing state model and computer equipment

Info

Publication number
CN112418303A
Authority
CN
China
Prior art keywords
data set
sample data
image
key points
processed
Prior art date
Legal status
Pending
Application number
CN202011307328.XA
Other languages
Chinese (zh)
Inventor
杜治江
王耀农
余言勋
张震
刘智辉
肖钟雯
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202011307328.XA
Publication of CN112418303A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a training method and a training device for identifying a state model and computer equipment, which are used for solving the technical problem that the existing model has low accuracy in inter-class data detection. The method comprises the following steps: determining a first sample data set and a second sample data set to be trained; performing key point positioning processing on each image in the first sample data set and the second sample data set to obtain a processed first sample data set and a processed second sample data set; performing one-to-one corresponding fusion processing on the key points in each image in the processed first sample data set and the key points in each image in the processed second sample data set to obtain a fusion sample data set; training a preset model based on the fusion sample set, the first sample data set and the second sample data set to obtain a trained recognition state model, and detecting the state of the trunk or the car door in the frame image to be detected through the trained recognition state model.

Description

Training method and device for recognizing state model and computer equipment
Technical Field
The invention relates to the technical field of computers, in particular to a training method and a training device for identifying a state model and computer equipment.
Background
At present, data augmentation is one of the techniques commonly used in deep learning. It is mainly used to enlarge a training data set and make the data as diverse as possible, so that the trained model has stronger generalization capability: by augmenting the data, the samples become relatively balanced and the accuracy of the trained model is improved.
However, material such as images of a vehicle with the trunk or doors in an open position is difficult to collect, so the small number of such samples can only be expanded by traditional forms of data augmentation. This kind of augmentation is intra-class augmentation, that is, if the material image is an image with a trunk or a door open, all the augmented images are also images with the trunk or the door open.
In this way, the inter-class data is not processed in any form, so that a model trained only with intra-class augmentation detects inter-class data with low accuracy.
Disclosure of Invention
The invention discloses a training method and a training device for identifying a state model and computer equipment, which are used for solving the technical problem that the existing model has low accuracy in inter-class data detection.
According to a first aspect of the present invention, there is provided a training method of recognizing a state model, the method comprising:
determining a first sample data set and a second sample data set to be trained, wherein the first sample data set comprises a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully opened state, and the second sample data set comprises a plurality of images for representing that the vehicle trunk or the vehicle door is in a fully closed state;
performing key point positioning processing on each image in the first sample data set and the second sample data set to obtain a processed first sample data set and a processed second sample data set; and
performing one-to-one corresponding fusion processing on the key points in each image in the processed first sample data set and the key points in each image in the processed second sample data set to obtain a fusion sample data set, wherein the fusion sample data set comprises a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully opened state or a fully closed state or a state between the fully closed state and the fully opened state;
training a preset model based on the fusion sample set, the first sample data set and the second sample data set to obtain a trained recognition state model, and detecting the state of a trunk or a vehicle door in a frame image to be detected through the trained recognition state model.
In a possible implementation, performing a keypoint localization process on each image in the first sample data set and the second sample data set to obtain a processed first sample data set and a processed second sample data set includes:
determining the positions of key feature points included in each image in the first sample data set and the second sample data set, and screening key points based on a preset rule, wherein the preset rule is that the key points are included in images acquired when the vehicle is shot from a first preset angle or direction and/or from directly in front of the vehicle at the first preset angle or direction;
and labeling the key points in each image in the first sample data set and the second sample data set to obtain the processed first sample data set and the processed second sample data set.
In a possible implementation manner, performing one-to-one fusion processing on the keypoints in each image in the processed first sample data set and the keypoints in each image in the processed second sample data set to obtain a fused sample data set, includes:
determining key points in a first image in the processed first sample data set, and determining key points in a second image in the processed second sample data set;
adding a first processing value corresponding to the abscissa of the key point in the first image and a second processing value corresponding to the abscissa of the key point in the second image, which is the same as the abscissa, to obtain the abscissa of the key point in the fused image; and
adding a third processing value corresponding to the ordinate of the key point in the first image and a fourth processing value corresponding to the ordinate of the key point in the second image, which is the same as the ordinate, to obtain the ordinate of the key point in the fused image;
fusing color numerical value information corresponding to the key points in the first image with color numerical value information corresponding to the key points in the second image to determine the color numerical value information of the key points in the fused image;
and determining the positions of the key points in the fused image based on the abscissa and the ordinate of the key points in the fused image, and obtaining a corresponding fused image based on the color numerical value information of the key points in the fused image and the positions of the key points to obtain a fused sample data set.
In a possible implementation manner, training a preset model based on the fused sample set, the first sample data set, and the second sample data set to obtain a trained recognition state model includes:
inputting the fusion sample set, the first sample data set and the second sample data set into a preset model for training to obtain a plurality of output results; wherein the output results are obtained by training for a plurality of times;
determining an overall loss function, wherein the overall loss function is obtained by performing weighted calculation on a first loss function determined by performing state identification processing on the first sample data set and a second loss function determined by performing the state identification processing on the second sample data set;
and training the preset model based on the output results and the overall loss function, and if the value corresponding to the overall loss function is smaller than a preset threshold value, determining that the trained recognition state model is converged to obtain the trained recognition state model.
In one possible embodiment, determining an overall loss function includes:
the global loss function is calculated by:
loss=λ*loss1+(1-λ)loss2;
wherein, loss is used for representing the whole loss function, loss1 is used for representing the first loss function, loss2 is used for representing the second loss function, and lambda is used for representing the weight value.
According to a second aspect of the present invention, there is provided a training apparatus for recognizing a state model, the apparatus comprising:
a determining unit, wherein the determining unit is used for determining a first sample data set and a second sample data set to be trained, the first sample data set comprises a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully-opened state, and the second sample data set comprises a plurality of images for representing that the vehicle trunk or the vehicle door is in a fully-closed state;
the processing unit is used for performing key point positioning processing on each image in the first sample data set and the second sample data set to obtain a processed first sample data set and a processed second sample data set; and
the processing unit is further configured to perform one-to-one corresponding fusion processing on the key points in each image in the processed first sample data set and the key points in each image in the processed second sample data set to obtain a fusion sample data set, where the fusion sample data set includes a plurality of images used for representing that a vehicle trunk or a vehicle door is in a fully opened state or a fully closed state or a state between the fully closed state and the fully opened state;
and the obtaining unit is used for training a preset model based on the fusion sample set, the first sample data set and the second sample data set to obtain a trained recognition state model so as to detect the state of the trunk or the door in the frame image to be detected through the trained recognition state model.
In a possible implementation, the processing unit is further configured to:
determining the positions of key feature points included in each image in the first sample data set and the second sample data set, and screening key points based on a preset rule, wherein the preset rule is that the key points are included in images acquired when a vehicle is shot from a first preset angle or direction;
and labeling the key points in each image in the first sample data set and the second sample data set to obtain the processed first sample data set and the processed second sample data set.
In a possible implementation, the processing unit is further configured to:
determining key points in a first image in the processed first sample data set, and determining key points in a second image in the processed second sample data set;
adding a first processing value corresponding to the abscissa of the key point in the first image and a second processing value corresponding to the abscissa of the key point in the second image, which is the same as the abscissa, to obtain the abscissa of the key point in the fused image; and
adding a third processing value corresponding to the ordinate of the key point in the first image and a fourth processing value corresponding to the ordinate of the key point in the second image, which is the same as the ordinate, to obtain the ordinate of the key point in the fused image;
fusing color numerical value information corresponding to the key points in the first image with color numerical value information corresponding to the key points in the second image to determine the color numerical value information of the key points in the fused image;
and determining the positions of the key points in the fused image based on the abscissa and the ordinate of the key points in the fused image, and obtaining a corresponding fused image based on the color numerical value information of the key points in the fused image and the positions of the key points to obtain a fused sample data set.
In a possible implementation, the obtaining unit is further configured to:
inputting the fusion sample set, the first sample data set and the second sample data set into a preset model for training to obtain a plurality of output results; wherein the output results are obtained by training for a plurality of times;
determining an overall loss function, wherein the overall loss function is obtained by performing weighted calculation on a first loss function determined by performing state identification processing on the first sample data set and a second loss function determined by performing the state identification processing on the second sample data set;
and training the preset model based on the output results and the overall loss function, and if the value corresponding to the overall loss function is smaller than a preset threshold value, determining that the trained recognition state model is converged to obtain the trained recognition state model.
In a possible implementation, the obtaining unit is further configured to:
the global loss function is calculated by:
loss=λ*loss1+(1-λ)loss2;
wherein, loss is used for representing the whole loss function, loss1 is used for representing the first loss function, loss2 is used for representing the second loss function, and lambda is used for representing the weight value.
According to a third aspect of embodiments of the present invention, there is provided a computer apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the first aspect of the embodiments of the present invention described above and any of the methods referred to in the first aspect.
According to a fourth aspect of embodiments of the present invention, there is provided a storage medium, wherein instructions of the storage medium, when executed by a processor of a computer device, enable the computer device to perform the first aspect of embodiments of the present invention described above and any of the methods that the first aspect may relate to.
According to a fifth aspect of embodiments of the present invention, there is provided a computer program product, which, when run on a computer device, causes the computer device to perform a method of implementing any one of the above-mentioned first aspect and first aspect of embodiments of the present invention.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:
in an embodiment of the present invention, a first sample data set and a second sample data set to be trained may be determined, where the first sample data set includes a plurality of images for characterizing that a trunk or a door of a vehicle is in a fully open state, and the second sample data set includes a plurality of images for characterizing that the trunk or the door of the vehicle is in a fully closed state. Then, key point positioning processing can be performed on each image in the first sample data set and the second sample data set, so that the processed first sample data set and the processed second sample data set are obtained. Further, performing one-to-one corresponding fusion processing on the key points in each image in the processed first sample data set and the key points in each image in the processed second sample data set to obtain a fusion sample data set, wherein the fusion sample data set comprises a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully opened state or a fully closed state or a state between the fully closed state and the fully opened state; and then, training a preset model based on the fusion sample set, the first sample data set and the second sample data set to obtain a trained recognition state model, so as to detect the state of the trunk or the door in the frame image to be detected through the trained recognition state model.
As can be seen, in the embodiment of the present invention, the key point positioning processing may be performed on the first sample data set and the second sample data set, and then the fusion processing is performed based on the processed first sample data set and the processed second sample data set to obtain a fusion sample data set, that is, the inter-class data set corresponding to the first sample data set and the second sample data set is obtained. Further, the preset model may be trained based on the inter-class data set and the first sample data set and the second sample data set, so as to obtain a trained recognition state model. In such a mode, inter-class data amplification can be performed based on the existing class database, and online amplification of sample data to be trained is realized, so that the accuracy of identification of the trained identification state model is improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention and are not to be construed as limiting the invention.
FIG. 1 is a schematic diagram of an existing intra-class augmentation of data;
FIG. 2 is a schematic diagram of an application scenario in an embodiment of the present invention;
FIG. 3 is a flowchart of a training method for recognizing a state model according to an embodiment of the present invention;
FIG. 4 is a block diagram of a training apparatus for recognizing a state model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. The embodiments and features of the embodiments of the present invention may be arbitrarily combined with each other without conflict. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
The terms "first" and "second" in the description and claims of the present invention and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof, which are intended to cover non-exclusive protection. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
As mentioned above, the data augmentation methods commonly used at present include horizontal/vertical flipping, rotation, zooming, cropping, translation, color dithering, noise addition and the like. It can be seen that the existing augmentation mode is generally intra-class augmentation; for example, if a picture shows a trunk in a fully open state, all the augmented pictures also show the fully open state.
Specifically, in the intra-class data expansion currently performed, as shown in fig. 1, different colors represent different types of distributions, each cluster of points represents one class of data, and data expansion means providing more points of a certain class, that is, more data within a class. However, as is apparent from fig. 1, the data classes are separated from each other; to distinguish different classes effectively, the inter-class distance should be expanded and the intra-class distance shortened. At present, however, there are no data samples between classes, so the accuracy of inter-class data detection by a model trained with intra-class expansion is low.
In view of this, the present invention provides a training method for recognizing a state model, by which samples between classes can be increased, so that the recognition accuracy of a model trained based on the increased samples and the original samples is higher.
After the design concept of the embodiment of the present invention is introduced, some simple descriptions are made below on application scenarios to which the technical solution in the embodiment of the present invention is applicable, and it should be noted that the application scenarios described in the embodiment of the present invention are for more clearly describing the technical solution in the embodiment of the present invention, and do not form a limitation on the technical solution provided in the embodiment of the present invention.
In an embodiment of the present invention, please refer to the application scenario diagram shown in fig. 2, which includes two parts, namely a processing device and a computer device. It should be noted that fig. 2 only illustrates an example in which one processing device and one computer device interact with each other; in a specific implementation process, a plurality of processing devices may interact with one computer device, or a plurality of processing devices may interact with a plurality of computer devices. It should also be noted that the foregoing application scenario is merely illustrative for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any applicable scenario.
In particular implementations, the processing device and the computer device may be communicatively coupled via one or more networks. The network may be a wired network or a wireless network; for example, the wireless network may be a mobile cellular network, a Wireless-Fidelity (WiFi) network, or another possible network, which is not limited in the embodiments of the present invention.
In a specific implementation, the aforementioned processing device may be any device that can capture or receive an image containing a fully opened or closed state of a trunk or a door of a vehicle, such as a camera or a car recorder, where the camera may be located at a road side or a parking lot. Specifically, the processing device may obtain the images directly from each of the capturing devices, or may obtain the images correspondingly from another device or server communicatively connected to the processing device.
In the embodiment of the invention, the processing device may send the collected or received image containing the fully opened state or the closed state of the trunk or the door of the vehicle to the computer device, and then the computer device performs data amplification processing on the received image, and trains the preset model based on the amplified data to obtain the trained recognition state model, so as to detect the state of the trunk or the door in the frame image to be detected through the trained recognition state model. It should be noted that, in order to facilitate understanding of the technical solution provided by the present invention, the technical solution provided by the present invention is described hereinafter by taking an interaction between one processing device and one computer device as an example.
To further explain the scheme of the training method for recognizing the state model provided by the embodiment of the present invention, the following detailed description is made with reference to the accompanying drawings and the specific embodiments. Although embodiments of the present invention provide method steps as shown in the following embodiments or figures, more or fewer steps may be included in the method based on conventional or non-inventive effort. In steps where no necessary causal relationship exists logically, the execution order of the steps is not limited to that provided by embodiments of the present invention. When the method is executed in an actual processing procedure or by a device (for example, a parallel processor or an application environment of multi-thread processing), the steps may be executed in sequence or in parallel as shown in the embodiments or the figures.
The training method for recognizing the state model in the embodiment of the present invention is described below with reference to the flowchart of the method shown in fig. 3, and the steps shown in fig. 3 may be executed by the computer device shown in fig. 2. In an implementation, the computer device may be a server, such as a personal computer, a midrange computer, a cluster of computers, and so forth.
Step 301: determining a first sample data set and a second sample data set to be trained, wherein the first sample data set comprises a plurality of images for representing that a trunk or a door of the vehicle is in a fully opened state, and the second sample data set comprises a plurality of images for representing that the trunk or the door of the vehicle is in a fully closed state.
In an embodiment of the present invention, the processing device may send the sample data set to the computer device, and then the computer device may determine a first sample data set and a second sample data set to be trained based on the sent sample data set.
In a particular implementation, the computer device may determine a first sample data set and a second sample data set to be trained according to a first rule. Specifically, the first rule may be to screen a first preset number of images of the vehicle trunk or the vehicle doors in a fully open state as a first sample data set, and screen a second preset number of images of the vehicle trunk or the vehicle doors in a fully closed state as a second sample data set, where the first preset number and the second preset number may be the same or different, and are not limited in the embodiment of the present invention.
In a specific implementation process, the first rule may also be that an image of the vehicle trunk or the vehicle door in a fully opened state, which is shot at a first preset angle or direction, is screened as a first sample data set, and an image of the vehicle trunk or the vehicle door in a fully closed state, which is shot at a second preset angle or direction, is screened as a second sample data set, where the first preset angle or direction and the second preset angle or direction may be the same or different, and the embodiment of the present invention is not limited. For example, the first preset direction is a side direction in which the door is located, and the second preset direction is a rear direction in which the front of the trunk is completely photographed.
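As a non-limiting illustration, the following Python sketch shows one way the first rule described above might be applied to obtain the two sample data sets; the image record fields ("state", "view") and the function name are assumptions introduced for this example and are not part of the embodiment.

```python
# A minimal sketch of building the first and second sample data sets under the
# first rule described above. The dict fields and defaults are assumptions.

def build_sample_sets(images, first_count, second_count,
                      first_view=None, second_view=None):
    """Select fully open images as the first set and fully closed images as the second set.

    images:       iterable of dicts such as {"path": "...", "state": "open", "view": "side"}
    first_count:  first preset number of fully open images to keep
    second_count: second preset number of fully closed images to keep
    first_view:   optional first preset angle/direction filter for the open images
    second_view:  optional second preset angle/direction filter for the closed images
    """
    first_set = [img for img in images
                 if img["state"] == "open"
                 and (first_view is None or img["view"] == first_view)]
    second_set = [img for img in images
                  if img["state"] == "closed"
                  and (second_view is None or img["view"] == second_view)]
    return first_set[:first_count], second_set[:second_count]
```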
Step 302: and performing key point positioning processing on each image in the first sample data set and the second sample data set to obtain a processed first sample data set and a processed second sample data set.
In the embodiment of the invention, the positions of key feature points included in each image in the first sample data set and the second sample data set are determined, and the key points are screened based on a preset rule, wherein the preset rule is that the key points are included in images acquired when a vehicle is shot from a preset angle or direction, and then the key points in each image in the first sample data set and the second sample data set can be labeled to obtain the processed first sample data set and the processed second sample data set.
In a specific implementation process, the positions of key feature points included in each image in the first sample data set and the second sample data set may be determined. Specifically, the key feature points may be a front pillar, a middle pillar, a back-up light, a tail combination light, a high-mount stop light, an outside rearview mirror, an outside opening handle, a door frame, and the like.
In the embodiment of the present invention, after the positions of the key feature points are determined, the key points may be screened based on a preset rule, specifically, the preset rule is that the key points are included in an image acquired when the vehicle is shot from a preset angle or direction. In a specific implementation, after the position of the key feature point is determined, the key feature point included in the image acquired when the vehicle is photographed from a preset angle or direction may be used as the key point. For example, the preset direction is a vehicle rear direction, or the preset direction is a vehicle side direction, but of course, the preset direction may also be a vehicle rear direction and a vehicle side direction.
In the embodiment of the present invention, after the key points are determined, the key points may be labeled, specifically, the key points may be labeled in a circle, a square, a rectangle, or an irregular pattern, and the processed first sample data set and the processed second sample data set are obtained, so that accurate fusion may be performed based on the labeling information during subsequent image fusion.
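A minimal sketch of the key point screening and labeling step is given below, assuming a fixed table of which key feature points are visible from each preset shooting direction; the point names, the visibility table, and the circular marker are illustrative assumptions rather than details fixed by the embodiment.

```python
from dataclasses import dataclass

# Assumed mapping from a preset shooting direction to the key feature points
# (named in the description above) that are visible from that direction.
VISIBLE_FROM = {
    "rear": {"back_up_light", "tail_combination_light", "high_mount_stop_light"},
    "side": {"front_pillar", "middle_pillar", "outside_rearview_mirror",
             "outside_opening_handle", "door_frame"},
}

@dataclass
class KeyFeaturePoint:
    name: str
    x: float
    y: float

def screen_and_label(points, preset_direction):
    """Keep the key feature points visible from the preset direction and attach a marker label."""
    visible = VISIBLE_FROM[preset_direction]
    # A circle is used here as the label shape; it is one of the options mentioned above.
    return [{"name": p.name, "x": p.x, "y": p.y, "marker": "circle"}
            for p in points if p.name in visible]
```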
Step 303: and performing one-to-one corresponding fusion processing on the key points in each image in the processed first sample data set and the key points in each image in the processed second sample data set to obtain a fusion sample data set, wherein the fusion sample data set comprises a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully opened state or a fully closed state or a state between the fully closed state and the fully opened state.
In the embodiment of the present invention, a keypoint in a first image in a first sample data set after processing is determined, and a keypoint in a second image in a second sample data set after processing is determined, where the first image is any image in the first sample data set, and the second image is any image in the second sample data set.
In the embodiment of the present invention, a first processed value corresponding to the abscissa of the keypoint in the first image and a second processed value corresponding to the abscissa of the keypoint in the second image having the same abscissa may be added to obtain the abscissa of the keypoint in the fused image, where the first processed value is obtained by multiplying the abscissa of the keypoint in the first image by the first weight, and the second processed value is obtained by multiplying the abscissa of the keypoint in the second image by the second weight. Similarly, a third processing value corresponding to the ordinate of the key point in the first image and a fourth processing value corresponding to the ordinate of the key point in the second image, which is the same as the ordinate, may be added to obtain the ordinate of the key point in the fused image, wherein the third processing value is obtained by multiplying the ordinate of the key point in the first image by the first weight, and the fourth processing value is obtained by multiplying the ordinate of the key point in the second image by the second weight.
In the embodiment of the present invention, color numerical information corresponding to a key point in the first image and color numerical information corresponding to a key point in the second image may also be fused to determine the color numerical information of the key point in the fused image. Further, the positions of the key points in the fused image are determined based on the abscissa and the ordinate of the key points in the fused image, and a corresponding fused image is obtained based on the color numerical value information of the key points in the fused image and the positions of the key points, so as to obtain a fused sample data set. It should be noted that, in the embodiment of the present invention, there may be a plurality of key points in the first image and the second image, so that a plurality of key points in the fused image may be correspondingly determined.
In a specific implementation process, a one-Hot algorithm may be used to perform fusion processing on the keypoints in each image in the first sample data set and the keypoints in each image in the second sample data set. Specifically, the fusion process may be performed by using the following formula:
M = w*m_i + (1-w)*m_j
N = w*n_i + (1-w)*n_j
wherein m_i and n_i are used for characterizing the abscissa and the ordinate of a key point in any image selected from the first sample data set, m_j and n_j are used for characterizing the abscissa and the ordinate of the corresponding key point in any image selected from the second sample data set, the coefficient for the trunk or doors of the vehicle being open is set to w, and the coefficient for the trunk or doors of the vehicle being closed is set to 1-w. Specifically, the aforementioned first weight may be understood as w, and the second weight may be understood as 1-w. Of course, other ways of determining the first weight and the second weight may also be used, and the embodiments of the present invention are not limited thereto.
In such a way, the positions corresponding to the key points of the two images can be accurately and effectively fused, so that a fusion sample data set is obtained.
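As a non-authoritative sketch of the fusion described above (in Python with NumPy), assume each image carries a list of one-to-one matched key points and an RGB value sampled at each key point; the array layout and the linear colour blend with the same weights are assumptions made for illustration, since the text does not fix the exact colour fusion rule.

```python
import numpy as np

def fuse_keypoints(kpts_open, kpts_closed, colors_open, colors_closed, w=0.5):
    """Fuse corresponding key points of an 'open' image and a 'closed' image.

    kpts_open, kpts_closed:     (K, 2) arrays of (x, y) key point coordinates,
                                already matched one-to-one between the two images.
    colors_open, colors_closed: (K, 3) arrays of RGB values sampled at those points.
    w:                          coefficient for the open state; 1 - w is used for
                                the closed state.
    """
    # Fused coordinates: M = w*m_i + (1-w)*m_j and N = w*n_i + (1-w)*n_j.
    fused_xy = w * np.asarray(kpts_open) + (1.0 - w) * np.asarray(kpts_closed)
    # Colour information fused with the same weights (an assumption).
    fused_rgb = w * np.asarray(colors_open) + (1.0 - w) * np.asarray(colors_closed)
    return fused_xy, fused_rgb

# Example: w = 0.3 gives a sample closer to the closed state, i.e. an inter-class
# sample lying between the fully open and fully closed states.
```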
In a specific implementation process, a first image in a first sample data set, which includes a vehicle door fully opened, and a second image in a second sample data set, which includes a vehicle door fully closed and is of a different type from the vehicle in the first image, may be fused to obtain a fused image 1; the first image including the fully opened vehicle door in the first sample data set and the second image including the fully closed vehicle door in the second sample data set, which has the same vehicle type as the first image but different vehicle body color, may also be fused to obtain the fused image 2, which may of course be in other fusion manners, which is not limited in the embodiment of the present invention.
Step 304: training a preset model based on the fusion sample set, the first sample data set and the second sample data set to obtain a trained recognition state model, and detecting the state of the trunk or the car door in the frame image to be detected through the trained recognition state model.
In the embodiment of the invention, the fusion sample set, the first sample data set and the second sample data set can be input into a preset model for training to obtain a plurality of output results; wherein, a plurality of output results are obtained by training for a plurality of times; and then determining an overall loss function, wherein the overall loss function is obtained by performing weighted calculation on a first loss function determined by performing state identification processing on the first sample data set and a second loss function determined by performing state identification processing on the second sample data set.
In the embodiment of the present invention, the overall loss function is calculated by:
loss = λ*loss1 + (1-λ)*loss2; wherein, loss is used for representing the overall loss function, loss1 is used for representing the first loss function, loss2 is used for representing the second loss function, and lambda is used for representing the weight value.
In the embodiment of the present invention, the first loss function is a loss function determined when the preset model is used to train the images in the first sample data set, and the second loss function is a loss function determined when the preset model is used to train the images in the second sample data set. In addition, λ may be the same as the aforementioned w weight, or may be a different value, which is not limited in the embodiment of the present invention.
Further, the preset model may be trained based on the plurality of output results and the overall loss function, and if a value corresponding to the overall loss function is smaller than a predetermined threshold, it is determined that the trained recognition state model has converged, so as to obtain the trained recognition state model.
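The following PyTorch-style sketch illustrates the weighted overall loss and the convergence check described above. The model, optimizer, data loaders, and the way the fused (inter-class) samples enter the loss are assumptions for illustration; the embodiment only specifies loss = λ*loss1 + (1-λ)*loss2 and a convergence threshold on the overall loss.

```python
import torch
import torch.nn as nn

def train_recognition_model(model, loader_open, loader_closed, loader_fused,
                            lam=0.5, threshold=1e-3, max_epochs=100):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

    for _ in range(max_epochs):
        for (x1, y1), (x2, y2), (xf, yf) in zip(loader_open, loader_closed, loader_fused):
            # loss1: state recognition on the first (fully open) sample data set.
            loss1 = criterion(model(x1), y1)
            # loss2: state recognition on the second (fully closed) sample data set.
            loss2 = criterion(model(x2), y2)
            # Overall loss: loss = lam*loss1 + (1-lam)*loss2.
            loss = lam * loss1 + (1.0 - lam) * loss2

            # The fused (inter-class) samples are also fed to the model. Their targets
            # yf are assumed here to be soft class probabilities and a soft-label
            # cross entropy is added; this detail is not fixed by the text.
            log_probs = torch.log_softmax(model(xf), dim=1)
            loss = loss + (-(yf * log_probs).sum(dim=1)).mean()

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Convergence: training is considered converged once the overall loss value
        # falls below the preset threshold (checked here on the last batch).
        if loss.item() < threshold:
            break
    return model
```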
Based on the same inventive concept, the embodiment of the invention provides a training device for recognizing the state model, and the training device for recognizing the state model can realize the corresponding function of the training method for recognizing the state model. The training means for recognizing the state model may be a hardware structure, a software module, or a hardware structure plus a software module. The training device for recognizing the state model can be realized by a chip system, and the chip system can be formed by a chip and can also comprise the chip and other discrete devices. Referring to fig. 4, the training apparatus for recognizing a state model includes:
a determining unit 401, configured to determine a first sample data set and a second sample data set to be trained, where the first sample data set includes a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully opened state, and the second sample data set includes a plurality of images for representing that the vehicle trunk or the vehicle door is in a fully closed state;
a processing unit 402, configured to perform key point positioning processing on each image in the first sample data set and the second sample data set, to obtain a processed first sample data set and a processed second sample data set; and
the processing unit 402 is further configured to perform one-to-one fusion processing on the key points in each image in the processed first sample data set and the key points in each image in the processed second sample data set to obtain a fusion sample data set, where the fusion sample data set includes a plurality of images used for representing that a vehicle trunk or a vehicle door is in a fully opened state or a fully closed state or a state between the fully closed state and the fully opened state;
an obtaining unit 403, configured to train a preset model based on the fusion sample set, the first sample data set, and the second sample data set, to obtain a trained recognition state model, so as to detect a state of a trunk or a door in a frame image to be detected through the trained recognition state model.
In a possible implementation, the processing unit 402 is further configured to:
determining the positions of key feature points included in each image in the first sample data set and the second sample data set, and screening key points based on a preset rule, wherein the preset rule is that the key points are included in images acquired when a vehicle is shot from a preset angle or direction;
and labeling the key points in each image in the first sample data set and the second sample data set to obtain the processed first sample data set and the processed second sample data set.
In a possible implementation, the processing unit 402 is further configured to:
determining key points in a first image in the processed first sample data set, and determining key points in a second image in the processed second sample data set;
adding a first processing value corresponding to the abscissa of the key point in the first image and a second processing value corresponding to the abscissa of the key point in the second image, which is the same as the abscissa, to obtain the abscissa of the key point in the fused image; and
adding a third processing value corresponding to the ordinate of the key point in the first image and a fourth processing value corresponding to the ordinate of the key point in the second image, which is the same as the ordinate, to obtain the ordinate of the key point in the fused image;
fusing color numerical value information corresponding to the key points in the first image with color numerical value information corresponding to the key points in the second image to determine the color numerical value information of the key points in the fused image;
and determining the positions of the key points in the fused image based on the abscissa and the ordinate of the key points in the fused image, and obtaining a corresponding fused image based on the color numerical value information of the key points in the fused image and the positions of the key points to obtain a fused sample data set.
In a possible implementation, the obtaining unit 403 is further configured to:
inputting the fusion sample set, the first sample data set and the second sample data set into a preset model for training to obtain a plurality of output results; wherein the output results are obtained by training for a plurality of times;
determining an overall loss function, wherein the overall loss function is obtained by performing weighted calculation on a first loss function determined by performing state identification processing on the first sample data set and a second loss function determined by performing the state identification processing on the second sample data set;
and training the preset model based on the output results and the overall loss function, and if the value corresponding to the overall loss function is smaller than a preset threshold value, determining that the trained recognition state model is converged to obtain the trained recognition state model.
In a possible implementation, the obtaining unit 403 is further configured to:
the global loss function is calculated by:
loss=λ*loss1+(1-λ)loss2;
wherein, loss is used for representing the whole loss function, loss1 is used for representing the first loss function, loss2 is used for representing the second loss function, and lambda is used for representing the weight value.
The division of the modules in the embodiments of the present invention is schematic, and only one logical function division is provided, and in actual implementation, there may be another division manner, and in addition, each functional module in each embodiment of the present invention may be integrated in one controller, or may exist alone physically, or two or more modules are integrated in one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Based on the same inventive concept, an embodiment of the present invention provides a computer apparatus. Referring to fig. 5, the computer apparatus includes at least one processor 501 and a memory 502 connected to the at least one processor. The specific connection medium between the processor 501 and the memory 502 is not limited in the embodiment of the present invention; in fig. 5, the processor 501 and the memory 502 are connected through a bus 500 as an example, the bus 500 is represented by a thick line in fig. 5, and the connection manner between other components is only schematically illustrated and not limited. The bus 500 may be divided into an address bus, a data bus, a control bus, and the like, and is shown with only one thick line in fig. 5 for ease of illustration, but this does not mean that there is only one bus or one type of bus. In addition, the training apparatus for recognizing a state model further includes a communication interface 503 for receiving the transmitted image information.
In the embodiment of the present invention, the memory 502 stores instructions executable by the at least one processor 501, and the at least one processor 501 can execute the steps included in the training method for recognizing a state model by executing the instructions stored in the memory 502.
The processor 501 is a control center of the computer device, and can connect various parts of the whole computer device by using various interfaces and lines, and perform various functions and process data of the computer device by operating or executing instructions stored in the memory 502 and calling data stored in the memory 502, thereby performing overall monitoring on the computer device.
Optionally, the processor 501 may include one or more processing units, and the processor 501 may integrate an application processor and a modem processor, wherein the application processor mainly handles an operating system, a user interface, an application program, and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 501. In some embodiments, processor 501 and memory 502 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 501 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor.
Memory 502, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 502 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 502 of embodiments of the present invention may also be circuitry or any other device capable of performing a storage function to store program instructions and/or data.
By programming the processor 501, the code corresponding to the training method for identifying the state model described in the foregoing embodiment may be solidified in the chip, so that the chip can execute the steps of the training method for identifying the state model when running.
Based on the same inventive concept, embodiments of the present invention further provide a storage medium storing computer instructions, which, when executed on a computer, cause the computer to perform the steps of the training method for recognizing a state model as described above.
In some possible embodiments, the aspects of a training method for recognizing a state model provided by the present invention may also be implemented in the form of a program product including program code for causing a control computer device to perform the steps of a training method for recognizing a state model according to various exemplary embodiments of the present invention described above in this specification when the program product is run on the control computer device.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A training method for recognizing a state model, the method comprising:
determining a first sample data set and a second sample data set to be trained, wherein the first sample data set comprises a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully opened state, and the second sample data set comprises a plurality of images for representing that the vehicle trunk or the vehicle door is in a fully closed state;
performing key point positioning processing on each image in the first sample data set and the second sample data set to obtain a processed first sample data set and a processed second sample data set; and
performing one-to-one corresponding fusion processing on the key points in each image in the processed first sample data set and the key points in each image in the processed second sample data set to obtain a fusion sample data set, wherein the fusion sample data set comprises a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully opened state or a fully closed state or a state between the fully closed state and the fully opened state;
training a preset model based on the fusion sample set, the first sample data set and the second sample data set to obtain a trained recognition state model, and detecting the state of a trunk or a vehicle door in a frame image to be detected through the trained recognition state model.
2. The method of claim 1, wherein performing keypoint localization processing on each image in the first sample dataset and the second sample dataset to obtain a processed first sample dataset and a processed second sample dataset comprises:
determining the positions of key feature points included in each image in the first sample data set and the second sample data set, and screening key points based on a preset rule, wherein the preset rule is that the key points are included in images acquired when a vehicle is shot from a preset angle or direction;
and labeling the key points in each image in the first sample data set and the second sample data set to obtain the processed first sample data set and the processed second sample data set.
3. The method according to claim 1 or 2, wherein performing a one-to-one correspondence fusion process on the keypoints in each image in the processed first sample data set and the keypoints in each image in the processed second sample data set to obtain a fused sample data set, includes:
determining key points in a first image in the processed first sample data set, and determining key points in a second image in the processed second sample data set;
adding a first processing value corresponding to the abscissa of the key point in the first image and a second processing value corresponding to the abscissa of the corresponding key point in the second image, to obtain the abscissa of the key point in the fused image; and
adding a third processing value corresponding to the ordinate of the key point in the first image and a fourth processing value corresponding to the ordinate of the corresponding key point in the second image, to obtain the ordinate of the key point in the fused image;
fusing color numerical value information corresponding to the key points in the first image with color numerical value information corresponding to the key points in the second image to determine the color numerical value information of the key points in the fused image;
and determining the positions of the key points in the fused image based on the abscissa and the ordinate of the key points in the fused image, and obtaining a corresponding fused image based on the color numerical value information of the key points in the fused image and the positions of the key points to obtain a fused sample data set.
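A minimal sketch of the per-key-point fusion of claim 3, assuming that the "processing values" are the coordinates scaled by a mixing weight λ, that paired images share the same resolution, and that a whole-image blend stands in for the claim's fusion of color value information at the key points; none of these choices come from the original specification.

import numpy as np

def fuse_pair(open_item, closed_item, lam=0.5):
    # open_item / closed_item: {"image": HxWx3 uint8 array, "keypoints": [(name, x, y), ...]}
    fused_keypoints = []
    for (name, x1, y1), (_, x2, y2) in zip(open_item["keypoints"],
                                           closed_item["keypoints"]):
        # Abscissa/ordinate of the fused key point: the two processed
        # coordinate values added together (here, a weighted sum).
        fused_keypoints.append((name,
                                lam * x1 + (1 - lam) * x2,
                                lam * y1 + (1 - lam) * y2))
    # Color value information fused with the same weight (simple pixel blend).
    img1 = open_item["image"].astype(np.float32)
    img2 = closed_item["image"].astype(np.float32)
    fused_image = (lam * img1 + (1 - lam) * img2).astype(np.uint8)
    return {"image": fused_image, "keypoints": fused_keypoints}

With lam swept over, say, 0.1 to 0.9, one pair of fully opened and fully closed images yields several synthetic samples of intermediate states, which is what the fused sample data set is meant to cover.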
4. The method of claim 3, wherein training a pre-set model based on the fused sample set, the first sample data set, and the second sample data set to obtain a trained recognition state model comprises:
inputting the fusion sample set, the first sample data set and the second sample data set into a preset model for training to obtain a plurality of output results; wherein the output results are obtained by training for a plurality of times;
determining an overall loss function, wherein the overall loss function is obtained by performing weighted calculation on a first loss function determined by performing state identification processing on the first sample data set and a second loss function determined by performing the state identification processing on the second sample data set;
and training the preset model based on the output results and the overall loss function, and if the value corresponding to the overall loss function is smaller than a preset threshold value, determining that the trained recognition state model is converged to obtain the trained recognition state model.
5. The method of claim 4, wherein determining an overall loss function comprises:
the global loss function is calculated by:
loss = λ*loss1 + (1-λ)*loss2;
wherein loss represents the overall loss function, loss1 represents the first loss function, loss2 represents the second loss function, and λ represents the weight value.
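Claims 4 and 5 weight the two per-set losses into one overall loss and stop training once it drops below a preset threshold; the sketch below illustrates that logic with caller-supplied training and evaluation callables, since the preset model, the per-set loss functions, λ and the threshold value are not fixed by the claims.

def overall_loss(loss1, loss2, lam=0.7):
    # loss = λ*loss1 + (1-λ)*loss2, with λ an assumed weight value.
    return lam * loss1 + (1 - lam) * loss2

def train_model(fused_set, open_set, closed_set, model,
                step_fn, eval_loss_fn, lam=0.7,
                threshold=1e-3, max_epochs=100):
    # step_fn(model, samples): one training pass over the samples.
    # eval_loss_fn(model, samples): scalar state-identification loss.
    for _ in range(max_epochs):
        step_fn(model, fused_set + open_set + closed_set)
        loss1 = eval_loss_fn(model, open_set)    # first loss function
        loss2 = eval_loss_fn(model, closed_set)  # second loss function
        if overall_loss(loss1, loss2, lam) < threshold:
            break  # overall loss below the preset threshold: converged
    return model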
6. A training apparatus for recognizing a state model, the apparatus comprising:
a determining unit, configured to determine a first sample data set and a second sample data set to be trained, wherein the first sample data set comprises a plurality of images for representing that a vehicle trunk or a vehicle door is in a fully-opened state, and the second sample data set comprises a plurality of images for representing that the vehicle trunk or the vehicle door is in a fully-closed state;
a processing unit, configured to perform key point positioning processing on each image in the first sample data set and the second sample data set to obtain a processed first sample data set and a processed second sample data set; and
the processing unit is further configured to perform one-to-one corresponding fusion processing on the key points in each image in the processed first sample data set and the key points in each image in the processed second sample data set to obtain a fusion sample data set, where the fusion sample data set includes a plurality of images used for representing that a vehicle trunk or a vehicle door is in a fully opened state or a fully closed state or a state between the fully closed state and the fully opened state;
an obtaining unit, configured to train a preset model based on the fusion sample set, the first sample data set and the second sample data set to obtain a trained recognition state model, so as to detect the state of the trunk or the door in the frame image to be detected through the trained recognition state model.
7. The apparatus of claim 6, wherein the processing unit is further configured to:
determining the positions of key feature points included in each image in the first sample data set and the second sample data set, and screening key points based on a preset rule, wherein the preset rule is that the key points are included in images acquired when a vehicle is shot from a preset angle or direction;
and labeling the key points in each image in the first sample data set and the second sample data set to obtain the processed first sample data set and the processed second sample data set.
8. The apparatus of claim 6, wherein the processing unit is further configured to:
determining key points in a first image in the processed first sample data set, and determining key points in a second image in the processed second sample data set;
add a first processing value corresponding to the abscissa of the key point in the first image and a second processing value corresponding to the abscissa of the corresponding key point in the second image, to obtain the abscissa of the key point in the fused image; and
add a third processing value corresponding to the ordinate of the key point in the first image and a fourth processing value corresponding to the ordinate of the corresponding key point in the second image, to obtain the ordinate of the key point in the fused image;
fusing color numerical value information corresponding to the key points in the first image with color numerical value information corresponding to the key points in the second image to determine the color numerical value information of the key points in the fused image;
and determining the positions of the key points in the fused image based on the abscissa and the ordinate of the key points in the fused image, and obtaining a corresponding fused image based on the color numerical value information of the key points in the fused image and the positions of the key points to obtain a fused sample data set.
9. A computer device, characterized in that the computer device comprises: memory, processor and computer program stored on the memory and executable on the processor, which computer program, when being executed by the processor, carries out the steps of the training method of the recognition state model according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the training method of the recognition state model according to any one of claims 1 to 4.
CN202011307328.XA 2020-11-20 2020-11-20 Training method and device for recognizing state model and computer equipment Pending CN112418303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011307328.XA CN112418303A (en) 2020-11-20 2020-11-20 Training method and device for recognizing state model and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011307328.XA CN112418303A (en) 2020-11-20 2020-11-20 Training method and device for recognizing state model and computer equipment

Publications (1)

Publication Number Publication Date
CN112418303A true CN112418303A (en) 2021-02-26

Family

ID=74774304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011307328.XA Pending CN112418303A (en) 2020-11-20 2020-11-20 Training method and device for recognizing state model and computer equipment

Country Status (1)

Country Link
CN (1) CN112418303A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105909116A (en) * 2016-04-28 2016-08-31 百度在线网络技术(北京)有限公司 Vehicle door control method, device and system
WO2019119505A1 (en) * 2017-12-18 2019-06-27 深圳云天励飞技术有限公司 Face recognition method and device, computer device and storage medium
CN108665441A (en) * 2018-03-30 2018-10-16 北京三快在线科技有限公司 A kind of Near-duplicate image detection method and device, electronic equipment
CN111656357A (en) * 2018-04-17 2020-09-11 深圳华大生命科学研究院 Artificial intelligence-based ophthalmic disease diagnosis modeling method, device and system
CN111382758A (en) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Training image classification model, image classification method, device, equipment and medium
CN110188613A (en) * 2019-04-28 2019-08-30 上海鹰瞳医疗科技有限公司 Image classification method and equipment
CN110633751A (en) * 2019-09-17 2019-12-31 上海眼控科技股份有限公司 Training method of car logo classification model, car logo identification method, device and equipment
CN111242199A (en) * 2020-01-07 2020-06-05 中国科学院苏州纳米技术与纳米仿生研究所 Training method and classification method of image classification model
CN111368893A (en) * 2020-02-27 2020-07-03 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN111553428A (en) * 2020-04-30 2020-08-18 北京百度网讯科技有限公司 Method, device, equipment and readable storage medium for training discriminant model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907982A (en) * 2021-04-09 2021-06-04 济南博观智能科技有限公司 Method, device and medium for detecting vehicle illegal parking behavior
CN112907982B (en) * 2021-04-09 2022-12-13 济南博观智能科技有限公司 Method, device and medium for detecting vehicle illegal parking behavior
CN116189098A (en) * 2023-04-23 2023-05-30 四川弘和通讯集团有限公司 Method and device for identifying whether engine cover of vehicle is opened or not

Similar Documents

Publication Publication Date Title
CN113095124B (en) Face living body detection method and device and electronic equipment
Hsu et al. Robust license plate detection in the wild
CN109697416B (en) Video data processing method and related device
WO2019149071A1 (en) Target detection method, device, and system
WO2019024771A1 (en) Car insurance image processing method, apparatus, server and system
CN108038176B (en) Method and device for establishing passerby library, electronic equipment and medium
CN112560999A (en) Target detection model training method and device, electronic equipment and storage medium
CN109740517A (en) A kind of method and device of determining object to be identified
CN111857356A (en) Method, device, equipment and storage medium for recognizing interaction gesture
CN109993031A (en) A kind of animal-drawn vehicle target is driven against traffic regulations behavioral value method, apparatus and camera
CN106250838A (en) vehicle identification method and system
CN105426867A (en) Face identification verification method and apparatus
US8559672B2 (en) Determining detection certainty in a cascade classifier
CN112418303A (en) Training method and device for recognizing state model and computer equipment
CN110263864A (en) Matching process, device, computer equipment and the storage medium of vehicle
CN114882437A (en) Recognition model training method and device, electronic equipment and storage medium
CN110245673A (en) Method for detecting parking stalls and device
CN111914668A (en) Pedestrian re-identification method, device and system based on image enhancement technology
CN114511589A (en) Human body tracking method and system
CN109800675A (en) A kind of method and device of the identification image of determining face object
CN112052907A (en) Target detection method and device based on image edge information and storage medium
CN110298302B (en) Human body target detection method and related equipment
CN109727268A (en) Method for tracking target, device, computer equipment and storage medium
CN110648314B (en) Method, device and equipment for identifying flip image
CN109034171B (en) Method and device for detecting unlicensed vehicles in video stream

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination