CN111626222A - Pet detection method, device, equipment and storage medium - Google Patents

Pet detection method, device, equipment and storage medium

Info

Publication number
CN111626222A
CN111626222A (application number CN202010468086.6A)
Authority
CN
China
Prior art keywords
image
sample
pet
contour
fused
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010468086.6A
Other languages
Chinese (zh)
Inventor
张澳
杜天元
王飞
钱晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202010468086.6A priority Critical patent/CN111626222A/en
Publication of CN111626222A publication Critical patent/CN111626222A/en
Priority to KR1020227003933A priority patent/KR20220027242A/en
Priority to JP2022520022A priority patent/JP2022550790A/en
Priority to PCT/CN2021/078546 priority patent/WO2021238316A1/en
Pending legal-status Critical Current

Classifications

    • G06V20/593: Scenes; context or environment of the image inside of a vehicle; recognising seat occupancy
    • G06F18/24: Pattern recognition; analysing; classification techniques
    • G06N3/045: Computing arrangements based on biological models; neural networks; combinations of networks
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V10/267: Image preprocessing; segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/44: Extraction of image or video features; local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning; neural networks
    • G06V40/10: Recognition of biometric, human-related or animal-related patterns; human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Emergency Alarm Devices (AREA)

Abstract

The present disclosure provides a pet detection method, apparatus, device and storage medium. The method comprises: acquiring an image of the interior of a vehicle cabin; detecting the image to determine whether a pet is present in the cabin; and issuing first prompt information when the state in which a pet is present in the cabin and no adult is present lasts longer than a first preset duration.

Description

Pet detection method, device, equipment and storage medium
Technical Field
The present disclosure relates to the technical field of deep learning, and in particular to a pet detection method, apparatus, device and storage medium.
Background
In recent years, pets have become an integral part of people's lives, and more and more users take their pets along when travelling, which makes it possible for a pet to be forgotten in the vehicle cabin. Because the cabin is an enclosed space, a pet left inside in hot weather may suffer heat stroke, dehydration and the like, and in serious cases its life may be endangered.
Disclosure of Invention
In view of the above, the present disclosure provides at least a pet detection method, apparatus, device and storage medium.
In a first aspect, the present disclosure provides a pet detection method, comprising:
acquiring an image in a vehicle cabin;
detecting the image to determine whether a pet exists in the cabin;
and issuing first prompt information when the state in which a pet is present in the vehicle cabin and no adult is present lasts longer than a first preset duration.
According to the method, the acquired image of the cabin interior is detected, and the first prompt information is issued when a pet is present in the cabin and no adult has been present for longer than the first preset duration. Pets in the cabin are thereby detected, the danger of a pet being left in the cabin for a long time and coming to harm is avoided, and convenience is brought to the user.
In one possible embodiment, the images of the interior of the vehicle are acquired while the vehicle is in a driving state, and the method further includes:
and sending a second prompt message when the situation that the pet exists in the vehicle cabin and the duration of the state that the pet is not in the rear seat in the vehicle cabin exceeds a second preset time length is determined.
In one possible embodiment, the detection of the image to determine whether a pet is present in the vehicle cabin is performed by a neural network trained with training samples, and the training samples are obtained by the following steps:
acquiring a first image sample including a pet to be detected and a second image sample not including the pet to be detected;
generating a fused image sample based on the first image sample and the second image sample; wherein the fused image sample includes the pet to be detected, with the second image sample serving as a background image;
determining the training sample based on the fused image sample and the first image sample.
In this method, fused image samples are generated from first image samples that include the pet to be detected and second image samples that do not. The fused image samples enlarge the training set, and training the neural network on this sufficient sample size improves the accuracy of the trained network.
In one possible embodiment, the generating a fused image sample based on the first image sample and the second image sample includes:
extracting a contour image corresponding to the pet to be detected from the first image sample;
and fusing the second image sample and the outline image to obtain the fused image sample.
In the above embodiment, the contour image of the pet to be detected is extracted from the first image sample and fused with the second image sample. This prevents the non-contour parts of the first image sample from disturbing the second image sample, so that the fused image sample better matches real conditions, which improves the authenticity of the fused image sample and in turn the reliability of the training samples used for neural network training.
In one possible embodiment, the fusing the second image sample with the contour image to obtain the fused image sample includes:
for each contour image, selecting at least one target background image matching the contour image from the second image sample;
and fusing the outline image and the at least one target background image to obtain the fused image sample.
In the above embodiment, a matching target background image is selected from the second image samples for each contour image, and second image samples that do not match a contour image are excluded. This avoids the distortion that would arise if an unmatched second image sample were used as the target background image, for example the contour image of the pet to be detected appearing in the sky or in a lake in the fused image sample, and thereby improves the authenticity of the fused image samples.
In a possible embodiment, for each of the contour images, selecting at least one target background image matching the contour image from the second image sample includes:
respectively determining a first feature vector of each contour image and a second feature vector of each second image sample, the first feature vector and the second feature vector having the same dimension;
for each contour image, determining at least one target background image matched with the contour image based on the distances between the first feature vector corresponding to the contour image and each second feature vector.
Here, the distance between the first feature vector of each contour image and the second feature vector of each second image sample is calculated (the greater the distance, the greater the difference between the corresponding contour image and second image sample, that is, the lower the matching degree). Second image samples whose vectors lie too far away are screened out, and the remaining second image samples are used as the target background images of the corresponding contour image, so that the target background images for each contour image are obtained more accurately.
In one possible embodiment, determining the first feature vector of each of the contour images and the second feature vector of each of the second image samples separately includes:
extracting a first original feature vector of each contour image and a second original feature vector of each second image sample;
and performing dimensionality reduction processing on the first original feature vector and the second original feature vector to obtain the first feature vector and the second feature vector with the same dimensionality.
In a possible embodiment, fusing the contour image with the at least one target background image to obtain the fused image sample includes:
for each target background image, determining a fusion position of the contour image in the target background image based on the coincidence degree of each object detection area in the target background image and the contour image; the object detection area is an image area containing any detection object;
and fusing the target background image and the outline image based on the fusion position to obtain the fusion image sample.
In the above embodiment, determining the fusion position avoids situations in which the position of the pet to be detected in the fused image sample does not conform to natural laws (for example, the pet appearing in the sky or in a lake), thereby improving the authenticity of the fused image sample.
In a possible implementation manner, the fusing the target background image and the contour image based on the fusion position to obtain the fused image sample includes:
placing the outline image at the fusion position of the target background image, and generating an intermediate image sample;
and performing Gaussian blur processing on the edge of the contour image in the intermediate image sample to obtain the fused image sample.
In the above embodiment, the edge of the contour image is subjected to Gaussian blur processing, which further improves the realism of the fused image sample.
In one possible embodiment, the determining a training sample based on the fused image sample and the first image sample includes:
determining the number of the fused image samples based on the set proportion of the fused image samples to the first image samples and the number of the first image samples;
selecting a corresponding number of fused image samples from the obtained fused image samples based on the determined number of the fused image samples;
and determining a training sample based on the selected corresponding number of fused image samples and the first image sample.
In the above embodiment, the fused image samples and the first image samples are combined according to a set proportion, so that the first image samples are guaranteed a certain share while the total number of training samples is increased, which improves the reliability of the training sample set to a certain extent.
The following descriptions of the effects of the apparatus, the electronic device, and the like refer to the description of the above method, and are not repeated here.
In a second aspect, the present disclosure provides a pet detection device comprising:
an acquisition module, configured to acquire an image of the interior of a vehicle cabin;
a determining module, configured to detect the image and determine whether a pet is present in the cabin;
a first prompt module, configured to issue first prompt information when the state in which a pet is present in the vehicle cabin and no adult is present lasts longer than a first preset duration.
In one possible embodiment, the images of the interior of the vehicle are acquired while the vehicle is in a driving state, and the device further includes:
and a second prompt module, configured to issue second prompt information when, after it is determined that a pet is present in the vehicle cabin, the state in which the pet is not in a rear seat of the cabin lasts longer than a second preset duration.
In one possible embodiment, the detection of the image to determine whether a pet is present in the cabin is performed by a neural network trained with training samples, and the apparatus further comprises a training sample determining module configured to obtain the training samples by the following steps:
acquiring a first image sample including a pet to be detected and a second image sample not including the pet to be detected;
generating a fused image sample based on the first image sample and the second image sample; wherein the fused image sample includes the pet to be detected, with the second image sample serving as a background image;
determining the training sample based on the fused image sample and the first image sample.
In one possible embodiment, the training sample determination module, when generating a fused image sample based on the first image sample and the second image sample, is configured to:
extracting a contour image corresponding to the pet to be detected from the first image sample;
and fusing the second image sample and the outline image to obtain the fused image sample.
In a possible implementation manner, the training sample determination module, when fusing the second image sample with the contour image to obtain the fused image sample, is configured to:
for each contour image, selecting at least one target background image matching the contour image from the second image sample;
and fusing the outline image and the at least one target background image to obtain the fused image sample.
In a possible embodiment, the training sample determination module, when selecting, for each of the contour images, at least one target background image matching the contour image from the second image sample, is configured to:
respectively determining a first feature vector of each contour image and a second feature vector of each second image sample, the first feature vector and the second feature vector having the same dimension;
for each contour image, determining at least one target background image matched with the contour image based on the distances between the first feature vector corresponding to the contour image and each second feature vector.
In one possible embodiment, the training sample determining module, when determining the first feature vector of each of the contour images and the second feature vector of each of the second image samples, is configured to:
extracting a first original feature vector of each contour image and a second original feature vector of each second image sample;
and performing dimensionality reduction processing on the first original feature vector and the second original feature vector to obtain the first feature vector and the second feature vector with the same dimensionality.
In a possible implementation manner, the training sample determining module, when fusing the contour image and the at least one target background image to obtain the fused image sample, is configured to:
for each target background image, determining a fusion position of the contour image in the target background image based on the coincidence degree of each object detection area in the target background image and the contour image; the object detection area is an image area containing any detection object;
and fusing the target background image and the outline image based on the fusion position to obtain the fusion image sample.
In a possible implementation manner, the training sample determination module, when fusing the target background image and the contour image based on the fusion position to obtain the fused image sample, is configured to:
placing the outline image at the fusion position of the target background image, and generating an intermediate image sample;
and performing Gaussian blur processing on the edge of the contour image in the intermediate image sample to obtain the fused image sample.
In one possible embodiment, the training sample determination module, when determining the training sample based on the fused image sample and the first image sample, is configured to:
determining the number of the fused image samples based on the set proportion of the fused image samples to the first image samples and the number of the first image samples;
selecting a corresponding number of fused image samples from the obtained fused image samples based on the determined number of the fused image samples;
and determining a training sample based on the selected corresponding number of fused image samples and the first image sample.
In a third aspect, the present disclosure provides an electronic device comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the pet detection method according to the first aspect or any one of the embodiments.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the pet detection method according to the first aspect or any one of the embodiments described above.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings here are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive further related drawings from them without inventive effort.
FIG. 1 is a flow chart of a pet detection method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a manner of determining a training sample in a pet detection method provided by an embodiment of the disclosure;
FIG. 3 is a schematic flow chart illustrating a process of obtaining a fused image sample in a pet detection method according to an embodiment of the disclosure;
FIG. 4 is a schematic flow chart illustrating a process of selecting at least one target background image matching the contour image from a second image sample in a pet detection method provided by an embodiment of the disclosure;
FIG. 5 is a schematic flow chart illustrating a process of determining a first feature vector of each contour image and a second feature vector of each second image sample, respectively, in a pet detection method provided by an embodiment of the disclosure;
FIG. 6 is a schematic flow chart illustrating a process of fusing the outline image with at least one target background image to obtain a fused image sample in the pet detection method according to the embodiment of the disclosure;
FIG. 7 is a schematic flow chart illustrating a process of fusing the target background image and the contour image based on a fusion position to obtain a fusion image sample in the pet detection method according to the embodiment of the disclosure;
FIG. 8 is a schematic flow chart illustrating a process of determining a training sample based on a fused image sample and a first image sample in a pet detection method provided by an embodiment of the disclosure;
FIG. 9 is a schematic diagram illustrating an architecture of a pet detection device according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
In order to solve the problem that a pet left in the vehicle cabin by the user may come to harm, the embodiments of the present disclosure provide a pet detection method, which includes: acquiring an image of the interior of the vehicle cabin; detecting the image to determine whether a pet is present in the cabin; and issuing first prompt information when the state in which a pet is present in the cabin and no adult is present lasts longer than a first preset duration. Pets in the cabin are thereby detected, the danger of a pet being left in the cabin for a long time and coming to harm is avoided, and convenience is brought to users.
For the convenience of understanding the embodiments of the present disclosure, a detailed description will be first provided for a pet detection method disclosed in the embodiments of the present disclosure.
Referring to fig. 1, a schematic flow chart of a pet detection method provided in the embodiment of the present disclosure is shown, and the method includes S101-S103.
S101, acquiring an image of the interior of the vehicle cabin.
S102, detecting the image to determine whether a pet is present in the cabin.
S103, issuing first prompt information when the state in which a pet is present in the vehicle cabin and no adult is present lasts longer than a first preset duration.
In the embodiment of the present disclosure, the image of the cabin interior may be obtained by an image pickup device provided in the vehicle cabin; for example, the image pickup device may be a driver monitoring system (DMS) camera. The image is detected to determine whether a pet is present in the cabin, where the pet may be any animal such as a cat, a dog or a bird. In a specific implementation, the images can be detected by a trained neural network to determine whether a pet is present in the vehicle cabin.
Furthermore, the first prompt message may be sent to prompt the user when the pet is present in the cabin and the duration of the state in which the adult is absent exceeds a first preset duration, where the first preset duration may be set according to actual needs, for example, 1 minute, 5 minutes, 10 minutes, and the like. The prompt message may include text, sound, pictures, etc.
According to the method, the acquired image of the cabin interior is detected, and the first prompt information is issued when a pet is present in the cabin and no adult has been present for longer than the first preset duration. Pets in the cabin are thereby detected, the danger of a pet being left in the cabin for a long time and coming to harm is avoided, and convenience is brought to users.
In an alternative embodiment, the images in the cabin are acquired while the vehicle is in motion, and the method further comprises:
and sending a second prompt message under the condition that the duration of the state that the pet is not in the rear seat in the vehicle cabin exceeds a second preset time after the pet is determined to exist in the vehicle cabin.
Here, the position of the pet in the vehicle cabin may be detected. If a pet is present in the cabin and is in a rear seat, it is in a safe position and no second prompt information is issued. If the state in which the pet is not in a rear seat of the cabin lasts longer than the second preset duration, the second prompt information is issued. The second preset duration can be set as required.
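The alerting logic described above can be sketched as follows. This is a minimal example under the assumption that per-frame detection results (pet present, adult present, pet on a rear seat, vehicle driving) are already produced by the detection network; the class name, preset durations and return values are illustrative and not part of the disclosure.

```python
import time

FIRST_PRESET_SECONDS = 300   # hypothetical value for the first preset duration
SECOND_PRESET_SECONDS = 30   # hypothetical value for the second preset duration

class PetAlertMonitor:
    """Tracks how long each unsafe state persists and raises the two prompts."""

    def __init__(self):
        self.unattended_since = None   # pet present, no adult
        self.front_seat_since = None   # pet present, not on a rear seat (while driving)

    def update(self, pet_present, adult_present, pet_on_rear_seat, vehicle_driving, now=None):
        now = now or time.monotonic()
        prompts = []

        # First prompt: pet present and no adult for longer than the first preset duration.
        if pet_present and not adult_present:
            self.unattended_since = self.unattended_since or now
            if now - self.unattended_since > FIRST_PRESET_SECONDS:
                prompts.append("first_prompt")
        else:
            self.unattended_since = None

        # Second prompt: while driving, pet present but not on a rear seat
        # for longer than the second preset duration.
        if vehicle_driving and pet_present and not pet_on_rear_seat:
            self.front_seat_since = self.front_seat_since or now
            if now - self.front_seat_since > SECOND_PRESET_SECONDS:
                prompts.append("second_prompt")
        else:
            self.front_seat_since = None

        return prompts
```

The monitor would be called once per acquired frame, and any prompts it returns would be rendered as text, sound or pictures as described above.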
In an alternative embodiment, the detection of the image to determine whether the pet is present in the cabin is performed by a neural network trained using training samples, wherein, referring to a flow chart of a method for determining the training samples in the pet detection method shown in fig. 2, the training samples are obtained by the following steps:
S201, acquiring a first image sample including a pet to be detected and a second image sample not including the pet to be detected.
S202, generating a fused image sample based on the first image sample and the second image sample; the fused image sample includes the pet to be detected, with the second image sample serving as a background image.
S203, determining a training sample based on the fused image sample and the first image sample.
In the above steps, fused image samples are generated from the first image samples that include the pet to be detected and the second image samples that do not. The fused image samples enlarge the training set, and training the neural network on this sufficient sample size improves the accuracy of the trained network.
S201-S203 are described in detail below.
For S201:
In the embodiment of the present disclosure, the pet to be detected may be one or more pet types, for example one or more of a cat, a dog, a rabbit, a pig and the like. The first image sample is an image that includes a pet to be detected, for example an image containing a cat and/or a dog; the second image sample is a background image that does not include the pet to be detected, and different second image samples may show different backgrounds, which may be any background in which the pet to be detected could appear. For example, the second image sample may be a bedroom image, a living room image, a kitchen image, a park image, a road image, and so forth. There are a plurality of first image samples and a plurality of second image samples, and the background types of the second image samples can be set according to actual needs.
For S202:
In the embodiment of the present disclosure, a fused image sample may be generated from the first image sample and the second image sample. For example, the first image sample is scaled or cropped and then placed on the second image sample, so that the fused image sample includes the pet to be detected while the second image sample serves as the background image of the fused image sample.
In one possible embodiment, generating a fused image sample based on the first image sample and the second image sample may include: extracting a contour image corresponding to the pet to be detected from the first image sample; and fusing the second image sample and the outline image to obtain a fused image sample.
In the embodiment of the disclosure, the contour image of the pet to be detected can be extracted from the first image sample by a trained segmentation neural network. Exemplarily, each first image sample is input into the segmentation neural network for processing to obtain a corresponding intermediate image (in the intermediate image, the color of the image region corresponding to the pet to be detected differs from that of the other regions), and the contour image of the pet to be detected is cut out from the intermediate image. The structure of the segmentation neural network can be determined according to actual needs.
Illustratively, the intermediate image may be a color image or a black-and-white image. For example, if the intermediate image is a black-and-white image, the image area corresponding to the pet to be detected may be set to black and the other areas to white; the contour image of the pet to be detected in each first image sample is then cut out from the corresponding intermediate image, that is, the black area of the intermediate image is cut out to obtain the contour image of the pet to be detected contained in that first image sample.
Illustratively, if the first image sample includes several pets to be detected, an intermediate image may be generated for each pet, and the contour of the pet corresponding to each intermediate image is cut out from it. For example, if the first image sample includes three pets to be detected, namely pet A, pet B and pet C, the segmentation neural network may generate an intermediate image A for pet A, an intermediate image B for pet B and an intermediate image C for pet C. In intermediate image A, the image area corresponding to pet A is black and the other areas (the areas of intermediate image A other than the image area of pet A) are white; in intermediate image B, the image area corresponding to pet B is black and the other areas are white; and in intermediate image C, the image area corresponding to pet C is black and the other areas are white. A contour image (i.e. a mask) of each pet to be detected is then extracted from its intermediate image: the contour image of pet A (i.e. the black area of intermediate image A) is extracted from intermediate image A, the contour image of pet B from intermediate image B, and the contour image of pet C from intermediate image C. The contour image of the pet to be detected with its real appearance can then be cut out of the first image sample according to the contour image (i.e. the mask).
Illustratively, the contour image may be placed in a second image sample to obtain a fused image sample. When obtaining the fused image sample, either the mask may be fused with the second image sample, or the contour image of the pet to be detected with its real appearance (i.e. color, texture and other information) may be fused with the second image sample.
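As a concrete illustration of cutting the contour image out of a first image sample using the mask, the following sketch uses OpenCV and NumPy. The function name and the assumption that the mask is a binary array aligned pixel-for-pixel with the image are hypothetical, not taken from the disclosure.

```python
import cv2
import numpy as np

def extract_contour_image(first_image, mask):
    """Cut the pet out of the first image sample using the segmentation mask.

    first_image: H x W x 3 BGR image containing the pet to be detected.
    mask:        H x W binary mask from the segmentation network
                 (non-zero where the pet is).
    Returns the cropped pet patch (real color/texture) and the cropped mask.
    """
    mask = (mask > 0).astype(np.uint8)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None, None
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1

    patch = first_image[y0:y1, x0:x1].copy()
    patch_mask = mask[y0:y1, x0:x1]
    # Zero out background pixels so only the pet's real appearance remains.
    patch[patch_mask == 0] = 0
    return patch, patch_mask
```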
In addition, the contour image may be scaled, cropped or otherwise processed before being placed in the second image sample to obtain the fused image sample.
In this embodiment, the contour image of the pet to be detected is extracted from the first image sample and fused with the second image sample, which prevents the non-contour parts of the first image sample from disturbing the second image sample, makes the fused image sample better match real conditions, improves its authenticity, and in turn improves the reliability of the training samples used for neural network training.
In a possible embodiment, after extracting the contour image corresponding to the pet to be detected from the first image sample and before fusing the second image sample with the contour image, the method may further include: first, performing transformation processing on the contour images extracted from the first image samples to obtain a transformed image corresponding to each contour image, the transformation processing comprising at least one of scaling, cropping, rotation and motion blur; and second, taking both the contour images extracted from the first image samples and the transformed images as the contour images to be fused with the second image samples.
In the embodiment of the disclosure, after the contour images are extracted, they may be transformed to obtain several transformed images of the pet to be detected with changed form, and both the contour images extracted from the first image samples and the transformed images are used as contour images for fusion with the second image samples, which increases the number of contour images available for fusion. Illustratively, the contour image may be transformed with OpenCV library functions. Specifically, the scaling process may enlarge or reduce the contour image; the cropping process may crop a local region of the contour image; the rotation process may rotate the contour image, and so on. The transformation processing may also include other operations, such as flipping.
In the above embodiment, the number of the contour images fused with the second image sample is increased by performing the transformation processing on the contour images, and thus the number of the training samples of the neural network can be further increased.
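The transformation processing could be chained with OpenCV roughly as sketched below, assuming the contour patch is a NumPy image array; the function name and the specific scale, angle and blur-length values are illustrative only.

```python
import cv2
import numpy as np

def transform_contour_image(patch, scale=0.8, angle=15, blur_len=7):
    """Produce one transformed variant of a contour image (scale + rotate + motion blur)."""
    # Scaling.
    h, w = patch.shape[:2]
    scaled = cv2.resize(patch, (max(1, int(w * scale)), max(1, int(h * scale))))

    # Rotation about the patch center.
    sh, sw = scaled.shape[:2]
    rot = cv2.getRotationMatrix2D((sw / 2, sh / 2), angle, 1.0)
    rotated = cv2.warpAffine(scaled, rot, (sw, sh))

    # Horizontal motion blur: convolve with a normalized 1 x blur_len row kernel.
    kernel = np.zeros((blur_len, blur_len), dtype=np.float32)
    kernel[blur_len // 2, :] = 1.0 / blur_len
    blurred = cv2.filter2D(rotated, -1, kernel)
    return blurred
```

In practice a set of such variants per contour image would be generated by varying the parameters.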
In one possible embodiment, referring to fig. 3, the fusing the second image sample with the contour image to obtain a fused image sample includes:
S301, for each contour image, selecting at least one target background image matching the contour image from the second image sample.
S302, the contour image and at least one target background image are fused to obtain a fused image sample.
In the embodiment of the present disclosure, the number of target background images for each contour image may be set according to actual needs, for example to 1 or 3. For example, if the number of target background images is 1, then for each contour image the second image sample that best matches it is selected as its target background image; in this case, every contour image has the same number of target background images. Alternatively, the target background images may be determined according to the matching degree between each contour image and each second image sample: for example, with a matching-degree threshold of 80%, the second image samples whose matching degree with a contour image exceeds 80% are taken as the target background images of that contour image; in this case, the number of target background images may differ between contour images.
In the embodiment of the disclosure, each contour image is respectively fused with at least one matched target background image to obtain a fused image sample. For example, if the first contour image is matched to obtain 3 target background images, i.e., a first target background image, a second target background image, and a third target background image, the first contour image and the first target background image are fused, the first contour image and the second target background image are fused, and the first contour image and the third target background image are fused to obtain a fused image sample corresponding to the first contour image.
In the embodiment of the disclosure, a matching target background image is selected from the second image samples for each contour image, and second image samples that do not match a contour image are excluded. This avoids the distortion that would arise if an unmatched second image sample were used as the target background image, for example the contour image of the pet to be detected appearing in the sky or in a lake in the fused image sample, and thereby improves the authenticity of the fused image samples.
In a possible implementation, for S301 above and referring to fig. 4, selecting, for each contour image, at least one target background image matching the contour image from the second image sample includes:
S401, respectively determining a first feature vector of each contour image and a second feature vector of each second image sample, the first feature vector and the second feature vector having the same dimension;
S402, for each contour image, determining at least one target background image matched with the contour image based on the distances between the first feature vector corresponding to the contour image and each second feature vector.
In the embodiment of the present disclosure, the Euclidean distance or Mahalanobis distance between a first feature vector and a second feature vector can characterize the degree of difference between the corresponding contour image and second image sample: the smaller the distance value, the smaller the difference and the higher the matching degree; conversely, the larger the distance value, the larger the difference and the lower the matching degree.
Therefore, for each contour image, its first feature vector and the second feature vector of each second image sample can be obtained, the Euclidean distance or Mahalanobis distance between the first feature vector and each second feature vector is calculated, and at least one target background image matched with the contour image is determined according to the calculated distances.
Taking the Euclidean distance as an example, a target number of second image samples may be selected in order of increasing Euclidean distance as the at least one target background image matched with the contour image; or, according to a set Euclidean distance threshold, the second image samples whose calculated distance satisfies the condition are selected from the plurality of second image samples and determined as the at least one target background image matched with the contour image. For example, if the target number is 3, then for each contour image the three second image samples whose second feature vectors have the smallest Euclidean distances to its first feature vector are selected as its target background images. Or, if the target distance threshold is Y, then for each contour image the second image samples whose calculated Euclidean distance is smaller than Y are selected as its target background images.
Specifically, the contour image and at least one target background image may be fused to obtain a fused image sample. The contour image is fused with each corresponding target background image respectively to obtain a fused image sample.
In the above embodiment, the distance between the first feature vector of each contour image and the second feature vector of each second image sample is calculated (the greater the distance, the greater the difference between the corresponding contour image and second image sample, that is, the lower the matching degree). Second image samples whose vectors lie too far away are screened out, and the remaining second image samples are used as the target background images of the corresponding contour image, so that the target background images for each contour image are obtained more accurately.
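A minimal sketch of the distance-based matching, assuming the first and second feature vectors are already available as NumPy arrays of equal dimension; the function name and the choice of three target background images per contour image are illustrative.

```python
import numpy as np

def match_backgrounds(contour_feats, background_feats, top_k=3):
    """For each contour feature vector, pick the top_k closest second image samples.

    contour_feats:    (num_contours, d) array of first feature vectors.
    background_feats: (num_backgrounds, d) array of second feature vectors.
    Returns an index array of shape (num_contours, top_k).
    """
    # Pairwise Euclidean distances between every contour and every background.
    diff = contour_feats[:, None, :] - background_feats[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)          # (num_contours, num_backgrounds)
    # Smaller distance means higher matching degree; keep the top_k smallest.
    return np.argsort(dists, axis=1)[:, :top_k]
```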
In a possible implementation, with reference to fig. 5 for the above S401, determining the first feature vector of each contour image and the second feature vector of each second image sample respectively includes:
S501, extracting a first original feature vector of each contour image and a second original feature vector of each second image sample;
S502, performing dimensionality reduction processing on the first original feature vector and the second original feature vector to obtain the first feature vector and the second feature vector with the same dimensionality.
In the embodiment of the disclosure, a first original feature vector of each contour image and a second original feature vector of each second image sample can be extracted through a feature extraction layer of the trained first neural network model. Or, a first original feature vector of each contour image and a second original feature vector of each second image sample can be extracted through a PCA algorithm, and the dimensions of the extracted first original feature vector and the extracted second original feature vector are reduced to the same dimension, so that the first feature vector and the second feature vector are obtained.
Illustratively, first sample data can be obtained, the first sample data comprises a matched target contour image and a target sample image, and the first neural network model is trained through the first sample data until the accuracy of the trained model meets a threshold; and then, a feature extraction layer of the trained first neural network model can be selected, and a first original feature vector of each contour image and a second original feature vector of each second image sample are extracted based on the selected feature extraction layer. The structure of the first neural network model can be set according to actual needs.
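The dimensionality reduction could, for instance, be carried out with PCA as sketched below. This assumes the raw vectors of both sets already share one length (e.g. both come from the same feature-extraction layer) so that a single PCA projection keeps the two sets comparable; that assumption, the function name and the target dimension are not taken from the disclosure.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_to_common_dim(contour_raw_feats, background_raw_feats, dim=64):
    """Project both raw feature sets into one dim-dimensional space with PCA.

    Fitting a single PCA on the union keeps distances between the two sets
    comparable after projection (an illustrative design choice).
    """
    combined = np.vstack([contour_raw_feats, background_raw_feats])
    pca = PCA(n_components=dim).fit(combined)
    projected = pca.transform(combined)
    n = contour_raw_feats.shape[0]
    return projected[:n], projected[n:]
```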
In a possible implementation manner, for the above S302, referring to fig. 6, the fusing the contour image and at least one target background image to obtain a fused image sample includes:
S601, for each target background image, determining a fusion position of the contour image in the target background image based on the degree of coincidence between each object detection area in the target background image and the contour image, the object detection area being an image area containing any detection object;
S602, fusing the target background image and the contour image based on the fusion position to obtain the fused image sample.
In the embodiment of the present disclosure, each object detection area included in each target background image may be determined according to the trained area detection neural network, or each object detection area included in each target background image may be marked in a manual manner. The object may be a building, a vehicle, an animal, a human, etc., among others. For example, each object detection area in the target background image may be a detection frame, that is, a detection frame of each object may be included in each target background image. The outline image may also correspond to a detection frame.
For example, the fusion position of the contour image in the target background image may be determined according to the degree of coincidence (IoU) between the detection box of each object in the target background image and the detection box corresponding to the contour image. Specifically, a threshold may be set, and a position at which the IoU between the detection box of the contour image and the detection box of every object is lower than the threshold may be taken as the fusion position. For example, if the target background image A corresponding to contour image A contains three object detection boxes (a first, a second and a third object detection box), a candidate position may be chosen; if, at that position, the coincidence of the detection box of contour image A with each of the first, second and third object detection boxes is lower than the threshold, the candidate position may be determined as the fusion position. And/or, if the area of an object detection box is smaller than that of the detection box of the contour image, a candidate position at which the detection box of the contour image completely covers the object detection box may be taken as the fusion position. For example, if the target background image B corresponding to contour image B contains a fourth object detection box and, at the chosen candidate position, the detection box of contour image B completely covers it, that position may be determined as the fusion position.
In this method, determining the fusion position avoids situations in which the position of the pet to be detected in the fused image sample does not conform to natural laws (for example, the pet appearing in the sky or in a lake), thereby improving the authenticity of the fused image sample.
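One way a fusion position could be searched for under the coincidence-degree rule described above is sketched below; the random-search strategy, the (x1, y1, x2, y2) box format, the IoU threshold and the retry count are illustrative assumptions.

```python
import random

def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def find_fusion_position(bg_size, patch_size, object_boxes, iou_thresh=0.1, tries=100):
    """Randomly propose positions until the pasted contour patch overlaps every
    object detection box by less than iou_thresh."""
    bg_w, bg_h = bg_size
    pw, ph = patch_size
    for _ in range(tries):
        x = random.randint(0, max(0, bg_w - pw))
        y = random.randint(0, max(0, bg_h - ph))
        patch_box = (x, y, x + pw, y + ph)
        if all(box_iou(patch_box, ob) < iou_thresh for ob in object_boxes):
            return x, y
    return None  # no acceptable position found
```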
In a possible implementation, with reference to fig. 7 for S602, fusing the target background image and the contour image based on the fusion position to obtain a fused image sample includes:
S701, placing the contour image at the fusion position of the target background image to generate an intermediate image sample;
S702, performing Gaussian blur processing on the edge of the contour image in the intermediate image sample to obtain the fused image sample.
In the embodiment of the present disclosure, the contour image may be placed at the fusion position of the target background image, so that the contour image covers the local image corresponding to the fusion position on the target background image, and an intermediate image sample is generated.
In the embodiment of the present disclosure, after the intermediate image sample is generated, the edge of the contour image in the intermediate image sample may be subjected to Gaussian blur processing to obtain a fused image sample. Illustratively, the Gaussian blur processing of the edges of the contour image in the intermediate image sample may be implemented by image processing software, such as Adobe Photoshop or the like.
In this way, performing Gaussian blur processing on the edge of the contour image further improves the realism of the fused image sample.
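The paste-and-blur step can also be done programmatically rather than with image-editing software. The sketch below blurs the binary mask into a soft alpha matte and blends the patch into the background with it, which is one possible reading of the edge blur described above; the function and parameter names are hypothetical.

```python
import cv2
import numpy as np

def paste_and_blend(background, patch, patch_mask, x, y, edge_width=5):
    """Paste the contour patch at (x, y) and soften the seam around its edge.

    background: H x W x 3 target background image.
    patch, patch_mask: contour patch and its binary mask (same height/width).
    """
    fused = background.copy()
    h, w = patch_mask.shape
    roi = fused[y:y + h, x:x + w]

    # Soft alpha: blur the binary mask so edge pixels mix patch and background.
    alpha = cv2.GaussianBlur(patch_mask.astype(np.float32),
                             (2 * edge_width + 1, 2 * edge_width + 1), 0)
    alpha = alpha[..., None]  # broadcast over color channels

    blended = (alpha * patch.astype(np.float32)
               + (1.0 - alpha) * roi.astype(np.float32))
    fused[y:y + h, x:x + w] = blended.astype(np.uint8)
    return fused
```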
For S203:
In the embodiment of the disclosure, a subset of the fused image samples may be selected, and the selected subset together with the first image samples may form the training samples. Alternatively, all of the obtained fused image samples together with the first image samples may form the training samples.
In an alternative embodiment, referring to fig. 8, determining a training sample based on the fused image sample and the first image sample includes:
S801, determining the number of fused image samples based on the set proportion of fused image samples to first image samples and the number of first image samples.
S802, selecting a corresponding number of fused image samples from the obtained fused image samples based on the determined number;
S803, determining the training samples based on the selected fused image samples and the first image samples.
In the embodiment of the disclosure, the proportion of fused image samples to first image samples can be determined according to actual needs; for example, if 500 training samples are required and there are 100 first image samples, the ratio of fused image samples to first image samples can be set to 4:1. The proportion can also be determined through multiple tests: training sample sets with different proportions are used to train the neural network for image detection, and the proportion corresponding to the trained network with the highest accuracy is taken as the proportion of fused image samples to first image samples. The number of fused image samples is then determined from this proportion and the number of first image samples; for example, if the determined ratio is 3:1 and there are 100 first image samples, the number of fused image samples is 300.
In the embodiment of the present disclosure, based on the determined number of fused image samples, a corresponding number of fused image samples are selected from those obtained, and the selected fused image samples together with the first image samples form the training samples. The number of fused image samples obtained is greater than or equal to the determined number; that is, if m fused image samples are to be selected according to the set ratio and the number of first image samples, and n fused image samples were obtained after fusion, then n is generally greater than or equal to m, with m and n being positive integers.
In the embodiment of the present disclosure, based on the determined number of fused image samples, the corresponding number may be selected at random from the fused image samples obtained after fusion. And/or, the fused image samples may be classified by set scene (i.e. by the scene of the second image sample corresponding to each fused image sample), and the corresponding number selected from the classified samples. For example, the fused image samples may be divided into indoor and outdoor scenes; if the determined number is 200, 100 samples may be selected from the indoor-scene samples and 100 from the outdoor-scene samples to obtain the selected fused image samples. Alternatively, after classifying by scene, the number selected from each class may follow the ratio between the class sizes: for example, if there are 100 indoor-scene and 300 outdoor-scene fused image samples (a ratio of 1:3) and the determined number is 200, then 50 samples may be selected from the indoor scene and 150 from the outdoor scene, giving the 200 selected fused image samples.
In the above embodiment, the fused image samples and the first image samples are combined in the set proportion, so that the first image samples still account for a certain share while the total number of training samples is increased, which improves the reliability of the training sample set to a certain extent.
After the training samples are obtained, they can be input into a neural network to train it. The specific structure of the neural network for image detection may be chosen according to the actual application scenario and refined through multiple tests; for example, the number of convolution kernels in each convolution layer included in the neural network can be settled through repeated experiments.
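A minimal training sketch in PyTorch is given below; the tiny network, the Adam optimiser, the learning rate and the train_loader of (image, label) batches are all assumptions for illustration, since the disclosure does not fix the architecture beyond noting that the convolution kernel counts would be tuned by testing.

    import torch
    import torch.nn as nn

    # Assumed toy classifier for "pet present / not present"; the real number of
    # convolution kernels per layer would be chosen through repeated tests.
    class TinyPetNet(nn.Module):
        def __init__(self, num_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    def train(model, train_loader, epochs=10, lr=1e-3, device="cpu"):
        model.to(device)
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for images, labels in train_loader:  # training samples = fused + first image samples
                images, labels = images.to(device), labels.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), labels)
                loss.backward()
                optimizer.step()
        return model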
In the embodiment of the disclosure, after the neural network is trained, it can be used to detect the pet to be detected, and the pet can be analyzed based on the detection result. For example, if the pet to be detected is a cat or a dog, its state may be determined from the detection result, such as a sleeping state, an eating state or a playing state. If there are multiple pets to be detected, for example both a cat and a dog, their states may likewise be determined from the detection results; for instance, if the distance between the cat and the dog is smaller than a set threshold, the two may be considered to be playing together, and so on.
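The distance-threshold check mentioned above can be sketched as follows; the (x1, y1, x2, y2) bounding-box format, the use of box centres and the threshold value are assumptions, since the disclosure only states that a distance below a set threshold indicates the pets are playing together.

    def box_center(box):
        # Box given as (x1, y1, x2, y2) in pixels -- an assumed format.
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    def are_playing(cat_box, dog_box, distance_threshold=100.0):
        """Return True when the detected cat and dog are closer than the set threshold."""
        cx, cy = box_center(cat_box)
        dx, dy = box_center(dog_box)
        distance = ((cx - dx) ** 2 + (cy - dy) ** 2) ** 0.5
        return distance < distance_threshold

    # Example with two detections from the trained network; their centres are about 86 px apart.
    print(are_playing((10, 20, 110, 140), (90, 30, 200, 150)))  # True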
In an alternative embodiment, training a neural network for image detection based on training samples includes:
carrying out data set enhancement processing on the training samples;
and training a neural network for image detection based on the training sample after the data set enhancement processing.
In embodiments of the present disclosure, the data set enhancement processing may include at least one of the following transformations: rotation, flipping, scaling, translation, scale change, contrast transformation, color transformation, and the like. The training samples after such processing are then used to train the neural network for image detection.
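One possible composition of the listed transformations is sketched below using torchvision; the choice of torchvision and the specific parameter values are assumptions for illustration, and any equivalent augmentation pipeline could be used.

    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomRotation(degrees=15),                     # rotation
        transforms.RandomHorizontalFlip(p=0.5),                    # flipping
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),       # scaling / scale change
        transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation
        transforms.ColorJitter(brightness=0.2, contrast=0.3,       # contrast / color transformation
                               saturation=0.3, hue=0.05),
        transforms.ToTensor(),
    ])

    # augment(pil_image) can then be applied to each training sample before it is
    # fed to the neural network for image detection.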
By performing data set enhancement processing on the training samples and training the neural network on the enhanced samples, the trained neural network can handle a wider variety of images to be detected, which strengthens its robustness against interference.
In an alternative embodiment, the method further comprises:
acquiring a to-be-detected video containing a to-be-detected pet;
detecting the position information of the pet to be detected in each frame of image in the video to be detected based on the trained neural network;
and generating a moving track of the pet to be detected based on the position information of the pet to be detected in each frame of image in the video to be detected.
The embodiment of the disclosure can also receive a video to be detected, determine each frame of image in the video, and input each frame into the trained neural network for detection to obtain the position information of the pet to be detected in that frame. A movement track of the pet to be detected is then generated from its position information across the frames of the video to be detected.
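The per-frame detection and track generation can be sketched as follows; OpenCV is used here only to read the video, and the detect callable standing in for the trained neural network (returning a bounding box or None per frame) is an assumption of this sketch.

    import cv2

    def movement_track(video_path, detect):
        """Return the per-frame centre positions of the detected pet.

        detect maps a frame to a bounding box (x1, y1, x2, y2), or None when no
        pet is found in that frame.
        """
        capture = cv2.VideoCapture(video_path)
        track = []
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            box = detect(frame)
            if box is not None:
                x1, y1, x2, y2 = box
                track.append(((x1 + x2) / 2.0, (y1 + y2) / 2.0))
        capture.release()
        return track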
In this embodiment, each frame of image is detected by the trained neural network to obtain the position information of the pet to be detected, from which its movement track is derived, so that the user can learn about the behavior of the pet to be detected from the obtained track.
It will be understood by those skilled in the art that, in the method of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
Based on the same concept, an embodiment of the present disclosure further provides a pet detection apparatus. As shown in fig. 9, a schematic diagram of the architecture of the pet detection apparatus provided in an embodiment of the present disclosure, the apparatus includes an obtaining module 901, a determining module 902, a first prompting module 903, a second prompting module 904, and a training sample determining module 905. Specifically:
an obtaining module 901, configured to obtain an image in a cabin;
a determining module 902, configured to detect the image and determine whether a pet exists in the cabin;
a first prompt module 903, configured to send first prompt information when a pet is present in the vehicle cabin and the duration of the state in which no adult is present exceeds a first preset duration.
In one possible embodiment, the image in the vehicle cabin is acquired while the vehicle is in a driving state, and the apparatus further includes:
a second prompt module 904, configured to send second prompt information when it is determined that a pet is present in the vehicle cabin and the duration of the state in which the pet is not in a rear seat of the vehicle cabin exceeds a second preset duration.
In one possible embodiment, the detection of the image to determine whether a pet exists in the cabin is performed by a neural network, the neural network is trained using training samples, and the apparatus further includes a training sample determining module 905 configured to obtain the training samples by:
acquiring a first image sample including a pet to be detected and a second image sample not including the pet to be detected;
generating a fused image sample based on the first image sample and the second image sample; wherein the fused image sample comprises the pet to be detected, with the image in the second image sample as the background image;
determining the training sample based on the fused image sample and the first image sample.
In one possible embodiment, the training sample determining module 905, when generating a fused image sample based on the first image sample and the second image sample, is configured to:
extracting a contour image corresponding to the pet to be detected from the first image sample;
and fusing the second image sample with the contour image to obtain the fused image sample.
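As an illustration of the contour-extraction step configured above, the following OpenCV sketch cuts the pet region out of a first image sample; the binary pet_mask (for example produced by a segmentation model or manual annotation) is an assumption of this sketch and its origin is not specified by the disclosure.

    import cv2
    import numpy as np

    def extract_contour_image(first_image, pet_mask):
        """Cut the pet region out of the first image sample.

        pet_mask is assumed to be a binary mask (uint8, 0/255) marking the pet;
        how the mask is produced is outside the scope of this sketch.
        """
        # Keep only the masked pixels, then crop to the bounding box of the mask.
        cut = cv2.bitwise_and(first_image, first_image, mask=pet_mask)
        ys, xs = np.where(pet_mask > 0)
        y1, y2, x1, x2 = ys.min(), ys.max(), xs.min(), xs.max()
        return cut[y1:y2 + 1, x1:x2 + 1], pet_mask[y1:y2 + 1, x1:x2 + 1]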
In a possible implementation manner, the training sample determining module 905, when fusing the second image sample with the contour image to obtain the fused image sample, is further configured to:
for each contour image, selecting at least one target background image matching the contour image from the second image sample;
and fusing the contour image with the at least one target background image to obtain the fused image sample.
In a possible implementation, the training sample determining module 905, when selecting, for each of the contour images, at least one target background image matching the contour image from the second image sample, is further configured to:
respectively determining a first feature vector of each contour image and a second feature vector of each second image sample; the first feature vector and the second feature vector have the same dimension;
for each contour image, determining at least one target background image matching the contour image based on the distances between the first feature vector corresponding to the contour image and each second feature vector.
In one possible embodiment, the training sample determining module, when determining the first feature vector of each of the contour images and the second feature vector of each of the second image samples, is configured to:
extracting a first original feature vector of each contour image and a second original feature vector of each second image sample;
and performing dimensionality reduction processing on the first original feature vector and the second original feature vector to obtain the first feature vector and the second feature vector with the same dimensionality.
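The feature-vector matching and dimensionality reduction configured in the two preceding paragraphs can be sketched as follows; using PCA for the dimensionality reduction and Euclidean distance as the distance measure are assumptions, as is the idea that the original feature vectors come from a common feature extractor.

    import numpy as np
    from sklearn.decomposition import PCA

    def match_backgrounds(contour_feats, background_feats, k=1, dim=32):
        """For each contour image, return the indices of the k closest second image samples.

        Both inputs are arrays of original feature vectors of the same width; PCA
        stands in for the dimensionality-reduction step, Euclidean distance for the
        distance between first and second feature vectors.
        """
        pca = PCA(n_components=dim).fit(np.vstack([contour_feats, background_feats]))
        first = pca.transform(contour_feats)      # first feature vectors
        second = pca.transform(background_feats)  # second feature vectors
        # Pairwise Euclidean distances, shape (num_contours, num_backgrounds).
        dists = np.linalg.norm(first[:, None, :] - second[None, :, :], axis=-1)
        return np.argsort(dists, axis=1)[:, :k]

    # Example with random stand-in features: 20 contour images, 50 candidate backgrounds.
    rng = np.random.default_rng(0)
    matches = match_backgrounds(rng.normal(size=(20, 512)), rng.normal(size=(50, 512)), k=3)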
In a possible implementation manner, the training sample determining module 905, when fusing the contour image and the at least one target background image to obtain the fused image sample, is configured to:
for each target background image, determining a fusion position of the contour image in the target background image based on the coincidence degree of each object detection area in the target background image and the contour image; the object detection area is an image area containing any detection object;
and fusing the target background image and the contour image based on the fusion position to obtain the fused image sample.
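One way to realise the position-selection step above is sketched here: candidate placements on a coarse grid are scored by their overlap with the object detection areas and the least-overlapping one is kept. The grid of candidates and the minimise-overlap criterion are assumptions; the configuration above only requires the position to be chosen based on the coincidence degree.

    def overlap_area(box_a, box_b):
        # Boxes as (x1, y1, x2, y2); returns the intersection area in pixels.
        w = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
        h = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
        return max(w, 0) * max(h, 0)

    def choose_fusion_position(bg_size, contour_size, object_boxes, step=20):
        """Pick a top-left corner for the contour image inside the target background image.

        bg_size and contour_size are (width, height); object_boxes are the object
        detection areas of the target background image. Candidate positions are
        scored by total overlap with the object boxes, lower being better.
        """
        bg_w, bg_h = bg_size
        c_w, c_h = contour_size
        best_pos, best_score = (0, 0), float("inf")
        for x in range(0, max(bg_w - c_w, 1), step):
            for y in range(0, max(bg_h - c_h, 1), step):
                candidate = (x, y, x + c_w, y + c_h)
                score = sum(overlap_area(candidate, obj) for obj in object_boxes)
                if score < best_score:
                    best_pos, best_score = (x, y), score
        return best_pos

    # Example: a 640x480 background with one detected object and a 120x100 contour image.
    print(choose_fusion_position((640, 480), (120, 100), [(200, 150, 400, 350)]))  # (0, 0)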
In a possible implementation manner, the training sample determining module 905, when fusing the target background image and the contour image based on the fusion position to obtain the fused image sample, is configured to:
placing the contour image at the fusion position in the target background image to generate an intermediate image sample;
and performing Gaussian blur processing on the edge of the contour image in the intermediate image sample to obtain the fused image sample.
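The placement and edge-softening steps can be sketched as follows; blurring the pet mask and alpha-blending is used here as one way to apply Gaussian blur to the edge of the contour image, and the mask, the edge width and the kernel size are assumptions of this sketch.

    import cv2
    import numpy as np

    def fuse_with_blurred_edges(background, contour_img, contour_mask, position, edge_width=5):
        """Paste the contour image onto the target background image and soften its edges.

        position is the (x, y) top-left fusion position; contour_mask is the binary
        (0/255) mask of the pet inside contour_img.
        """
        x, y = position
        h, w = contour_img.shape[:2]
        fused = background.copy().astype(np.float32)
        roi = fused[y:y + h, x:x + w]

        # A Gaussian-blurred mask gives a soft transition band of roughly edge_width pixels.
        kernel = 2 * edge_width + 1
        alpha = cv2.GaussianBlur(contour_mask.astype(np.float32) / 255.0, (kernel, kernel), 0)
        alpha = alpha[..., None]  # broadcast over the colour channels

        fused[y:y + h, x:x + w] = alpha * contour_img.astype(np.float32) + (1.0 - alpha) * roi
        return fused.astype(np.uint8)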
In a possible implementation, the training sample determining module 905, when determining a training sample based on the fused image sample and the first image sample, is configured to:
determining the number of the fused image samples based on the set proportion of the fused image samples to the first image samples and the number of the first image samples;
selecting a corresponding number of fused image samples from the obtained fused image samples based on the determined number of the fused image samples;
and determining a training sample based on the selected corresponding number of fused image samples and the first image sample.
In some embodiments, the functions of the apparatus provided in the embodiments of the present disclosure, or the modules it includes, may be used to execute the methods described in the above method embodiments; for specific implementation, reference may be made to the description of the above method embodiments, and for brevity, details are not repeated here.
Based on the same technical concept, an embodiment of the disclosure also provides an electronic device. Referring to fig. 10, a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure, the device includes a processor 1001, a memory 1002, and a bus 1003. The memory 1002 is used for storing execution instructions and includes an internal memory 10021 and an external memory 10022; the internal memory 10021, also referred to as main memory, temporarily stores operation data of the processor 1001 and data exchanged with the external memory 10022 such as a hard disk. The processor 1001 exchanges data with the external memory 10022 through the internal memory 10021. When the electronic device 1000 operates, the processor 1001 and the memory 1002 communicate through the bus 1003, so that the processor 1001 executes the following instructions:
acquiring an image in a vehicle cabin;
detecting the image to determine whether a pet exists in the cabin;
and sending first prompt information when a pet is present in the vehicle cabin and the duration of the state in which no adult is present exceeds a first preset duration.
In addition, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the pet detection method described in the above method embodiments are performed.
The computer program product of the pet detection method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the steps of the pet detection method described in the above method embodiments, to which reference may be made for details that are not repeated here.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical division, and other divisions are possible in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above are only specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

1. A pet detection method, comprising:
acquiring an image in a vehicle cabin;
detecting the image to determine whether a pet exists in the cabin;
and sending first prompt information when a pet is present in the vehicle cabin and the duration of the state in which no adult is present exceeds a first preset duration.
2. The method of claim 1, wherein the in-vehicle image is acquired while the vehicle is in motion, the method further comprising:
and sending second prompt information when it is determined that a pet is present in the vehicle cabin and the duration of the state in which the pet is not in a rear seat of the vehicle cabin exceeds a second preset duration.
3. The method of claim 1 or 2, wherein the detecting the image to determine whether the pet is present in the cabin is performed by a neural network, the neural network being trained using training samples obtained by:
acquiring a first image sample including a pet to be detected and a second image sample not including the pet to be detected;
generating a fused image sample based on the first image sample and the second image sample; wherein the fused image sample comprises the pet to be detected, with the second image sample as the background image;
determining the training sample based on the fused image sample and the first image sample.
4. The method of claim 3, wherein generating a fused image sample based on the first image sample and the second image sample comprises:
extracting a contour image of the pet to be detected from the first image sample;
and fusing the second image sample with the contour image to obtain the fused image sample.
5. The method of claim 4, wherein said fusing the second image sample with the contour image to obtain the fused image sample comprises:
for each contour image, selecting at least one target background image matching the contour image from the second image sample;
and fusing the contour image with the at least one target background image to obtain the fused image sample.
6. The method of claim 5, wherein for each of the contour images, selecting at least one target background image from the second image sample that matches the contour image comprises:
respectively determining a first feature vector of each contour image and a second feature vector of each second image sample; the first feature vector and the second feature vector have the same dimension;
for each contour image, determining at least one target background image matching the contour image based on the distances between the first feature vector corresponding to the contour image and each second feature vector.
7. The method of claim 6, wherein separately determining a first feature vector for each of the contour images and a second feature vector for each of the second image samples comprises:
extracting a first original feature vector of each contour image and a second original feature vector of each second image sample;
and performing dimensionality reduction processing on the first original feature vector and the second original feature vector to obtain the first feature vector and the second feature vector with the same dimensionality.
8. The method of claim 5, wherein fusing the contour image with the at least one target background image to obtain the fused image sample comprises:
for each target background image, determining a fusion position of the contour image in the target background image based on the coincidence degree of each object detection area in the target background image and the contour image; the object detection area is an image area containing any detection object;
and fusing the target background image and the contour image based on the fusion position to obtain the fused image sample.
9. The method according to claim 8, wherein said fusing the target background image and the contour image based on the fusion position to obtain the fused image sample comprises:
placing the contour image at the fusion position in the target background image to generate an intermediate image sample;
and performing Gaussian blur processing on the edge of the contour image in the intermediate image sample to obtain the fused image sample.
10. The method of claim 3, wherein determining a training sample based on the fused image sample and the first image sample comprises:
determining the number of the fused image samples based on the set proportion of the fused image samples to the first image samples and the number of the first image samples;
selecting a corresponding number of fused image samples from the obtained fused image samples based on the determined number of the fused image samples;
and determining a training sample based on the selected corresponding number of fused image samples and the first image sample.
11. A pet detection device, comprising:
the acquisition module is used for acquiring images in the vehicle cabin;
the determining module is used for detecting the image and determining whether a pet exists in the cabin;
the first prompt module is used for sending first prompt information when a pet is present in the vehicle cabin and the duration of the state in which no adult is present exceeds a first preset duration.
12. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the pet detection method of any one of claims 1 to 10.
13. A computer-readable storage medium, having stored thereon a computer program for performing, when executed by a processor, the steps of the pet detection method according to any one of claims 1 to 10.
CN202010468086.6A 2020-05-28 2020-05-28 Pet detection method, device, equipment and storage medium Pending CN111626222A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202010468086.6A CN111626222A (en) 2020-05-28 2020-05-28 Pet detection method, device, equipment and storage medium
KR1020227003933A KR20220027242A (en) 2020-05-28 2021-03-01 Pet detection methods, devices, devices, storage media and computer program products
JP2022520022A JP2022550790A (en) 2020-05-28 2021-03-01 PET DETECTION METHOD, APPARATUS, DEVICE, STORAGE MEDIUM AND COMPUTER PROGRAM PRODUCT
PCT/CN2021/078546 WO2021238316A1 (en) 2020-05-28 2021-03-01 Pet detection method and apparatus, device, storage medium, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010468086.6A CN111626222A (en) 2020-05-28 2020-05-28 Pet detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111626222A true CN111626222A (en) 2020-09-04

Family

ID=72259188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010468086.6A Pending CN111626222A (en) 2020-05-28 2020-05-28 Pet detection method, device, equipment and storage medium

Country Status (4)

Country Link
JP (1) JP2022550790A (en)
KR (1) KR20220027242A (en)
CN (1) CN111626222A (en)
WO (1) WO2021238316A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435355A (en) * 2021-06-30 2021-09-24 中国农业大学 Multi-target cow identity identification method and system
WO2021238316A1 (en) * 2020-05-28 2021-12-02 深圳市商汤科技有限公司 Pet detection method and apparatus, device, storage medium, and computer program product

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023229064A1 (en) * 2022-05-26 2023-11-30 엘지전자 주식회사 Apparatus for monitoring pet in vehicle, vehicle comprising same, and vehicle operating method
DE102023003311A1 (en) 2022-08-18 2024-02-29 Mercedes-Benz Group AG System for tracking the movement of a non-human entity in a vehicle and method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9589351B2 (en) * 2014-09-10 2017-03-07 VISAGE The Global Pet Recognition Company Inc. System and method for pet face detection
CN106778652A (en) * 2016-12-26 2017-05-31 东软集团股份有限公司 Physical activity recognition methods and device
CN108263325A (en) * 2018-02-01 2018-07-10 武汉理工大学 A kind of interior omission life entity detecting system based on image processing techniques
CN109977983A (en) * 2018-05-07 2019-07-05 广州逗号智能零售有限公司 Obtain the method and device of training image
CN110866451A (en) * 2019-10-22 2020-03-06 中国第一汽车股份有限公司 In-vehicle life body detection method, device and system and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004289625A (en) * 2003-03-24 2004-10-14 Seiko Epson Corp Car security apparatus, system, and method
CN105894700A (en) * 2015-11-02 2016-08-24 乐卡汽车智能科技(北京)有限公司 Image-based in-vehicle moving object remote observing and warning device and method
CN110969173B (en) * 2018-09-28 2023-10-24 杭州海康威视数字技术股份有限公司 Target classification method and device
CN110781799B (en) * 2019-10-22 2022-01-28 上海商汤智能科技有限公司 Method and device for processing images in vehicle cabin
CN111626222A (en) * 2020-05-28 2020-09-04 深圳市商汤科技有限公司 Pet detection method, device, equipment and storage medium
CN111652114B (en) * 2020-05-29 2023-08-25 深圳市商汤科技有限公司 Object detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2021238316A1 (en) 2021-12-02
KR20220027242A (en) 2022-03-07
JP2022550790A (en) 2022-12-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200904