CN112069901B - In-vehicle article monitoring method, electronic device, and storage medium - Google Patents

In-vehicle article monitoring method, electronic device, and storage medium

Info

Publication number
CN112069901B
CN112069901B (application CN202010782915.8A)
Authority
CN
China
Prior art keywords
vehicle
image
determining
area
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010782915.8A
Other languages
Chinese (zh)
Other versions
CN112069901A (en)
Inventor
王小刚
余程鹏
左凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Leading Technology Co Ltd
Original Assignee
Nanjing Leading Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Leading Technology Co Ltd filed Critical Nanjing Leading Technology Co Ltd
Priority to CN202010782915.8A priority Critical patent/CN112069901B/en
Publication of CN112069901A publication Critical patent/CN112069901A/en
Application granted granted Critical
Publication of CN112069901B publication Critical patent/CN112069901B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an in-vehicle article monitoring method, an electronic device, and a storage medium in the technical field of the Internet of Things, and aims to solve the problems that relying on people to discover left-behind articles is slow and that gravity sensors cannot reliably detect light articles. The method comprises the following steps: acquiring a first in-vehicle image captured before a target object boards the vehicle, a second in-vehicle image captured after the target object leaves the vehicle, and at least one third in-vehicle image captured during the target object's ride; searching the second in-vehicle image for a first area whose similarity to the first in-vehicle image is smaller than a first preset value; extracting, from the third in-vehicle image, a second area at the same image position as the first area; and if the similarity between the first area and the second area is greater than a second preset value, determining that an article of the target object has been left in the vehicle. By identifying possibly left-behind articles in the image and then verifying each candidate against the scene in which the article was present on the vehicle, the embodiment of the invention improves accuracy.

Description

In-vehicle article monitoring method, electronic device, and storage medium
Technical Field
The invention relates to the technical field of Internet of things, in particular to a method for monitoring articles in a vehicle, electronic equipment and a storage medium.
Background
Taxis remain an important way for people to travel, and passengers often carry luggage when taking a taxi, especially when travelling away from home. As a result, luggage is frequently left behind in the vehicle, causing property loss to passengers.
At present, luggage left in a taxi is usually discovered by the driver or by the next passenger, and the previous passenger then has to spend extra time retrieving it. Alternatively, a gravity sensor can be installed under a vehicle seat to determine, from the change in measured weight before and after a passenger gets on and off, whether an article has been left behind. However, relying on people to discover the luggage gives poor timeliness, and when the luggage is light a gravity sensor may fail to detect it, so its accuracy is not high.
Disclosure of Invention
The invention provides an in-vehicle article monitoring method, an electronic device, and a storage medium, which identify possibly left-behind articles from images captured in real time and further verify each candidate against the scene in which the article was present on the vehicle, thereby improving both timeliness and accuracy.
In a first aspect, an in-vehicle article monitoring method provided in an embodiment of the present invention includes:
acquiring a first in-vehicle image of a target object before getting on the vehicle, a second in-vehicle image of the target object after leaving the vehicle and at least one third in-vehicle image of the target object in the process of taking the vehicle;
searching a first area with the similarity smaller than a first preset value with the first in-vehicle image from the second in-vehicle image;
extracting a second area having the same image position as the first area from a third in-vehicle image;
and if the similarity between the first area and the second area is greater than a second preset value, determining that an article of the target object has been left in the vehicle, and notifying the target object in a preset alarm mode.
According to the method, the two in-vehicle images captured before the target object boards and after the target object leaves are compared, and an area where their similarity is low is found; that area is regarded as an article possibly left in the vehicle by the target object. The similarity between that area and the area at the same position in an image captured while the target object was in the vehicle is then determined; if this similarity is high, the area indeed corresponds to an article of the target object left in the vehicle, so the target object is reminded. Timeliness and accuracy are thereby improved.
In a possible implementation manner, the searching for the first area with the similarity to the first in-vehicle image smaller than the first preset value from the second in-vehicle image includes:
extracting a first image feature vector from the first in-vehicle image and a second image feature vector from the second in-vehicle image through a feature extraction network;
determining similarity of feature elements at the same position in the first image feature vector and the second image feature vector;
and determining the position information of the second in-vehicle image corresponding to the feature element with the similarity smaller than the first preset value, and determining the first area according to the position information.
According to the method, two image feature vectors are extracted, through a feature extraction network, from the two in-vehicle images captured before the target object boards and after it leaves; the similarity of feature elements at the same positions of the two vectors is determined; and the position information, in the image captured after the target object leaves, that corresponds to the feature elements with low similarity is taken as the first area.
In one possible implementation manner, the determining the first area according to the location information includes:
and if the position information corresponding to the plurality of characteristic elements in the second in-vehicle image has an overlapping region, taking a region formed by the position information corresponding to the plurality of characteristic elements as a first region.
According to the method, overlapping areas are merged, so that the same article is not subjected to the next step repeatedly, and the processing speed is improved.
In a possible implementation manner, the training process of the feature extraction network includes:
taking a sample image as input, taking an image feature vector in the sample image as output, and performing multi-round training on a basic neural network to obtain the feature extraction network;
and in each round of training process, adjusting parameters in the basic neural network through the N image feature vectors every time N image feature vectors are output.
According to the method, the basic neural network can be trained for multiple times through the sample image, and the accuracy of extracting the features of the basic neural network is improved.
In one possible implementation, adjusting parameters in the basic neural network by the N image feature vectors includes:
if the sample images corresponding to the N image feature vectors are the same type of images, inputting the N image feature vectors into a first loss function to obtain a similarity value, and adjusting parameters in the basic neural network by using the similarity value; the images of the same type are images with the same scene in the vehicle; wherein the first loss function is a function for determining the similarity degree of the sample images; or
If the sample images corresponding to the N image feature vectors are different types of images, inputting the N image feature vectors into a second loss function to obtain a difference value, and adjusting parameters in the basic neural network by using the difference value; the second loss function is a function that determines a degree of difference in the sample images.
According to the method, for images of the same type the similar features need to be reinforced, so the loss is calculated with the function that determines the degree of similarity of the sample images and the parameters of the basic neural network are adjusted accordingly; for images of different types the differing features need to be reinforced, so the loss is calculated with the function that determines the degree of difference of the sample images and the parameters of the basic neural network are adjusted accordingly.
In one possible implementation, the similarity between the first region and the second region is determined by:
determining a contour region of the article in the first region and the second region, respectively;
determining the differences between the gray values of pixel points at the same positions in the contour regions, and determining a first number of differences that are smaller than a first preset difference among all the differences; and
determining the differences between target chromaticity values of the same type for pixel points at the same positions in the first region and the second region, and determining a second number of differences that are smaller than a second preset difference among all the differences obtained; wherein the target chromaticity value is part or all of the chromaticity values of a pixel point;
and taking the value obtained by weighted summation of the first number and the second number as the similarity between the first region and the second region.
According to the method, the similarity of the articles is comprehensively determined through the two aspects of the outline and the chromaticity, and the accuracy is improved.
In one possible implementation, the method further includes:
the method comprises the steps of obtaining a first video of the internal space of a trunk from the opening to the closing of the trunk when a target object reaches a vehicle, and obtaining a second video of the internal space of the trunk from the opening to the closing of the trunk after the target object leaves the vehicle;
determining the number of times of a series of actions of a hand of the target object from entering a trunk to exiting the trunk through the first video; determining the number of times of a series of actions from entering a trunk to exiting the trunk of the hand of the target object through the second video;
determining that an article of the target object has been left in the trunk of the vehicle if the number of times determined from the first video is greater than the number of times determined from the second video.
According to the method, the number of articles is determined by counting the number of times the hand of the target object performs the action sequence of entering and then exiting the trunk; when more articles were put in than taken out, an article must have been left in the trunk of the vehicle, so the articles of the target object are monitored more comprehensively.
In a second aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor:
the memory is used for storing program codes used when the electronic device runs;
the processor is configured to execute the program code to implement the following processes:
acquiring a first in-vehicle image of a target object before getting on the vehicle, a second in-vehicle image of the target object after leaving the vehicle and at least one third in-vehicle image of the target object in the process of taking the vehicle;
searching a first area with the similarity smaller than a first preset value with the first in-vehicle image from the second in-vehicle image;
extracting a second area having the same image position as the first area from a third in-vehicle image;
and if the similarity between the first area and the second area is greater than a second preset value, determining that an article of the target object has been left in the vehicle, and notifying the target object in a preset alarm mode.
In one possible implementation, the processor is further configured to: the method comprises the steps of obtaining a first video of the internal space of a trunk from the opening to the closing of the trunk when a target object reaches a vehicle, and obtaining a second video of the internal space of the trunk from the opening to the closing of the trunk after the target object leaves the vehicle;
determining the number of times of a series of actions of a hand of the target object from entering a trunk to exiting the trunk through the first video; determining the number of times of a series of actions of entering a trunk to exiting the trunk of the hand of the target object through the second video;
determining that an article of the target object has been left in the trunk of the vehicle if the number of times determined from the first video is not identical to the number of times determined from the second video.
In a third aspect, the present application further provides a storage medium, where instructions executed by a processor of an electronic device enable the electronic device to perform the in-vehicle article monitoring method according to any one of the embodiments of the first aspect.
In addition, for technical effects brought by any one implementation manner of the second aspect to the third aspect, reference may be made to technical effects brought by different implementation manners of the first aspect, and details are not described here.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention and are not to be construed as limiting the invention.
FIG. 1 is a schematic diagram of an operation process for monitoring an object in a vehicle according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another exemplary embodiment of the present invention for monitoring the contents of a vehicle;
FIG. 3 is a schematic diagram of a trunk monitoring operation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another operational process for monitoring trunk items according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for monitoring an object in a vehicle according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an embodiment of the present invention for capturing images of a passenger before the passenger gets on the vehicle;
FIG. 7 is a schematic diagram of an in-vehicle image collected during a passenger riding process according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an embodiment of the present invention for collecting images of a passenger in a vehicle after leaving the vehicle;
FIG. 9 is a schematic illustration of a vehicle with two articles left behind according to an embodiment of the present invention;
FIG. 10 is a schematic illustration of another vehicle interior two-piece object left therein according to an embodiment of the present invention;
fig. 11 is a block diagram of an electronic device according to an embodiment of the present invention;
fig. 12 is a block diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," "third," and the like in the description and in the claims, and in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Some of the words that appear in the text are explained below:
1. the term "electronic device" in the embodiments of the present invention refers to any intelligent electronic device capable of operating according to a program and automatically processing a large amount of data at a high speed, and includes a vehicle-mounted terminal, a mobile phone, a computer, a tablet, an intelligent terminal, a multimedia device, a streaming media device, and the like.
The application scenarios described in the embodiments of the present invention are intended to illustrate the technical solutions of the embodiments more clearly and do not limit them; as a person skilled in the art will know, with the emergence of new application scenarios the technical solutions provided in the embodiments of the present invention are equally applicable to similar technical problems. In the description of the present invention, unless otherwise indicated, "a plurality" means at least two.
As noted above, relying on people to discover left-behind luggage gives poor timeliness, and a gravity sensor does not provide sufficient accuracy.
In order to solve the above problem, the embodiment of the present invention monitors two areas in the vehicle where articles are placed, where one area is an in-vehicle scene and the other area is a trunk of the vehicle.
When the method provided by the embodiment of the invention is implemented, a camera is required to be installed in the vehicle for shooting the scene in the vehicle, and a camera is installed at the trunk of the vehicle for shooting the scene in the trunk.
When monitoring in-vehicle articles, a camera installed inside the vehicle captures an in-vehicle scene SD1 before the target object boards, at least two in-vehicle scenes SD2 and SD3 while the target object is riding in the vehicle, and an in-vehicle scene SD4 after the target object leaves the vehicle.
Referring to fig. 1, the images of the in-vehicle scene are collected through a camera and the IVI (In-Vehicle Infotainment system). Taking in-vehicle article monitoring for ride-hailing as an example, the specific steps are as follows. The driver installs ride-hailing software in the vehicle's IVI and receives an order from a passenger (the target object). When the software indicates that the vehicle has reached the pick-up point appointed by the passenger, the camera outside the vehicle is turned on and captures images. When the passenger is detected approaching the vehicle in the images captured by the exterior camera, the camera installed inside the vehicle is controlled to capture the in-vehicle image SD1, taken before the passenger boards. After a non-driver door is detected to open and then close, it is determined that the passenger is in the vehicle, and the interior camera is controlled to capture the in-vehicle image SD2, taken just after the passenger boards. When the ride-hailing software detects the end of the order, the interior camera is controlled to capture the in-vehicle image SD3, taken before the passenger gets off. After a non-driver door is again detected to open and then close, the interior camera is controlled to capture the in-vehicle image SD4, taken after the passenger has left the vehicle.
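As a concrete illustration of this capture sequence, the following minimal sketch shows how such an event-driven flow might look; the event names and the capture_interior_image() callback are hypothetical placeholders assumed only for illustration, not an actual IVI or ride-hailing API.

```python
# A minimal sketch of the SD1-SD4 capture sequence described above, under the
# assumption of hypothetical event names and a capture_interior_image() callback.

def monitor_ride(events, capture_interior_image):
    """Capture SD1-SD4 at the moments described above, given chronological ride events."""
    images = {}
    for event in events:
        if event == "passenger_near_vehicle" and "SD1" not in images:
            images["SD1"] = capture_interior_image()      # before the passenger boards
        elif event == "non_driver_door_open_then_closed":
            if "SD2" not in images:
                images["SD2"] = capture_interior_image()  # just after boarding
            elif "SD3" in images:
                images["SD4"] = capture_interior_image()  # after the passenger leaves
        elif event == "order_ended" and "SD2" in images and "SD3" not in images:
            images["SD3"] = capture_interior_image()      # just before alighting
    return images
```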
Then, in the manner provided by the embodiment of the present invention and with reference to fig. 2, the similarity between SD4 and SD1 is determined and an area R1 of SD4 whose similarity to SD1 is low is found. The area at the same image position as R1 is then located in SD2 and compared with R1, and the same is done for SD3. If either comparison finds the areas to be similar, it is determined that an article has been left behind; otherwise, no article has been left behind.
If it is determined that an article has been left behind, alert information is sent to the passenger's mobile phone to inform the passenger, or the alert information is presented through the display and/or loudspeaker of the IVI to inform the driver, so that the driver can notify the passenger.
When monitoring articles at the trunk of the vehicle, a camera installed at the trunk captures a first video of the trunk's interior space during the period from the trunk being opened to being closed when the target object arrives at the vehicle, and captures a second video of the trunk's interior space during the period from the trunk being opened to being closed after the target object leaves the vehicle.
As shown in fig. 3, images of the trunk's interior space are collected through the camera and the IVI. For example, when the software indicates that the vehicle has reached the pick-up point appointed by the passenger, the exterior camera is turned on and captures images. When a passenger holding a large article is detected in these images, the trunk is opened and the camera installed at the trunk is turned on, recording the first video of the trunk's interior space during the period from the trunk being opened to being closed as the target object arrives at the vehicle. When the ride-hailing software in the IVI detects the end of the order, and after a non-driver door is detected to open and then close, the trunk is opened and the trunk camera is turned on, recording the second video of the trunk's interior space during the period from the trunk being opened to being closed after the target object leaves the vehicle.
Referring to fig. 4, the number of times the hand of the target object performs the action sequence of entering and then exiting the trunk, i.e. the number of articles N1, is determined from the first video; the corresponding count N2 is determined from the second video. If the count from the first video is greater than the count from the second video, i.e. N1 > N2, it is determined that an article of the target object has been left in the trunk of the vehicle.
If it is determined that an article has been left behind, alert information is sent to the passenger's mobile phone to inform the passenger, or the alert information is presented through the display and/or loudspeaker of the IVI to inform the driver, so that the driver can notify the passenger.
The series of actions of the hand of the target object from entering the trunk to exiting the trunk can be determined by the following method:
Video frames of the video are input into a recognition network in temporal order. The recognition network identifies the hand of the target object in each image; whenever, over consecutive frames, the hand is observed to enter the trunk interior and then exit it, one action sequence is counted. The remaining, not yet recognized images are then processed in the same way, until all video frames in the first video have been recognized. The hand actions in the second video are recognized in the same way, giving the number of enter-then-exit sequences for that video.
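A minimal sketch of this counting step is given below; hand_inside_trunk(frame) stands in for the recognition network and is an assumed placeholder returning whether the target object's hand is inside the trunk in a given frame.

```python
# Sketch of counting enter-then-exit action sequences from per-frame recognition.
# hand_inside_trunk(frame) is an assumed placeholder for the recognition network.

def count_hand_cycles(frames, hand_inside_trunk):
    """Count how many times the hand enters the trunk interior and then exits it."""
    cycles = 0
    inside = False
    for frame in frames:                    # frames in temporal order
        now_inside = hand_inside_trunk(frame)
        if not inside and now_inside:       # hand has just entered the trunk
            inside = True
        elif inside and not now_inside:     # hand has just exited: one full sequence
            inside = False
            cycles += 1
    return cycles

# N1 = count_hand_cycles(first_video_frames, hand_inside_trunk)
# N2 = count_hand_cycles(second_video_frames, hand_inside_trunk)
# An article is left in the trunk if N1 > N2.
```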
The details are described in conjunction with the following figures:
referring to fig. 5, an embodiment of the present invention provides a method for monitoring an in-vehicle object, including:
s500: the method comprises the steps of obtaining a first in-vehicle image of a target object before getting on the vehicle, a second in-vehicle image of the target object after leaving the vehicle, and at least one third in-vehicle image of the target object in the process of taking the vehicle.
S501: and searching a first area with the similarity smaller than a first preset value with the first in-vehicle image from the second in-vehicle image.
S502: a second area having the same image position as the first area is extracted from the third in-vehicle image.
S503: and if the similarity between the first area and the second area is greater than a second preset value, determining that an article of the target object has been left in the vehicle, and notifying the target object in a preset alarm mode.
With this method, in-vehicle images captured before the target object boards, after it leaves, and during its ride are obtained. Comparing the images from before boarding and after leaving yields an area that may contain an article; that area is then compared with the area at the same position in an image captured during the ride to judge whether the two are similar. If the similarity is high, the area is an article left behind by the target object. This provides a way of automatically monitoring articles in the vehicle, and because two comparisons are used to decide whether the target object has left an article, both the monitoring accuracy and the timeliness are improved.
Since image recognition may be affected by environmental changes during the target object's ride, the third in-vehicle image is preferably an in-vehicle image captured shortly before the target object leaves the vehicle, in order to improve accuracy. For a ride-hailing trip, for example, the image captured after the order is completed but before the target object alights may be used.
As an example in the ride-hailing field, the target object is the ride-hailing passenger. As shown in fig. 6, the in-vehicle image SD1 is obtained before the passenger has boarded the vehicle. During the ride, the image taken just after the passenger boards is SD2 and the image taken after the vehicle reaches the destination but before the passenger leaves is SD3, as shown in fig. 7. When the vehicle reaches the destination and the passenger gets off, the in-vehicle image SD4 at that moment is acquired, as shown in fig. 8.
To determine whether the ride-hailing passenger has left any article in the vehicle, the in-vehicle image SD1 of fig. 6 is compared with the in-vehicle image SD4 of fig. 8, from which it can be determined that the similarity of the region R1 in fig. 8 to the corresponding region in fig. 6 is relatively small. The region R2 at the same position as R1 is then obtained from the in-vehicle image SD3, and R1 in SD4 (fig. 8) is compared with R2 in SD3 (fig. 7). Since R1 and R2 both contain the same article, a football, it is determined that the passenger's football has been left in the vehicle, and both the passenger and the driver are alerted, for example through the passenger's mobile phone and the IVI.
Because the in-vehicle environment changes over time before the vehicle reaches the destination, when comparing the region R1 of fig. 8 with images captured during the ride, a comparison against multiple images may be adopted: several images captured during the passenger's ride are collected and each is compared with R1, and if any of them contains a region at the same position as R1 with high similarity, the ride-hailing passenger is alerted.
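The overall in-cabin decision described so far can be summarised by the following sketch. find_low_similarity_region, crop and region_similarity are assumed helper functions (for example, the feature-difference search and the contour/chroma comparison sketched later), and t2 stands for the second preset value; none of these names come from the patent itself.

```python
# Sketch of the overall in-cabin check. find_low_similarity_region, crop and
# region_similarity are assumed helpers; t2 is the second preset value.

def check_left_items(sd1, sd4, ride_images, find_low_similarity_region,
                     crop, region_similarity, t2):
    """Return True if a candidate region from SD4 matches the same region in a ride image."""
    r1_box = find_low_similarity_region(sd1, sd4)      # candidate area R1 in SD4
    if r1_box is None:
        return False                                   # nothing changed after the ride
    r1 = crop(sd4, r1_box)
    for ride_image in ride_images:                     # e.g. SD2 and SD3
        r2 = crop(ride_image, r1_box)                  # same image position as R1
        if region_similarity(r1, r2) > t2:             # article was present during the ride
            return True
    return False
```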
The specific way of searching the first area from the second in-vehicle image is as follows:
extracting a first image feature vector from a first in-vehicle image and extracting a second image feature vector from a second in-vehicle image through a feature extraction network;
determining the similarity of the feature elements at the same position in the first image feature vector and the second image feature vector;
and determining the position information, in the second in-vehicle image, corresponding to the feature elements whose similarity is smaller than the first preset value, and determining the first area according to the position information.
In operation, the first in-vehicle image and the second in-vehicle image can be input into the feature extraction network. The network extracts features from the first in-vehicle image to obtain an image feature vector, denoted FeatX; each feature element of this vector is derived from the pixels of a partial region of the first in-vehicle image, so each feature element corresponds one-to-one to the position of that region in the first in-vehicle image. Similarly, the network extracts features from the second in-vehicle image to obtain an image feature vector denoted FeatY, each of whose feature elements is derived from the pixels of a partial region of the second in-vehicle image.
And (3) obtaining the similarity of the feature elements at the same position by taking difference of the feature elements at the same position in FeatX and FeatY, and specifically adopting the following formula:
Dist(i,j)=||FeatX(i,j)-FeatY(i,j)||
where i denotes the row index and j the column index, FeatX(i,j) is the feature element in row i and column j of FeatX, FeatY(i,j) is the feature element in row i and column j of FeatY, and Dist(i,j), the difference between the two feature elements, is used as the similarity of the feature element in row i and column j of FeatX and the feature element in row i and column j of FeatY.
If the difference value obtained by the difference of the characteristic element is smaller than the first preset value, mapping the characteristic element to the position of the second in-vehicle image to obtain position information;
if the difference obtained by the difference of the characteristic element is not smaller than the first preset value, the position of the characteristic element mapped to the second in-vehicle image is not provided with articles, and the position is not processed.
And forming the position information of the second in-vehicle image corresponding to all the characteristic elements smaller than the first preset value into a first area.
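A rough illustration of this search is given below; feat_x and feat_y are assumed to be NumPy feature maps with one feature element per spatial cell, and map_to_image_box is a hypothetical helper giving the rectangle of the second in-vehicle image that a feature cell corresponds to.

```python
# Sketch of locating candidate positions for the first area from two feature maps.
# feat_x / feat_y are assumed NumPy feature maps; map_to_image_box is a hypothetical
# helper mapping a feature cell (i, j) to its rectangle in the second in-vehicle image.
import numpy as np

def find_candidate_positions(feat_x, feat_y, first_preset_value, map_to_image_box):
    """Return image rectangles for feature cells selected by the rule described above."""
    dist = np.linalg.norm(
        np.atleast_3d(feat_x) - np.atleast_3d(feat_y), axis=-1)   # Dist(i, j)
    boxes = []
    for i, j in zip(*np.where(dist < first_preset_value)):
        # cells whose similarity value falls below the first preset value are treated
        # as candidate article positions, following the selection rule above
        boxes.append(map_to_image_box(i, j))
    return boxes
```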
If the position information corresponding to the plurality of characteristic elements in the second in-vehicle image does not have the overlapped area, the position information of the second in-vehicle image corresponding to all the characteristic elements smaller than the first preset value is respectively used as the first area for the next processing.
For example, as shown in fig. 9, the feature elements smaller than the first preset value correspond to the position information R3 and the position information R4 in the second in-vehicle image, located in the upper-left and lower-right corner regions respectively; since these do not overlap, each is processed separately in the next step.
If the position information corresponding to the plurality of feature elements in the second in-vehicle image has an overlapping area, an area formed by the position information corresponding to the plurality of feature elements is set as the first area.
For example, as shown in fig. 10, the second in-vehicle image contains the position information R5 (indicated by a broken line) and the position information R6 (indicated by a solid line); since R5 and R6 have an overlapping region, the region formed by R5 and R6 together is taken as the first region.
In this way, the same article is prevented from being detected repeatedly in the next step, and the processing speed is increased.
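One simple way to perform such a merge is sketched below; boxes are assumed to be (x1, y1, x2, y2) rectangles in image coordinates, an assumption made only for this illustration.

```python
# Sketch of merging overlapping candidate rectangles into combined first areas.
# Boxes are assumed to be (x1, y1, x2, y2) tuples in image coordinates.

def overlaps(a, b):
    """True if rectangles a and b share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_overlapping(boxes):
    """Greedily merge overlapping rectangles; non-overlapping boxes stay separate."""
    merged = []
    for box in boxes:
        for k, existing in enumerate(merged):
            if overlaps(box, existing):
                merged[k] = (min(box[0], existing[0]), min(box[1], existing[1]),
                             max(box[2], existing[2]), max(box[3], existing[3]))
                break
        else:
            merged.append(box)
    return merged
```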
For the feature extraction network:
First, the structure of the feature extraction network is introduced. The network is composed of convolution layers (Conv), batch normalization (BN), linear rectification (ReLU) activations, and pooling layers.
And then training the network structure to obtain a feature extraction network.
Taking a sample image as input, taking an image feature vector in the sample image as output, and performing multi-round training on a basic neural network to obtain a feature extraction network;
and in each round of training process, adjusting parameters in the basic neural network through the N image characteristic vectors every time N image characteristic vectors are output.
If the sample images corresponding to the N image feature vectors are the same type of images, inputting the N image feature vectors into a first loss function to obtain similarity values, and adjusting parameters in a basic neural network by using the similarity values; the images of the same type are images with the same scene in the vehicle; wherein the first loss function is a function for determining the similarity degree of the sample images; or
If the sample images corresponding to the N image feature vectors are different types of images, inputting the N image feature vectors into a second loss function to obtain a difference value, and adjusting parameters in the basic neural network by using the difference value; the second loss function is a function that determines the degree of difference of the sample images.
Here, images with the same in-vehicle scene are, for example, images all captured with no passenger in the vehicle; images with different in-vehicle scenes are, for example, one image with no passenger and one with a passenger in the vehicle, or one image with no passenger and one with an article in the vehicle.
The specific loss function form is as follows:
Loss=(1-label)*Ln(FeatX,FeatY)+label*Lp(FeatX,FeatY)
Dist(i,j)=||FeatX(i,j)-FeatY(i,j)||
Ln(FeatX,FeatY)=Σ(i,j)Dist(i,j)^2+α*(1-(FeatX·FeatY)/(||FeatX||*||FeatY||))
Lp(FeatX,FeatY)=Σ(i,j)(1-Γ(i,j))*Dist(i,j)^2+(S_Γ-Σ(i,j)Γ(i,j)*|FeatX(i,j)-FeatY(i,j)|)
when label is 0, representing the same type of images, the first loss function is Ln, and the first half part calculates the sum of squares of differences of feature elements at the same positions of FeatX and FeatY; the second half, starting from the whole, calculates the cosine similarity of the FeatX and FeatY vectors, and determines the angle between the two vectors, i.e. the position relation in space. Wherein the parameter α is used as a weight.
When label is 1, the images are of different types and the second loss function Lp is used, which is divided into two parts. Γ has the same size as FeatX and its elements take the values 0 and 1: the elements equal to 1 correspond to the annotated (boxed) regions of the original image, i.e. the parts of FeatY that differ and contain articles, and the remaining elements are 0. The first half of Lp, weighted by 1-Γ(i,j), covers the parts where the two images do not differ and computes the sum of squares of the differences of the feature elements at the same positions; the second half covers the parts where the two images differ, and subtracts the sum of the absolute values of the differences in those parts from S_Γ, where S_Γ is the number of elements of Γ whose value is 1.
In actual operation, the feature extraction network processes two images at a time, so in each training step two images are input into the basic neural network simultaneously.
If the two in-vehicle images are the same type of image, the two in-vehicle images can be input into the basic neural network, then the image feature vector output by the basic neural network is input into the first loss function Ln to obtain a similarity value, and the parameters of the basic neural network are adjusted through the similarity value.
If the two in-vehicle images are different types of images, the two in-vehicle images can be input into the basic neural network, then the image feature vectors output by the basic neural network are input into the second loss function Lp to obtain a difference value, and parameters of the basic neural network are adjusted through the difference value.
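The following sketch illustrates how the loss might be selected per image pair during training; ln_loss and lp_loss are passed in as callables so that no particular loss form is assumed here, and ln_example shows one possible Ln consistent with the description above (squared element-wise distance plus a cosine term), given purely for illustration.

```python
# Sketch of the per-pair loss selection during training. label is 0 for same-type
# pairs and 1 for different-type pairs, as in Loss = (1-label)*Ln + label*Lp.
# ln_loss / lp_loss are assumed callables; ln_example is only one possible form of Ln.
import numpy as np

def pair_loss(feat_x, feat_y, label, ln_loss, lp_loss):
    """Combine the two losses exactly as in Loss = (1-label)*Ln + label*Lp."""
    return (1 - label) * ln_loss(feat_x, feat_y) + label * lp_loss(feat_x, feat_y)

def ln_example(feat_x, feat_y, alpha=0.5):
    """One possible Ln: squared element-wise distance plus a cosine-similarity term."""
    dist_sq = np.sum((feat_x - feat_y) ** 2)
    cos = np.dot(feat_x.ravel(), feat_y.ravel()) / (
        np.linalg.norm(feat_x) * np.linalg.norm(feat_y) + 1e-12)
    return dist_sq + alpha * (1.0 - cos)
```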
In an embodiment of the invention, the similarity between the first region and the second region is determined by:
determining the outline areas of the articles in the first area and the second area respectively;
determining the differences between the gray values of pixel points at the same positions in the contour regions, and determining a first number of differences that are smaller than a first preset difference among all the differences; and
determining the differences between target chromaticity values of the same type for pixel points at the same positions in the first region and the second region, and determining a second number of differences that are smaller than a second preset difference among all the differences obtained; wherein the target chromaticity value is part or all of the chromaticity values of a pixel point;
and taking the value obtained by weighted summation of the first number and the second number as the similarity between the first region and the second region.
In detail, the following formula can be adopted:
Color_c(i,j)=255 if |R2_c(i,j)-R1_c(i,j)|≤TH_c, otherwise Color_c(i,j)=0
Contour(i,j)=255 if |Sobel(R2(i,j))-Sobel(R1(i,j))|≤the first preset difference, otherwise Contour(i,j)=0
Region=β*(number of bright points in Color)+(1-β)*(number of bright points in Contour)
wherein TH_c denotes the threshold of the c-th channel, R2_c(i,j) is the chromaticity value, on the c-th channel, of the pixel at image position (i,j) in the second region, and R1_c(i,j) is the chromaticity value, on the c-th channel, of the pixel at image position (i,j) in the first region. Color_c(i,j) is the result of comparing the two chromaticity values at image position (i,j) on the c-th channel against TH_c: when the difference between the two chromaticity values is not greater than TH_c, Color_c(i,j) is determined to be 255, i.e. a bright point; when the difference is greater than TH_c, Color_c(i,j) is determined to be 0, i.e. a dark point.
And establishing a coordinate system in the images corresponding to the first area and the second area, and expressing the image position of the pixel point by adopting a coordinate position (i, j) on the coordinate system.
Sobel(R2(i,j)) is the gray value of the pixel at image position (i,j) in the contour region of the second region, Sobel(R1(i,j)) is the gray value of the pixel at image position (i,j) in the contour region of the first region, and Contour(i,j) is the result of comparing the gray values of the two pixels at image position (i,j) against the first preset difference. When the difference between the two gray values at image position (i,j) is not greater than the first preset difference, Contour(i,j) is determined to be 255, i.e. a bright point; when the difference is greater than the first preset difference, Contour(i,j) is determined to be 0, i.e. a dark point.
The gray values are obtained by binarizing the pixels in the first region and the second region, and the contour regions of the articles in the first region and the second region are obtained using the Sobel operator.
Region is the value obtained by weighted summation of the number of bright points obtained from the chromaticity comparison and the number of bright points obtained from the contour comparison, with β as the weight.
The resulting Region value is compared with the second preset value: if Region is greater than the second preset value, there are many bright points, indicating that the similarity of the two regions is high; if Region is not greater than the second preset value, there are few bright points, indicating that the similarity of the two regions is low.
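A rough NumPy/SciPy sketch of this similarity computation follows; the crops are assumed to be equally sized uint8 RGB regions, the thresholds th_c and th_gray and the weight beta are illustrative parameters, and for simplicity the contour comparison is performed over the whole crop rather than only the extracted contour region.

```python
# Sketch of the contour + chroma similarity between region R1 and region R2.
# r1 / r2 are assumed H x W x 3 uint8 crops of the same size; th_c is the per-channel
# chroma threshold, th_gray the first preset difference, and beta the weight.
import numpy as np
from scipy import ndimage

def region_similarity(r1, r2, th_c=20.0, th_gray=20.0, beta=0.5):
    """Weighted count of 'bright points' from the chroma and contour comparisons."""
    r1 = r1.astype(np.float64)
    r2 = r2.astype(np.float64)

    # chroma comparison: per-channel absolute difference against th_c
    color_bright = np.abs(r2 - r1) <= th_c              # H x W x 3 boolean map
    n_color = int(np.count_nonzero(color_bright))

    # contour comparison: Sobel responses of the grayscale crops against th_gray
    gray1 = r1.mean(axis=2)
    gray2 = r2.mean(axis=2)
    sobel1 = np.hypot(ndimage.sobel(gray1, axis=0), ndimage.sobel(gray1, axis=1))
    sobel2 = np.hypot(ndimage.sobel(gray2, axis=0), ndimage.sobel(gray2, axis=1))
    contour_bright = np.abs(sobel2 - sobel1) <= th_gray
    n_contour = int(np.count_nonzero(contour_bright))

    # weighted sum of the two bright-point counts, as described above
    return beta * n_color + (1 - beta) * n_contour
```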
In summary, an embodiment of the present invention further provides an in-vehicle article monitoring device, including: the system comprises an in-vehicle sensing module, an out-vehicle sensing module and an alarm module, wherein the alarm module is respectively connected with the in-vehicle sensing module and the out-vehicle sensing module;
the in-vehicle sensing module comprises a data acquisition sub-module, a region similarity detection sub-module, a matching sub-module and an alarm module;
the data acquisition sub-module is used for acquiring a first in-vehicle image of a target object before getting on the vehicle, a second in-vehicle image of the target object after leaving the vehicle and at least one third in-vehicle image of the target object in the riding process;
the area similarity detection submodule is used for extracting a first image feature vector from the first in-vehicle image and a second image feature vector from the second in-vehicle image through a feature extraction network, determining the similarity of feature elements at the same positions in the first image feature vector and the second image feature vector, determining the position information of the second in-vehicle image corresponding to the feature elements with the similarity smaller than a first preset value, and determining a first area according to the position information;
the matching sub-module is used for extracting, from a third in-vehicle image, a second area at the same image position as the first area, and determining the contour regions of the articles in the first area and the second area respectively; determining the differences between the gray values of pixel points at the same positions in the contour regions, and determining a first number of differences that are smaller than a first preset difference; determining the differences between target chromaticity values of the same type for pixel points at the same positions in the first area and the second area, and determining a second number of differences that are smaller than a second preset difference, wherein the target chromaticity value is part or all of the chromaticity values of a pixel point; and taking the value obtained by weighted summation of the first number and the second number as the similarity between the first area and the second area, and if the similarity is greater than a second preset value, determining that an article of the target object has been left in the vehicle.
And the alarm module is used for informing the target object according to a preset alarm mode.
The external sensing module is used for obtaining a first video of the trunk's interior space during the period from the trunk being opened to being closed when the target object arrives at the vehicle, obtaining a second video of the trunk's interior space during the period from the trunk being opened to being closed after the target object leaves the vehicle, determining, from the first video, the number of times the hand of the target object performs the action sequence of entering and then exiting the trunk, determining the corresponding number of times from the second video, and, if the number of times determined from the first video is not identical to the number of times determined from the second video, determining that an article of the target object has been left in the trunk of the vehicle.
As shown in fig. 11, the electronic device 1100 according to another embodiment of the present invention includes: a memory 1120 and a processor 1110;
the memory 1120 is used for storing program codes used when the electronic device runs;
the processor 1110 is configured to execute the program code to implement the following processes:
acquiring a first in-vehicle image of a target object before getting on the vehicle, a second in-vehicle image of the target object after leaving the vehicle and at least one third in-vehicle image of the target object in the process of taking the vehicle;
searching a first area with the similarity smaller than a first preset value with the first in-vehicle image from the second in-vehicle image;
extracting a second area having the same image position as the first area from a third in-vehicle image;
and if the similarity between the first area and the second area is greater than a second preset value, determining that an article of the target object has been left in the vehicle, and notifying the target object in a preset alarm mode.
Optionally, the processor 1110 is specifically configured to: extracting a first image feature vector from the first in-vehicle image and a second image feature vector from the second in-vehicle image through a feature extraction network;
determining similarity of feature elements at the same position in the first image feature vector and the second image feature vector;
and determining the position information of the second in-vehicle image corresponding to the feature element with the similarity smaller than the first preset value, and determining the first area according to the position information.
Optionally, the processor 1110 is specifically configured to:
and if the position information corresponding to the plurality of characteristic elements in the second in-vehicle image has an overlapping region, taking a region formed by the position information corresponding to the plurality of characteristic elements as a first region.
Optionally, the processor 1110 is specifically configured to:
taking a sample image as input, taking an image feature vector in the sample image as output, and performing multi-round training on a basic neural network to obtain the feature extraction network;
and in each round of training process, adjusting parameters in the basic neural network through the N image feature vectors every time N image feature vectors are output.
Optionally, the processor 1110 is specifically configured to:
if the sample images corresponding to the N image feature vectors are the same type of images, inputting the N image feature vectors into a first loss function to obtain a similarity value, and adjusting parameters in the basic neural network by using the similarity value; the images of the same type are images with the same scene in the vehicle; wherein the first loss function is a function for determining the similarity degree of the sample images; or
If the sample images corresponding to the N image feature vectors are different types of images, inputting the N image feature vectors into a second loss function to obtain a difference value, and adjusting parameters in the basic neural network by using the difference value; the second loss function is a function that determines a degree of difference in the sample images.
Optionally, the processor 1110 is specifically configured to: determining a contour region of the article in the first region and the second region, respectively;
determining the differences between the gray values of pixel points at the same positions in the contour regions, and determining a first number of differences that are smaller than a first preset difference among all the differences; and
determining the differences between target chromaticity values of the same type for pixel points at the same positions in the first region and the second region, and determining a second number of differences that are smaller than a second preset difference among all the differences obtained; wherein the target chromaticity value is part or all of the chromaticity values of a pixel point;
and taking the value obtained by weighted summation of the first number and the second number as the similarity between the first region and the second region.
Optionally, the processor 1110 is further configured to: the method comprises the steps of obtaining a first video of the internal space of a trunk from the opening to the closing of the trunk when a target object reaches a vehicle, and obtaining a second video of the internal space of the trunk from the opening to the closing of the trunk after the target object leaves the vehicle;
determining the number of times of a series of actions of a hand of the target object from entering a trunk to exiting the trunk through the first video; determining the number of times of a series of actions of entering a trunk to exiting the trunk of the hand of the target object through the second video;
determining that an article of the target object has been left in the trunk of the vehicle if the number of times determined from the first video is greater than the number of times determined from the second video.
In an exemplary embodiment, a storage medium comprising instructions, such as a memory comprising instructions, executable by the processor 1110 of an electronic device to perform the in-vehicle item monitoring method described above is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In the embodiment of the present invention, in addition to the electronic device shown in fig. 11, the structure of the electronic device may also be as shown in fig. 12, where the electronic device 1100 includes: a Radio Frequency (RF) circuit 1210, a power supply 1220, a processor 1230, a memory 1240, an input unit 1250, a display unit 1260, a communication interface 1270, and a Wireless Fidelity (Wi-Fi) module 1280. Those skilled in the art will appreciate that the configuration shown in fig. 12 does not limit the electronic device; the electronic device provided in the embodiments of the present application may include more or fewer components than those shown, may combine some components, or may use a different arrangement of components.
The following describes each component of the electronic device 1100 in detail with reference to fig. 12:
the RF circuit 1210 may be used for receiving and transmitting data during a communication or conversation. In particular, the RF circuit 1210, after receiving downlink data of a base station, sends the downlink data to the processor 1230 for processing; and in addition, sending the uplink data to be sent to the base station. Generally, the RF circuit 1210 includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
In addition, the RF circuit 1210 may also communicate with networks and other terminals through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.
The Wi-Fi technology belongs to a short-distance wireless transmission technology, and the electronic device 1100 can be connected to an Access Point (AP) through a Wi-Fi module 1280, thereby realizing Access to a data network. The Wi-Fi module 1280 may be used for receiving and transmitting data during communication.
The electronic device 1100 may be physically connected to another terminal through the communication interface 1270. Optionally, the communication interface 1270 is connected to a communication interface of the other terminal through a cable, so as to implement data transmission between the electronic device 1100 and the other terminal.
In this embodiment of the application, the electronic device 1100 can implement a communication service and send information to other contacts, so the electronic device 1100 needs a data transmission function; that is, the electronic device 1100 needs to include a communication module. Although fig. 12 shows communication modules such as the RF circuit 1210, the Wi-Fi module 1280, and the communication interface 1270, it is to be understood that at least one of these components, or another communication module (for example, a Bluetooth module), may be present in the electronic device 1100 for data transmission.
For example, when the electronic device 1100 is a computer, the electronic device 1100 may include the communication interface 1270 and may further include the Wi-Fi module 1280; when the electronic device 1100 is a tablet computer, the electronic device 1100 may include the Wi-Fi module 1280.
The memory 1240 may be used to store software programs and modules. The processor 1230 runs the software programs and modules stored in the memory 1240 so as to perform the various functional applications and data processing of the electronic device 1100. The processor 1230 in this embodiment of the present invention may execute the instructions executed by the processor 1110 in fig. 11, so that when the processor 1230 runs the program codes in the memory 1240, some or all of the processes in fig. 5 of the embodiment of the present invention can be implemented.
Optionally, the memory 1240 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, various application programs (such as a communication application), a face recognition module, and the like; the data storage area may store data created according to the use of the terminal (such as multimedia files like pictures and video files, and face information templates), and the like.
Further, the memory 1240 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 1250 may be used to receive numeric or character information input by a user and generate key signal inputs related to user settings and function control of the electronic device 1100.
Alternatively, the input unit 1250 may include a touch panel 1251 and other input terminals 1252.
The touch panel 1251, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed by the user on or near the touch panel 1251 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 1251 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal generated by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the coordinates to the processor 1230, and it can also receive and execute commands sent by the processor 1230. In addition, the touch panel 1251 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave types.
Optionally, the other input terminals 1252 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 1260 may be used to display information input by or provided to the user and various menus of the electronic device 1100. The display unit 1260 is a display system of the electronic device 1100, and is configured to present an interface to implement human-computer interaction.
The display unit 1260 may include a display panel 1261. Optionally, the display panel 1261 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like.
Further, the touch panel 1251 may cover the display panel 1261. When the touch panel 1251 detects a touch operation on or near it, it transmits the touch operation to the processor 1230 to determine the type of the touch event, and the processor 1230 then provides a corresponding visual output on the display panel 1261 according to the type of the touch event.
Although in fig. 12 the touch panel 1251 and the display panel 1261 are implemented as two separate components to implement the input and output functions of the electronic device 1100, in some embodiments, the touch panel 1251 and the display panel 1261 may be integrated to implement the input and output functions of the electronic device 1100.
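As a rough illustration only (not taken from this disclosure), the touch-to-display round trip described above can be reduced to the following Python sketch; the classify_event and render callables, and the event names used, are simplifying assumptions standing in for the processor-side event typing and the display panel.

def handle_touch(x, y, classify_event, render):
    """Simplified flow: touch point coordinates from the touch controller are
    classified by the processor, which then drives a visual output on the
    display panel via the render callable."""
    event_type = classify_event(x, y)  # assumed to return e.g. "tap" or "long_press"
    if event_type == "tap":
        render("highlight the control at (%d, %d)" % (x, y))
    elif event_type == "long_press":
        render("open a context menu at (%d, %d)" % (x, y))
    else:
        render("no visual change")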
The processor 1230 is a control center of the electronic device 1100, connects each component using various interfaces and lines, and performs various functions of the electronic device 1100 and processes data by operating or executing software programs and/or modules stored in the memory 1240 and calling data stored in the memory 1240, thereby implementing various services based on the terminal.
Optionally, the processor 1230 may include one or more processing units. Optionally, the processor 1230 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, and the like, and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 1230.
The electronic device 1100 also includes a power supply 1220 (such as a battery) for powering the various components. Optionally, the power source 1220 may be logically connected to the processor 1230 through a power management system, so that the power management system may manage charging, discharging, power consumption, and the like.
An embodiment of the present invention further provides a computer program product, which, when running on an electronic device, enables the electronic device to execute any one of the above in-vehicle item monitoring methods according to the embodiments of the present invention.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (8)

1. An in-vehicle article monitoring method, comprising:
acquiring a first in-vehicle image before a target object gets on the vehicle, a second in-vehicle image after the target object leaves the vehicle, and at least one third in-vehicle image while the target object is riding in the vehicle;
searching the second in-vehicle image for a first area whose similarity to the first in-vehicle image is smaller than a first preset value;
extracting, from a third in-vehicle image, a second area having the same image position as the first area;
if the similarity between the first area and the second area is greater than a second preset value, determining that an article of the target object is left in the vehicle, and notifying the target object in a preset alarm manner;
wherein the searching the second in-vehicle image for the first area whose similarity to the first in-vehicle image is smaller than the first preset value comprises:
extracting a first image feature vector from the first in-vehicle image and a second image feature vector from the second in-vehicle image through a feature extraction network;
determining the similarity between feature elements at the same positions in the first image feature vector and the second image feature vector; and
determining position information, in the second in-vehicle image, corresponding to the feature elements whose similarity is smaller than the first preset value, and determining the first area according to the position information;
wherein the similarity between the first area and the second area is determined by:
determining a contour area of the article in each of the first area and the second area;
determining differences between the gray values of pixel points at the same positions in the contour areas, and determining a first number of the differences that are smaller than a first preset difference;
determining differences between target chromaticity values of the same type of pixel points at the same positions in the first area and the second area, and determining a second number of the differences that are smaller than a second preset difference, wherein the target chromaticity values are some or all of the chromaticity values of the pixel points; and
taking a value obtained by weighting and summing the first number and the second number as the similarity between the first area and the second area.
2. The in-vehicle article monitoring method according to claim 1, wherein the determining the first area according to the position information comprises:
if the position information corresponding to a plurality of the feature elements has an overlapping area in the second in-vehicle image, taking an area formed by the position information corresponding to the plurality of feature elements as the first area.
3. The in-vehicle article monitoring method according to claim 1, wherein the training process of the feature extraction network comprises:
taking a sample image as input and an image feature vector of the sample image as output, and performing multiple rounds of training on a basic neural network to obtain the feature extraction network;
wherein, in each round of training, every time N image feature vectors are output, parameters in the basic neural network are adjusted through the N image feature vectors.
4. The in-vehicle article monitoring method according to claim 3, wherein the adjusting parameters in the basic neural network through the N image feature vectors comprises:
if the sample images corresponding to the N image feature vectors are images of the same type, inputting the N image feature vectors into a first loss function to obtain a similarity value, and adjusting the parameters in the basic neural network using the similarity value, wherein images of the same type are images of the same in-vehicle scene, and the first loss function is a function for determining the degree of similarity between the sample images; or
if the sample images corresponding to the N image feature vectors are images of different types, inputting the N image feature vectors into a second loss function to obtain a difference value, and adjusting the parameters in the basic neural network using the difference value, wherein the second loss function is a function for determining the degree of difference between the sample images.
5. The in-vehicle article monitoring method according to any one of claims 1 to 4, further comprising:
acquiring a first video of the interior space of a trunk from the time the trunk is opened to the time it is closed when the target object arrives at the vehicle, and acquiring a second video of the interior space of the trunk from the time the trunk is opened to the time it is closed after the target object leaves the vehicle;
determining, from the first video, the number of times the hand of the target object performs the action sequence of entering and then exiting the trunk, and determining, from the second video, the number of times the hand of the target object performs the action sequence of entering and then exiting the trunk; and
determining that an article of the target object is left in the trunk of the vehicle if the number of times determined from the first video is not equal to the number of times determined from the second video.
6. An electronic device, comprising: a memory and a processor;
the memory is used for storing program codes used when the photographing device runs;
the processor is configured to execute the program code to implement the following processes:
acquiring a first in-vehicle image before a target object gets on the vehicle, a second in-vehicle image after the target object leaves the vehicle, and at least one third in-vehicle image while the target object is riding in the vehicle;
searching the second in-vehicle image for a first area whose similarity to the first in-vehicle image is smaller than a first preset value;
extracting, from a third in-vehicle image, a second area having the same image position as the first area;
if the similarity between the first area and the second area is greater than a second preset value, determining that an article of the target object is left in the vehicle, and notifying the target object in a preset alarm manner;
the processor is specifically configured to: extracting a first image feature vector from the first in-vehicle image and a second image feature vector from the second in-vehicle image through a feature extraction network;
determining similarity of feature elements at the same position in the first image feature vector and the second image feature vector;
determining position information of the image in the second vehicle corresponding to the feature elements with the similarity smaller than a first preset value, and determining a first area according to the position information;
determine a contour area of the article in each of the first area and the second area;
determine differences between the gray values of pixel points at the same positions in the contour areas, and determine a first number of the differences that are smaller than a first preset difference;
determine differences between target chromaticity values of the same type of pixel points at the same positions in the first area and the second area, and determine a second number of the differences that are smaller than a second preset difference, wherein the target chromaticity values are some or all of the chromaticity values of the pixel points; and
take a value obtained by weighting and summing the first number and the second number as the similarity between the first area and the second area.
7. The electronic device according to claim 6, wherein the processor is further configured to: acquire a first video of the interior space of a trunk from the time the trunk is opened to the time it is closed when the target object arrives at the vehicle, and acquire a second video of the interior space of the trunk from the time the trunk is opened to the time it is closed after the target object leaves the vehicle;
determine, from the first video, the number of times the hand of the target object performs the action sequence of entering and then exiting the trunk, and determine, from the second video, the number of times the hand of the target object performs the action sequence of entering and then exiting the trunk; and
determine that an article of the target object is left in the trunk of the vehicle if the number of times determined from the first video is not equal to the number of times determined from the second video.
8. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the in-vehicle article monitoring method according to any one of claims 1 to 5.
CN202010782915.8A 2020-08-06 2020-08-06 In-vehicle article monitoring method, electronic device, and storage medium Active CN112069901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010782915.8A CN112069901B (en) 2020-08-06 2020-08-06 In-vehicle article monitoring method, electronic device, and storage medium

Publications (2)

Publication Number Publication Date
CN112069901A CN112069901A (en) 2020-12-11
CN112069901B true CN112069901B (en) 2022-07-08

Family

ID=73657133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010782915.8A Active CN112069901B (en) 2020-08-06 2020-08-06 In-vehicle article monitoring method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN112069901B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113139488B (en) * 2021-04-29 2024-01-12 北京百度网讯科技有限公司 Method and device for training segmented neural network
CN113537117B (en) * 2021-07-27 2022-05-06 广东机电职业技术学院 Vehicle-mounted legacy monitoring and alarming method and 5G system thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875839A (en) * 2018-06-28 2018-11-23 深圳市元征科技股份有限公司 Article reminding method, system and equipment and storage medium are lost in a kind of vehicle
CN109165589A (en) * 2018-08-14 2019-01-08 北京颂泽科技有限公司 Vehicle based on deep learning recognition methods and device again
CN110633690A (en) * 2019-09-24 2019-12-31 北京邮电大学 Vehicle feature identification method and system based on bridge monitoring
CN111339846A (en) * 2020-02-12 2020-06-26 深圳市商汤科技有限公司 Image recognition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112069901A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN109391762B (en) Tracking shooting method and device
CN112040223B (en) Image processing method, terminal device and storage medium
CN107657218B (en) Face recognition method and related product
CN112069901B (en) In-vehicle article monitoring method, electronic device, and storage medium
CN107635101A (en) Image pickup method, device, storage medium and electronic equipment
CN105139415A (en) Foreground and background segmentation method and apparatus of image, and terminal
US10521704B2 (en) Method and apparatus for distributed edge learning
CN107623778B (en) Incoming call answering method and mobile terminal
CN107871001B (en) Audio playing method and device, storage medium and electronic equipment
CN111062859A (en) Video monitoring method, mobile terminal and storage medium
CN109325518B (en) Image classification method and device, electronic equipment and computer-readable storage medium
CN108921178B (en) Method and device for obtaining image blur degree classification and electronic equipment
CN106384348B (en) The method for detecting abnormality and device of monitoring image
CN108345819A (en) A kind of method and apparatus sending warning message
CN108259746B (en) Image color detection method and mobile terminal
CN107025441B (en) Skin color detection method and device
CN107566746B (en) Photographing method and user terminal
CN112149589B (en) Method and device for identifying behavior of driver in vehicle
CN110796096B (en) Training method, device, equipment and medium for gesture recognition model
CN107422956B (en) Mobile terminal operation response method, mobile terminal and readable storage medium
CN112188058A (en) Video shooting method, mobile terminal and computer storage medium
CN107845094A (en) Pictograph detection method, device and computer-readable recording medium
CN109508725A (en) Cover plate opening-closing detection method, device and the terminal of haulage vehicle
CN110636225A (en) Photographing method and electronic equipment
CN107704173B (en) Application program display method, terminal and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant