Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and not limiting of the application. It should be noted that, for convenience of description, only the portions related to the application are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The carriage three-dimensional information calculation method provided by the embodiments of the present application is suitable for calculating the carriage loading rate in logistics transportation scenarios. For example, at distribution points, transfer sites or network sites in the logistics industry, when a transportation vehicle is used to transfer, collect or consolidate goods to be transported, a three-dimensional image of the carriage is generated and the loading rate of the transportation vehicle is calculated statistically, so that transportation cost and related business operations can be controlled.
It can be understood that at places such as distribution points, transfer sites or network sites, carriage information acquisition devices for the transport vehicles may be arranged; for example, cameras may be installed at the entrance or exit of a distribution point or transfer site to acquire two-dimensional carriage images of the transport vehicles. The data collected by each camera is fed to a server, and a behavior monitoring system is instantiated for each video channel, so as to process the carriage images in that channel's video stream and generate the corresponding three-dimensional image for calculating the loading rate.
It can be further understood that, in the method for generating a three-dimensional carriage image provided by the embodiments of the present application, two models may be trained in advance: a carriage classification model and a generative adversarial network model (GAN model). After a carriage image of the vehicle to be tested is input into the carriage classification model, the vehicle type corresponding to the carriage image is output; after the carriage image is input into the GAN model, the inner wall position information of the carriage corresponding to the carriage image is output.
It will also be appreciated that, before performing the method, a state machine may need to be initialized, for example, to initialize the current detection state, the previous state, and the start and end of the loading and unloading actions.
For easy understanding and explanation, the method and apparatus for generating a three-dimensional image of a vehicle cabin according to the embodiments of the present application are described in detail below with reference to fig. 1 to 8.
Fig. 1 is a schematic flow chart of a method for generating a three-dimensional image of a vehicle cabin according to an embodiment of the present application, as shown in fig. 1, the method may include:
S110, acquiring a carriage image set.
Specifically, in the method for generating a three-dimensional carriage image provided by the embodiments of the present application, after the video stream of the carriage of the vehicle to be tested is acquired through a camera, individual frames of the video stream may be captured and processed to obtain the image set of the carriage of the vehicle to be tested.
S120, judging whether the carriage image in the carriage image set contains a complete carriage.
Specifically, each acquired frame of the carriage image may be examined to determine whether it contains a complete carriage. After a carriage image is determined to contain a complete carriage, that carriage image may be preprocessed.
And S130, preprocessing the carriage image containing the complete carriage.
Specifically, the frame of the carriage image may first be extended; for example, the left and right sides of the carriage frame are each extended by a specified proportion of the width, and the top and bottom sides are each extended by the same specified proportion of the height, so that the carriage image contains the complete carriage. The specified proportion may be, for example, 0.05, so that an original image of width w and height h is extended to 1.1w × 1.1h.
Further, the central region of the carriage image may be subjected to masking processing. As shown in fig. 4, the central region of the carriage image may be covered with a black mask whose width and height are each 0.9 times the corresponding dimension of the original image.
It can be understood that, since the acquired carriage image of the vehicle to be tested contains goods, covering the goods with the mask preserves the complete morphological and color characteristics of each side of the carriage frame while treating the goods uniformly. After processing, the total number of pixels that actually need to be computed for the carriage image fed into the model becomes:
1.1² × w × h − 0.9² × w × h = 0.4 × w × h
The computation is thus reduced by sixty percent relative to the original w × h image, and the interference from the cargo is reduced by ninety-five percent. The complexity of the problem is reduced, computing resources are saved, and the robustness of the algorithm is improved.
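The extension and masking steps above can be sketched as follows. The 0.05 extension and 0.9 mask proportions, and the use of zero (black) padding for the border, are assumptions consistent with the pixel-count formula above, not details fixed by the application.

```python
import numpy as np

def preprocess_cabin_image(img, extend_ratio=0.05, mask_ratio=0.9):
    """Extend the image border by `extend_ratio` of each dimension on every
    side (black padding), then cover the central region, sized at
    `mask_ratio` of the original width and height, with a black mask."""
    h, w = img.shape[:2]
    ph, pw = int(round(h * extend_ratio)), int(round(w * extend_ratio))
    pad_width = ((ph, ph), (pw, pw)) + ((0, 0),) * (img.ndim - 2)
    out = np.pad(img, pad_width)  # zero padding -> 1.1w x 1.1h canvas
    mh, mw = int(round(h * mask_ratio)), int(round(w * mask_ratio))
    H, W = out.shape[:2]
    top, left = (H - mh) // 2, (W - mw) // 2
    out[top:top + mh, left:left + mw] = 0  # black mask over the cargo region
    return out
```

For a 100 × 200 image this yields a 110 × 220 canvas with a 90 × 180 central mask, leaving 110·220 − 90·180 = 8000 = 0.4·w·h pixels to compute, matching the formula above.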
And S140, inputting the processed carriage image into a pre-established carriage classification model and a generative adversarial network (GAN) model, and respectively outputting the vehicle type corresponding to the carriage image and a carriage inner wall estimation result.
After the preprocessing described above is performed on the acquired carriage image, the carriage image may be input to the carriage classification model and to the GAN model, respectively.
The carriage classification model outputs the vehicle type corresponding to the carriage image, such as minibus, small vehicle, medium vehicle, large vehicle, oversized vehicle, or closed door. The GAN model outputs the carriage inner wall estimation result corresponding to the carriage image; for example, the pixel coordinate values of the four vertices of the inner wall of the carriage corresponding to the carriage image can be output.
S150, judging whether the vehicle type and the carriage inner wall estimation result are within an error range.
S160, fusing the carriage image and the carriage inner wall estimation result.
Specifically, for each carriage image, after the classification model and the GAN model output their results, the two outputs may be compared to determine whether they are within the error range. If they are within the error range, the carriage image and the carriage inner wall estimation result may be fused.
When the carriage classification model outputs the vehicle type corresponding to the current carriage image, the standard carriage size corresponding to that vehicle type can be determined. The predicted carriage size can be determined from the inner wall prediction output by the GAN model, such as the pixel coordinates of the vertices of the carriage inner wall. The standard size and the predicted size of the carriage are then compared; if they are within the error range, the prediction of the carriage classification model is consistent with that of the GAN model, and the predicted vehicle type of the carriage image is obtained based on the predictions of both models.
For example, if the carriage classification model outputs a medium vehicle for a carriage image, the carriage size corresponding to the carriage image can be obtained from the standard size of a medium vehicle. The carriage size corresponding to the carriage image is also calculated from the output of the GAN model, and the error between the two sizes is compared to determine whether the two prediction results are within the error range; if so, the vehicle type of the vehicle to be tested is a medium vehicle.
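A sketch of this consistency check is given below; the vehicle-type names, standard sizes and the 10% relative tolerance are hypothetical placeholders, since the application does not fix these values.

```python
# Hypothetical standard cabin sizes (length, width) in metres per vehicle type.
STANDARD_SIZES = {
    "small": (4.2, 1.8),
    "medium": (6.8, 2.3),
    "large": (9.6, 2.4),
}

def results_consistent(predicted_type, estimated_size, tolerance=0.1):
    """Return True if the size estimated from the GAN model's inner-wall
    prediction matches the standard size of the classifier's predicted
    vehicle type within the relative tolerance."""
    standard = STANDARD_SIZES[predicted_type]
    return all(abs(est - std) / std <= tolerance
               for est, std in zip(estimated_size, standard))
```

If the check fails, the two models contradict each other and the frame would be discarded, as described for S211 below.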
It can be understood that, by inputting each frame of the carriage image set into the carriage classification model and the GAN model, outputting the corresponding results, and comparing the outputs of the two models, the predicted vehicle type of each carriage image is determined, yielding a vehicle type set.
After the predicted vehicle type set of the carriage image set is obtained, result fusion is carried out. Specifically, the final vehicle type output by the classification model may be determined according to the mode and/or a confidence index of the vehicle type set. For example, if the vehicle type set contains three small vehicles, three medium vehicles and four large vehicles, the final vehicle type output by the classification model is the large vehicle. Further, the standard carriage size corresponding to the final vehicle type can be determined, and the pixel coordinate values of the four vertices of the carriage frame can be estimated from the standard carriage size and the carriage inner wall vertex coordinates.
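The mode-based fusion of the per-frame predictions can be illustrated as follows; breaking ties by summed confidence is one plausible reading of "mode and/or confidence index", not a detail specified by the application.

```python
from collections import Counter

def fuse_vehicle_types(type_set, confidences=None):
    """Pick the final vehicle type as the mode of the per-frame predictions;
    when confidence scores are available, break count ties by the highest
    total confidence."""
    counts = Counter(type_set)
    if confidences is None:
        return counts.most_common(1)[0][0]
    totals = {}
    for t, c in zip(type_set, confidences):
        totals[t] = totals.get(t, 0.0) + c
    best = max(counts.values())
    tied = [t for t, n in counts.items() if n == best]
    return max(tied, key=lambda t: totals[t])
```

With the example from the text, three small, three medium and four large predictions fuse to "large".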
S170, inputting the result of the fusion processing into a generation model to generate a three-dimensional image of the carriage.
Specifically, after the inner wall prediction result and the outer frame prediction result of the carriage to be tested are obtained, these results may be input into the generation model, and the three-dimensional image of the carriage is output.
For example, as shown in fig. 6, after the four vertex pixel coordinates of the outer frame and the four vertex pixel coordinates of the inner wall of the carriage to be tested are determined, a three-dimensional image of the carriage may be created through multi-view geometric transformation.
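One simple way to relate the two quadrilaterals to depth is a pinhole-camera argument: a face of known real width appears smaller the farther it is, so Z = f·W/w_pix. The sketch below assumes the first two vertices of each quadrilateral are the top-left and top-right corners and that the camera looks roughly down the carriage axis; these are simplifications for illustration, whereas the application's generation model applies a full multi-view geometric transformation.

```python
import numpy as np

def cabin_depth_from_quads(outer_quad, inner_quad, cabin_width, focal_length):
    """Treat the outer-frame quad as the carriage opening (near face) and the
    inner-wall quad as the far face; estimate each face's distance from its
    pixel width via the pinhole model Z = f * W / w_pix, then take the
    difference as the carriage depth along the optical axis."""
    def top_edge_pixels(quad):
        q = np.asarray(quad, dtype=float)
        return np.linalg.norm(q[1] - q[0])  # assumed top-left -> top-right
    z_near = focal_length * cabin_width / top_edge_pixels(outer_quad)
    z_far = focal_length * cabin_width / top_edge_pixels(inner_quad)
    return z_near, z_far, z_far - z_near
```

For instance, if the opening spans 200 pixels and the far wall 100 pixels for a 2.4 m wide carriage at focal length 1000 px, the faces sit at 12 m and 24 m, giving a 12 m carriage depth.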
For a better understanding of the method for generating a three-dimensional image of a vehicle cabin according to the present application, a method according to another embodiment of the present application is described in detail with reference to fig. 2.
Fig. 2 illustrates a method for generating a three-dimensional image of a cabin according to another embodiment of the present application, the method 200 may include:
S201, acquiring a training sample set, wherein the training sample set comprises at least one frame of empty carriage image.
S202, preprocessing the empty carriage image.
S203, training on the processed empty carriage images to obtain the generative adversarial network (GAN) model.
Specifically, when the method is executed, empty carriage images of all the vehicle types shown in fig. 3 can be acquired as a training sample set, the training sample set comprising at least one frame of empty carriage image. After the empty carriage images are obtained, each empty carriage image can be preprocessed, and the processed empty carriage images are then used for training to generate the GAN model.
For example, in order to obtain a complete carriage image, the acquired empty carriage image may be extended up, down, left and right. After the extension, the central region of the empty carriage image can be covered with a black mask so as to be consistent with the carriage images of the vehicles to be tested acquired at a later stage.
Further, as shown in fig. 5, the polygon where the inner wall of the carriage is located in the empty carriage image needs to be filled, i.e. covered with a distinct color such as red, to mark the inner wall position in the empty carriage image. After multiple rounds of training, when a carriage image of a vehicle to be tested is input into the GAN model for prediction, a polygon like the one on the empty carriage images is generated at the center of the carriage image; that is, the GAN model realizes the prediction of the inner wall position of the carriage to be tested.
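The polygon filling used to label the empty-carriage training images can be sketched as below. A real pipeline would likely call a library routine such as `cv2.fillPoly`; the dependency-free ray-casting fill here is only an illustration, and the vertex coordinates and red color in the usage are examples.

```python
import numpy as np

def fill_inner_wall_label(img, quad, color=(255, 0, 0)):
    """Paint the inner-wall quadrilateral of an empty-carriage image a solid
    colour so the GAN model can learn to reproduce it.  The per-pixel inside
    test uses the even-odd (ray casting) rule."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    inside = np.zeros((h, w), dtype=bool)
    n = len(quad)
    for i in range(n):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % n]
        crosses = (ys < y1) != (ys < y2)  # edge spans this pixel row
        # x-coordinate of the edge at this row; epsilon guards horizontal edges
        x_at = x1 + (ys - y1) / (y2 - y1 + 1e-12) * (x2 - x1)
        inside ^= crosses & (xs < x_at)
    img[inside] = color
    return img
```

For an axis-aligned quadrilateral this reduces to filling the central rectangle, matching the masked-centre layout of the training images.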
S204, acquiring a vehicle image frame sequence.
S205, the similarity between adjacent frames is calculated.
S206, judging whether the similarity of the adjacent frames is larger than a threshold value.
S207, acquiring the latter frame of each pair of adjacent frames as a carriage image, and performing online data augmentation processing on the carriage image to obtain the carriage image set.
Specifically, after the vehicle to be tested enters the monitoring range of the camera, image acquisition of the carriage is carried out, i.e. a vehicle image frame sequence is acquired. When the server acquires the image frame sequence of the vehicle to be tested, the structural similarity (SSIM) between adjacent frames can be calculated. If the calculated SSIM is smaller than or equal to a first threshold value, such as 0.95, this indicates that the vehicle to be tested has moved, and the latter frame of the adjacent pair can then be captured as a carriage image.
It will be appreciated that if the similarity is greater than the first threshold, the current frame is skipped and the similarity between the next and previous frames continues to be calculated.
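The frame-selection logic of S204 to S207 can be sketched as follows. The SSIM here is a simplified single-window variant computed over the whole image (libraries such as scikit-image compute a windowed, averaged SSIM), and updating the reference frame after each captured image is an assumption about the comparison loop:

```python
import numpy as np

def ssim_global(a, b, data_range=255.0):
    """Single-window SSIM over the whole image: 1.0 for identical frames,
    near 0 for very different ones."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def select_cabin_frames(frames, threshold=0.95):
    """Keep a frame only when its SSIM against the reference frame drops to
    or below the threshold, i.e. the vehicle has visibly moved; the kept
    frame then becomes the new reference."""
    kept, ref = [], frames[0]
    for cur in frames[1:]:
        if ssim_global(ref, cur) <= threshold:
            kept.append(cur)
            ref = cur
    return kept
```

A static frame (SSIM ≈ 1.0 against the reference) is skipped, while a frame showing real displacement is captured as a carriage image.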
After the carriage image is obtained, online data augmentation processing, such as rotation, cropping and color adjustment, can be performed on the carriage image to obtain a carriage image set. For example, applying online data augmentation to one acquired carriage image may yield a set comprising 7 carriage images.
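A minimal online-augmentation sketch is given below; the application only names rotation, cropping and color adjustment, so the 90-degree rotation steps, centre crops and brightness gains are illustrative parameter choices:

```python
import numpy as np

def augment_cabin_image(img, rot_steps=(0, 1), crop_fracs=(1.0, 0.5), gains=(1.0,)):
    """Generate augmented variants of one carriage image by combining
    90-degree rotations, centre crops and brightness gains."""
    variants = []
    for k in rot_steps:
        rotated = np.rot90(img, k)
        for frac in crop_fracs:
            h, w = rotated.shape[:2]
            ch, cw = int(h * frac), int(w * frac)
            top, left = (h - ch) // 2, (w - cw) // 2
            cropped = rotated[top:top + ch, left:left + cw]
            for g in gains:
                variants.append(np.clip(cropped * g, 0, 255))
    return variants
```

The number of variants is the product of the parameter counts, so the combination sizes can be tuned to produce the set of augmented carriage images mentioned above.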
S208, judging whether each frame of carriage image in the carriage image set contains a complete carriage.
Specifically, each acquired frame may first be examined to determine whether it is a carriage image, for example by checking whether it contains a license plate number, and then to determine whether it contains a complete carriage, for example by checking whether it contains the carriage frame. If the frame is judged to contain a complete carriage, the carriage image can be preprocessed and S209 may be performed; otherwise, the carriage image is discarded.
And S209, preprocessing the carriage image.
S210, inputting the processed carriage image into the pre-established carriage classification model and the generative adversarial network (GAN) model, and respectively outputting the vehicle type corresponding to the carriage image and the carriage inner wall estimation result.
Specifically, the processing procedure is the same as S130 and S140 in the above embodiment, and will not be repeated here.
It can be understood that after the processed carriage image is input into the trained GAN model, an inner wall polygon is generated at the center of the carriage image; this inner wall polygon is the carriage inner wall estimation result corresponding to the carriage image. For example, the four vertices of the inner wall polygon are the pixel coordinates of the four vertices of the inner wall of the carriage corresponding to the carriage image.
S211, judging whether the vehicle type and the carriage inner wall estimation result are within an error range.
S212, the carriage image and the carriage inner wall estimation result are fused.
S213, inputting the result of the fusion processing into a generation model to generate a three-dimensional image of the carriage.
It is understood that the implementation process of the above steps is similar to S150, S160 and S170, and will not be repeated here.
It is also understood that when it is determined in S211 that the vehicle type and the carriage inner wall estimation result are not within the error range, the current carriage image may be discarded. For example, if the vehicle type output by the carriage classification model is a small vehicle, the standard size of the small vehicle is compared with the estimated size calculated from the inner wall estimation result output by the GAN model; if the error is relatively large, for example the carriage size corresponding to the GAN model output matches a different vehicle type, the outputs of the two prediction models are contradictory, and the current carriage image may be discarded.
Fig. 7 is a schematic structural diagram of a three-dimensional image generating device for a vehicle cabin according to an embodiment of the present application, as shown in the drawing, the device 700 may include:
a first acquisition module 710 is configured to acquire a set of car images.
A first determining module 720 is configured to determine whether the car image in the car image set includes a complete car.
A first processing module 730 for preprocessing the car image containing the complete car.
The input module 740 is configured to input the processed carriage image into a pre-established carriage classification model and a generative adversarial network model, and to output the vehicle type corresponding to the carriage image and the carriage inner wall estimation result, respectively.
The second determining module 750 is configured to determine whether the vehicle model and the estimation result of the inner wall of the vehicle cabin are within an error range.
The fusion module 760 is configured to fuse the carriage image and the carriage inner wall estimation result when the vehicle type and the carriage inner wall estimation result are within the error range.
The generating module 770 is configured to input the result of the fusion processing to a generating model and generate a three-dimensional image of the cabin.
Preferably, the device for generating a three-dimensional image of a carriage provided in the embodiment of the present application further includes:
a second obtaining module 701 is configured to obtain a training sample set, where the training sample set includes at least one empty car image.
The training module 702 is configured to train on the processed empty carriage images to obtain the generative adversarial network model; wherein,
the first processing module 730 is further configured to perform preprocessing on the empty car image.
Preferably, in the device for generating a three-dimensional image of a vehicle cabin provided in the embodiment of the present application, the first processing module 730 is specifically configured to:
and carrying out extension processing on the carriage image or the carriage frame of the empty carriage image.
And masking the central area of the carriage image or the empty carriage image.
Preferably, in the vehicle cabin three-dimensional image generating device provided in the embodiment of the present application, the vehicle cabin inner wall estimation result includes pixel coordinate values of four vertices of the inner wall of the vehicle cabin image.
Preferably, in the three-dimensional image generating device for a vehicle cabin provided in the embodiment of the present application, the fusion module 760 is specifically configured to:
and acquiring the classification model and outputting a vehicle model set of the carriage image.
And determining the classification model according to the mode and/or confidence index of the vehicle model set to output a final vehicle model.
And determining pixel coordinate values of four vertexes corresponding to the frame of the carriage by utilizing the final vehicle type and the estimation result of the inner wall of the carriage.
Preferably, in the three-dimensional image generating device for a vehicle cabin provided in the embodiment of the present application, the fusion module 760 is specifically configured to:
and determining the standard carriage size corresponding to the final vehicle type based on the final vehicle type.
And estimating pixel coordinate values of four vertexes of the frame according to the standard carriage size based on the carriage inner wall vertex coordinates.
Preferably, the device for generating a three-dimensional image of a carriage provided in the embodiment of the present application further includes:
a third acquisition module 703 is configured to acquire a vehicle image frame sequence.
A calculating module 704, configured to calculate a similarity between adjacent frames.
A fourth acquisition module 705 is configured to acquire the latter frame of the adjacent frames as a carriage image if the similarity is less than or equal to a first threshold.
A second processing module 706 is configured to perform online data augmentation processing on the carriage image to obtain the carriage image set.
Preferably, in the apparatus for generating a three-dimensional image of a vehicle cabin provided in the embodiment of the present application, the generating module 770 is specifically configured to perform multi-view geometric transformation by using pixel coordinate values of four vertices corresponding to a frame of the vehicle cabin and the estimation result of the inner wall of the vehicle cabin, so as to establish a three-dimensional image of the vehicle cabin.
Preferably, in the device for generating a three-dimensional carriage image provided in the embodiments of the present application, the first determining module 720 is specifically configured to:
and judging whether the carriage image comprises a carriage or not.
If yes, judging whether the carriage image is complete.
If not, the car image is discarded.
Preferably, the three-dimensional carriage image generating device provided in the embodiments of the present application further includes a first discarding module 707, configured to discard the carriage image when the second determining module 750 determines that the vehicle type and the carriage inner wall estimation result are not within the error range.
Preferably, the three-dimensional carriage image generating device provided in the embodiments of the present application further includes a second discarding module 708, configured to discard a carriage image when the first determining module 720 determines that that frame of the carriage image set does not contain a complete carriage.
In another aspect, an embodiment of the present application further provides a server, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor executes the program to implement a method for generating a three-dimensional image of a vehicle cabin as described in fig. 1 or fig. 2.
Referring now to FIG. 8, there is illustrated a schematic diagram of a computer system 800 suitable for use in implementing a server of an embodiment of the present application.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 are also stored. The CPU 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to fig. 1 may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method of fig. 1. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units or modules may also be provided in a processor, for example, as: the processor comprises a first acquisition module, a first judgment module, a first processing module, an input module, a second judgment module, a fusion module and a generation module. The names of the units or modules do not limit the units or modules themselves in some cases, and for example, the generation module may also be described as "a module for inputting the result of the fusion processing into a generation model to generate a three-dimensional image of the vehicle cabin".
As another aspect, the present application also provides a computer-readable storage medium, which may be a computer-readable storage medium contained in the foregoing apparatus in the foregoing embodiment; or may be a computer-readable storage medium, alone, that is not assembled into a device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the three-dimensional image generation of the cabin described in the present application.
In summary, according to the method, device, server and storage medium for generating a three-dimensional carriage image provided by the embodiments of the present application, a carriage image set of the vehicle to be tested is acquired, each frame containing a complete carriage is preprocessed, the processed carriage images are input into the classification model and the GAN model to obtain the carriage inner wall estimation result of the vehicle to be tested, and the inner wall estimation result is input into the generation model to generate the three-dimensional image of the carriage of the vehicle to be tested. The complexity of generating the three-dimensional carriage image is thus reduced, the amount of computation in the generation process is reduced, and the efficiency of calculating the carriage loading rate is improved.
The foregoing description is only of the preferred embodiments of the present application and a description of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the application is not limited to the specific combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the application, for example, solutions in which the above features are replaced with technical features having similar functions disclosed in the present application.