CN114786036B - Method and device for monitoring automatic driving vehicle, storage medium and computer equipment - Google Patents

Method and device for monitoring automatic driving vehicle, storage medium and computer equipment

Info

Publication number
CN114786036B
CN114786036B (granted publication of application CN202210205051.2A)
Authority
CN
China
Prior art keywords
image
message
video stream
delay
original image
Prior art date
Legal status
Active
Application number
CN202210205051.2A
Other languages
Chinese (zh)
Other versions
CN114786036A (en)
Inventor
黄超
柴桢亮
Current Assignee
Shanghai Xiantu Intelligent Technology Co Ltd
Original Assignee
Shanghai Xiantu Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xiantu Intelligent Technology Co Ltd filed Critical Shanghai Xiantu Intelligent Technology Co Ltd
Priority to CN202210205051.2A
Publication of CN114786036A
Application granted
Publication of CN114786036B
Status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/12Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L67/125Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks involving control of end-device applications over a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

A monitoring method and apparatus, a storage medium, and a computer device for an autonomous vehicle. The autonomous vehicle is provided with a work camera for outputting encoded images, each obtained by applying an encoding operation, which has a first delay, to a captured original image. The method comprises: acquiring an original image captured by the work camera; performing a video stream encoding operation on the original image to obtain an encoded video stream, the video stream encoding operation having a second delay smaller than the first delay; and sending the video stream to a server for remote monitoring. The delay of monitoring based on a commercial surveillance camera mounted on an unmanned vehicle can thus be effectively reduced; in particular, the impact of delay on remote monitoring is reduced.

Description

Method and device for monitoring automatic driving vehicle, storage medium and computer equipment
Technical Field
The invention relates to the field of automatic driving, in particular to a monitoring method and device for an automatic driving vehicle, a storage medium and computer equipment.
Background
An unmanned vehicle relies on sensors to perceive road conditions and the surrounding environment, and cameras are among its most important visual sensors. Cameras on unmanned vehicles mainly include sensing cameras and commercial surveillance cameras (also referred to as work cameras). A sensing camera mainly collects image data that feeds the unmanned driving algorithm; a commercial surveillance camera mainly collects image data for recording and monitoring the vehicle's driving condition and job completion.
The images collected and output by a commercial surveillance camera are generally also used for remote monitoring. In the prior art, the remote monitoring solutions in use mainly include the following two:
1. Each frame of original image from the commercial surveillance camera is acquired, the data of each frame is transmitted as binary network packets to an intermediate server over the websocket protocol (a protocol for full-duplex communication over a single TCP connection), and the data is packaged on the intermediate server into an image-format file (such as a jpg file). The server side performing remote monitoring obtains the file of each original frame from the intermediate server by accessing the image files, and plays the consecutive original frames on the server side in a picture-like fashion.
2. The video stream output by the commercial surveillance camera (which may be in RTSP format) is obtained, transcoded into the Real-Time Messaging Protocol (RTMP, also referred to as a live-streaming protocol), and pushed to a video processing center on the intermediate server. After the server side performing remote monitoring sends a request to the intermediate server, it pulls the RTMP stream from the intermediate server to play the video.
However, the two remote monitoring solutions have their respective drawbacks. Solution 1 performs no inter-frame coding on the original images and requires transmission of the complete original image, resulting in greater transmission bandwidth consumption at the same transmission frame rate. Once bandwidth consumption grows large enough, bandwidth congestion occurs, which in turn triggers an avalanche effect and causes large delay. If the transmission bandwidth pressure and the transmission delay are to be reduced, the only option is to reduce the number of image frames transmitted, which lowers the frame rate of remote playback; for a person watching the monitoring video at the server side, a low playback frame rate means the video stutters, giving a poor monitoring experience.
In summary, in the above two existing remote monitoring solutions, the delay of remote monitoring is often large, which degrades the effect of remote monitoring.
Disclosure of Invention
The invention addresses the technical problem of how to effectively reduce the delay of remote monitoring based on a commercial surveillance camera on an unmanned vehicle.
To solve the above technical problem, an embodiment of the present invention provides a method for monitoring an autonomous vehicle. The autonomous vehicle is provided with a working camera configured to output encoded images, each encoded image being obtained by performing an encoding operation, which has a first delay, on an acquired original image. The method includes: acquiring an original image acquired by the working camera; performing a video stream encoding operation on the original image to obtain an encoded video stream, the video stream encoding operation having a second delay smaller than the first delay; and sending the video stream to a server for remote monitoring.
Optionally, in the video stream encoding operation, B-frame encoding is canceled and/or the number of image frames in a single group of pictures is less than a first threshold.
Optionally, when the video stream is sent, the size of a buffer area used for sending the video stream is smaller than a second threshold value, and/or the video stream is sent through a UDP protocol.
Optionally, after acquiring the original image collected by the operation camera, the method further includes: generating a job message according to the original image; and publishing the job message so that one or more nodes can acquire the job message.
Optionally, the autonomous vehicle is configured with a robot operating system, and publishing the job message includes: publishing the job message to the robot operating system network so that the one or more nodes acquire the job message from the network by subscription.
Optionally, after the publishing of the job message, the method further includes: acquiring the job message through a local monitoring node and generating a job image stream based on the job message; and sending the job image stream, through the local monitoring node, to a playback device for playing, so that a safety officer on the autonomous vehicle can perform local monitoring.
Optionally, the automatic driving vehicle is further provided with a sensing camera, the sensing camera is used for collecting sensing images, and the sensing images participate in an automatic driving algorithm.
Optionally, the method further comprises: when a take-over event occurs on the autonomous vehicle, determining the occurrence time of the take-over event; acquiring perceived images within a period of time before the occurrence time; acquiring original images within a period of time before the occurrence time; and sending the take-over event and the perceived and original images from the period before the occurrence time to the server.
The embodiment of the invention also provides a monitoring device for an autonomous vehicle, the vehicle being provided with a working camera for outputting encoded images, each obtained by an encoding operation, which has a first delay, on an acquired original image. The device comprises: a job image acquisition module, configured to acquire an original image collected by the working camera; an encoding module, configured to perform a video stream encoding operation on the original image to obtain an encoded video stream, the video stream encoding operation having a second delay smaller than the first delay; and a sending module, configured to send the video stream to a server for remote monitoring.
Embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of monitoring an autonomous vehicle of any of the above.
The embodiment of the invention also provides computer equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the monitoring method of the automatic driving vehicle when executing the computer program.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
An embodiment of the invention provides a method for monitoring an autonomous vehicle. The vehicle is provided with a working camera for outputting encoded images, each obtained by an encoding operation, which has a first delay, on an acquired original image. The method comprises: acquiring an original image acquired by the working camera; performing a video stream encoding operation on the original image to obtain an encoded video stream, the video stream encoding operation having a second delay smaller than the first delay; and sending the video stream to a server for remote monitoring. Compared with the prior art, with the method of this embodiment the autonomous vehicle obtains the original images that the operation camera has not internally encoded, externally encodes them into a video stream, and pushes the video stream to the server. When the autonomous vehicle sends the monitoring video stream collected by the operation camera to the server, the delay of remote monitoring is thus effectively reduced.
Further, after acquiring the original images, the vehicle locally performs inter-frame coding on them, encapsulates the coded data into an FLV media stream and sends it to the server, thereby compressing the original images. Compared with the prior art, this scheme reduces bandwidth consumption during transmission without reducing the number of transmitted image frames, so staff at the server side have a better monitoring experience. In addition, the delay introduced by the inter-frame coding is smaller than the delay of the operation camera's internal coding, so the delay of the remote monitoring scenario is effectively reduced.
Further, the original images, perceived images and sensing information collected by the operation camera, the sensing camera and the various sensing devices are converted into a unified data format, so that the data can be managed under the same data management scheme, which facilitates data analysis.
Further, in the vehicle-local monitoring scenario, the autonomous vehicle side plays a job image stream (such as an MJPEG image stream) generated from the original images collected by the operation camera. Converting the original images into the job image stream consumes no video encoding computation, and the original image data in the local monitoring scenario always stays local to the vehicle, requiring no network transmission; network traffic and bandwidth consumption need not be considered, so the delay is very low and the playback frame rate high.
Drawings
FIG. 1 is a schematic diagram of a method for monitoring an autonomous vehicle according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for monitoring an autonomous vehicle according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a monitoring method for an automatic driving vehicle according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a method for monitoring an autonomous vehicle according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a monitoring device for an autonomous vehicle according to an embodiment of the present invention.
Detailed Description
As described in the Background, in the prior art there is often a problem of large delay when remote monitoring is performed with a commercial surveillance camera. The inventors of the present application noted that the main reason is that a commercial surveillance camera generally encodes the captured original images inside the camera and then outputs them as a Real Time Streaming Protocol (RTSP) video stream. This internal encoding operation carries a certain delay (typically around 2 seconds), so when the unmanned vehicle is monitored, the monitoring effect is affected by that delay.
In addition, unmanned vehicles at the current stage are generally staffed with a safety officer. The safety officer needs to locally watch the videos collected by the cameras (including the sensing camera and the operation camera) and the predicted trajectory planned by the autonomous driving control, so as to understand the vehicle's upcoming driving behavior; combining this with the vehicle's on-site environment, the safety officer can also judge whether a driving risk exists and prepare to take over control of the vehicle in advance.
The scenario in which the safety officer watches the camera videos on the vehicle's terminal device is called the vehicle-local monitoring scenario. The conventional video playback steps in this scenario are: directly obtain the RTSP video stream that the operation camera outputs after internal encoding, store the video stream (comprising multiple image frames) on the vehicle's local disk, and have the local playback device read the stored video stream from the disk for playback. Alternatively, the images collected by the sensing camera are stored on the vehicle's local disk, and the vehicle then reads the stored images from the disk and plays them frame by frame.
However, the inventors found that in the existing vehicle-local monitoring scenario, each image frame of the video stream must be frequently written to and read from the disk, which puts pressure on the disk's input/output (I/O) interface. In addition, if a frame is read while it is still being written, artifacts such as a corrupted picture or an incompletely displayed image can appear in local monitoring, degrading the safety officer's local monitoring experience.
To solve these problems, an embodiment of the present invention provides a method for monitoring an autonomous vehicle. The autonomous vehicle is provided with a working camera for outputting encoded images, each obtained by an encoding operation, which has a first delay, on an acquired original image. The method comprises: acquiring an original image acquired by the working camera; performing a video stream encoding operation on the original image to obtain an encoded video stream, the video stream encoding operation having a second delay smaller than the first delay; and sending the video stream to a server for remote monitoring. With this method, the delay of monitoring based on a commercial surveillance camera mounted on an unmanned vehicle can be effectively reduced; in particular, the impact of delay on remote monitoring is reduced.
In order to make the above objects, features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying figures.
In one embodiment, referring to fig. 1, fig. 1 is a schematic diagram of a monitoring method for an autonomous vehicle 10. One or more operation cameras 101 are disposed on the autonomous vehicle 10 and are used for outputting encoded images, which are obtained by performing an encoding operation, having a first delay, on the collected original images.
Alternatively, the work camera 101 may be mounted on the exterior of the autonomous vehicle 10, such as on the roof, head, tail or side wall of the autonomous vehicle 10, for capturing images of the environment in which the autonomous vehicle 10 is located. For example, the work camera 101 is used to monitor the relative position between the travel route of the autonomous vehicle 10 and the road edge, or to monitor the work condition of the autonomous vehicle 10 or record the monitored vehicle travel condition.
Optionally, the job condition is determined by the job the autonomous vehicle 10 is set to perform. For example, when that job is road surface cleaning, i.e., the autonomous vehicle 10 is an unmanned sweeper, the job condition may refer to how the sweeper cleans the road surface. In this case, the operation camera 101 may be disposed on the roof or rear of the vehicle to monitor the road surface cleaned by the unmanned sweeper. Because an unmanned sweeper generally needs to travel close to the road edge when cleaning, the operation camera 101 may also be disposed on the roof, rear or side of the vehicle to monitor the relative position between the sweeper's travel route and the road edge, so as to determine whether the vehicle is traveling close to the edge.
It should be noted that, the tasks set to be performed by the automatic driving vehicle 10 include, but are not limited to, road surface cleaning, and when the automatic driving vehicle 10 is set to perform other tasks, the positions set by the task cameras 101 and the monitored task conditions will also vary, which will not be described herein.
Optionally, the operation camera 101 may be implemented by various commercial surveillance cameras (surveillance cameras for short). In conventional use, a common surveillance camera today first encodes each collected original image to obtain an encoded image and then outputs the encoded image. This encoding may be referred to as internal encoding, and it introduces a certain delay, denoted the first delay. For monitoring scenarios with high real-time requirements, the first delay can make the monitoring requirements impossible to meet.
In one embodiment, a typical commercial surveillance camera, after capturing multiple consecutive frames of original images (which may be, for example, true-color or RGB images), internally encodes them into an RTSP video stream and outputs that stream. The RTSP video stream consists of consecutive frames of encoded images and satisfies the user's need to watch the monitoring picture. RTSP is an application layer protocol in the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite.
The autonomous vehicle 10 may also be provided with a computing device 102, the computing device 102 being adapted to perform the method of monitoring an autonomous vehicle according to an embodiment of the invention. The computing device 102 may be disposed on the autonomous vehicle 10, such as integrated into an autonomous system. Computing device 102 may also be other devices communicatively coupled to autonomous vehicle 10.
Referring to fig. 2, fig. 2 is a flow chart of a method of monitoring an autonomous vehicle, which may be performed by a computing device on the autonomous vehicle. The method may include the following steps S201 to S203, each of which is described in detail below.
Step S201, acquiring an original image acquired by the operation camera.
Optionally, the operation camera may be implemented with an existing surveillance camera; in this case, secondary development of the surveillance camera can be carried out to obtain the original images it collects, which have not been internally encoded. Compared with obtaining encoded images, obtaining the original images avoids the first delay. Further, the secondary development can be done with the Software Development Kit (SDK) provided by the surveillance camera. Alternatively, a dedicated operation camera may be custom-built which can output internally encoded images and can also expose the raw, un-encoded original images through a suitable interface (e.g., a custom hardware or software interface).
Alternatively, the original image may be the image as it exists before internal encoding (for example, RTSP encoding) inside the surveillance camera. That is, the image capture step uses the original image data prior to internal encoding, thereby avoiding the first delay caused by the camera's internal encoding.
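As an illustration of step S201, the sketch below shows one way raw frames might be taken off the camera before any internal encoding. The camera_sdk module and its callback registration are hypothetical stand-ins for a vendor SDK (the patent only requires that such an interface exist); the queue handling is ordinary Python.

```python
# Sketch of step S201: receive raw (un-encoded) frames from the camera.
# "camera_sdk" and its callback registration are hypothetical; only the
# queue handling below is concrete.
import time
import queue

frame_queue = queue.Queue(maxsize=2)  # short queue so stale frames never pile up

def on_raw_frame(frame):
    """Callback the SDK would invoke for every raw frame, before internal encoding."""
    stamped = (time.time(), frame)    # attach a receive timestamp to each frame
    try:
        frame_queue.put_nowait(stamped)
    except queue.Full:
        frame_queue.get_nowait()      # drop the oldest frame instead of backing up
        frame_queue.put_nowait(stamped)

# camera = camera_sdk.open(device_id=0)        # hypothetical SDK call
# camera.set_raw_frame_callback(on_raw_frame)  # bypasses the camera's internal encoding
```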
Step S202, performing a video stream encoding operation on the original image, to obtain an encoded video stream, where the video stream encoding operation has a second delay, and the second delay is smaller than the first delay.
In step S202, the computing device performs a video stream encoding operation, which may be referred to as external encoding, on the acquired original image. External encoding introduces a second delay that is smaller than the first delay, and the encoding algorithms of external and internal encoding differ. Optionally, the compression ratio of the external encoding is lower than that of the internal encoding, which is one reason the second delay is smaller than the first. In particular, since the external encoding is performed by a computing device on the autonomous vehicle (e.g., computing device 102 in fig. 1), the delay can be reduced by optimizing the encoding algorithm; furthermore, the computing power of the vehicle's computing device is often better than that of the operation camera, which also helps reduce encoding delay; in addition, the computing device's hardware can be upgraded, for example by selecting a device with stronger computing power, to further reduce the encoding delay.
Step S203, sending the video stream to a server for remote monitoring.
Specifically, in step S203, the computing device sends the externally encoded video stream to the server, and staff on the server side may play the encoded video stream to monitor the job condition or driving condition of the autonomous vehicle. Monitoring the vehicle's job or driving condition from the server side may be called the remote monitoring scenario. Optionally, when staff at the server side find a job problem or a driving risk, the autonomous vehicle may be controlled remotely from the server side, for example by applying emergency braking or changing the vehicle's driving route.
In addition, after the server receives the video stream sent by the computing device, the server can also monitor the operation condition of the automatic driving vehicle or the vehicle running condition by carrying out image recognition on each frame image in the video stream.
By this method, the computing device on the autonomous vehicle side obtains the original images that have not been internally encoded by the operation camera, externally encodes them into a video stream, and pushes the video stream to the server. When the autonomous vehicle sends the monitoring video stream collected by the operation camera to the server, the delay is thus effectively reduced, the reduction being the difference between the first delay and the second delay.
Further, in this embodiment, the original images acquired for the remote monitoring scenario undergo no internal encoding, so the delay caused by internal encoding (the first delay) is never introduced, while the delay caused by external encoding (the second delay) is smaller than the camera's internal encoding delay (the first delay). Conventional solution 2 described above does introduce the internal encoding delay, so the method of this embodiment effectively reduces the delay of the remote monitoring scenario.
It should be noted that, in the remote monitoring scenario, if the computing device directly forwarded the images internally encoded by the operation camera to the server, there would be two stages of delay: the delay caused by internal encoding (i.e., the first delay) plus the delay of data transmission between the computing device and the server.
In one embodiment, for step S202 in fig. 2, one or more of the following measures may be adopted in the video stream encoding (i.e., external encoding) operation to minimize the delay caused by encoding: cancel bi-directional predicted frame (B-frame) encoding, and configure the number of image frames in a single group of pictures (GOP) to be less than a first threshold.
The number of picture frames contained in a single GOP may also be referred to as the size of the GOP. Each GOP contains multiple consecutive pictures, and the first frame of each GOP must be an intra-coded frame (I-frame, also called a key frame), which ensures that it can be encoded and decoded independently without reference to other pictures; the remaining frames of the GOP can be B-frames or predictive frames (P-frames).
An I-frame retains the complete picture, so encoding or decoding an I-frame requires only that frame's image data. A P-frame represents the difference between the current frame and the preceding I-frame (or P-frame); during decoding, the previously buffered picture is overlaid with the difference defined by the current frame to produce the final picture. In other words, a P-frame carries no complete picture data, only the difference relative to the preceding frame's picture. A B-frame carries the difference data between the current frame and both the preceding and the following frames; to encode or decode it, not only must the buffered preceding picture be available, the following picture must also be decoded, and the final picture is obtained by superimposing the preceding and following pictures with the current frame's data.
In this embodiment, in the remote monitoring scenario, the video stream encoding operation is performed on the original images first, and the encoded video stream is then sent to the server. In this encoding operation, the number of image frames in a single GOP (the GOP size) is reduced from its original setting, specifically to below the first threshold. Reducing the GOP size reduces the number of picture frames between adjacent I-frames of the video stream and shortens the inter-frame dependencies during encoding (and decoding), thereby reducing the encoding delay on the computing device side and the corresponding decoding delay on the server side. In the limiting case, the first threshold is set to 1, i.e., every frame of the encoded video stream is an I-frame; each frame can then be encoded and decoded independently, minimizing codec delay. However, the smaller the GOP size, the higher the proportion of I-frames in the video stream, the greater the amount of data to transmit, and the higher the bandwidth requirement. Optionally, the value of the first threshold may be determined based on the computing device's current transmission bandwidth to the server and the acceptable delay in the remote monitoring scenario.
In addition, this embodiment cancels B-frame encoding in the encoding operation. Common encoding operations in the prior art do perform B-frame encoding, whose compression rate is high, but it introduces a delay of at least three frames (the current, preceding and following frames) during encoding and decoding, and it occupies resources during encoding. Cancelling B-frame encoding therefore effectively reduces the delay and resource occupation of the encoding and decoding processes.
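Both measures map directly onto common H.264 encoder options. Below is a minimal sketch using PyAV with libx264, an assumed toolchain that the patent does not prescribe; GOP_SIZE stands in for a value below the first threshold.

```python
# Low-delay encoder settings sketched with PyAV/libx264 (an assumption; the
# patent does not name an encoder library). GOP_SIZE is illustrative and
# stands in for a value below the "first threshold".
import av

GOP_SIZE = 15  # setting this to 1 would make every frame an I-frame

container = av.open("out.flv", mode="w", format="flv")
stream = container.add_stream("libx264", rate=25)
stream.width, stream.height = 1280, 720
stream.pix_fmt = "yuv420p"
stream.options = {
    "bf": "0",              # cancel B-frame encoding (no bi-directional frames)
    "g": str(GOP_SIZE),     # small GOP: fewer frames between adjacent I-frames
    "tune": "zerolatency",  # disable the encoder's internal frame buffering/lookahead
}
```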
In one embodiment, for step S203 in fig. 2, when the computing device sends the video stream to the server, the size of the buffer used for sending the video stream (also referred to as the output buffer or send buffer) may be made smaller than a second threshold. In addition, the computing device may send the video stream via the User Datagram Protocol (UDP) to further reduce delay.
When the computing device sends the video stream to the server, the buffered data is output to the server once the send buffer is full, so shrinking the send buffer reduces the buffering duration of video playback on the server side and thereby reduces delay. Some playback smoothness on the server side is sacrificed, which is acceptable given the real-time requirement of the remote monitoring scenario. A smaller send buffer also shortens each data copy, further reducing the transmission delay of the video stream. Accordingly, the size of the buffer used for sending the video stream is set smaller than the second threshold.
In addition, compared with the TCP protocol, the UDP protocol transmits faster and with smaller delay. Optionally, the embodiment of the present invention may build on a commonly used live video protocol developed on top of UDP; for example, ARTC, the protocol prefix used in the low-latency live-streaming solution provided by Alibaba Cloud, may be used to implement video stream transmission between the computing device and the server, with live playback on the server side.
In general, when ARTC is used for the video stream transmission between the computing device and the server, the send buffer is a value preset by the third party (for example, Alibaba Cloud); the embodiment of the present invention can also configure the ARTC push end autonomously, shrinking the send buffer and further reducing the transmission delay.
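At the transport layer, the buffer shrinking described above corresponds to an ordinary socket option. The sketch below assumes plain UDP sockets rather than a particular ARTC SDK; the 32 KiB value and the server address are illustrative placeholders for the second threshold and the push endpoint.

```python
# Transport-side tuning: UDP with a deliberately small send buffer. The
# 32 KiB value stands in for the "second threshold" and the address is a
# documentation placeholder; both are illustrative.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)          # UDP: no retransmission delay
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 32 * 1024)  # shrink the send buffer

def send_packet(payload: bytes, addr=("203.0.113.10", 9000)):
    sock.sendto(payload, addr)  # a small buffer caps how much data can sit waiting
```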
In a specific embodiment, in the video stream encoding operation, B-frame encoding is canceled and the number of image frames in a single group of pictures is less than a first threshold; when the video stream is transmitted, the size of a buffer area for transmitting the video stream is smaller than a second threshold value, and the video stream is transmitted through a UDP protocol. Therefore, delay is reduced to the minimum in each step of video encoding, transmission and decoding, so that low-delay real-time monitoring live broadcast is realized at a server side.
In a specific embodiment, the external encoding operation may include the following steps: perform color space conversion on each frame of RGB image (RGB being a common original image format) to obtain a YUV color-encoded image, where "Y" represents luminance (Luma), i.e., the grayscale value, and "U" and "V" represent chrominance (Chroma), describing the color and saturation of the image and specifying the color of each pixel; then convert the consecutive frames of YUV color-encoded images into a digital video compression format and encapsulate them into a video stream in Flash Video (FLV) streaming media format (i.e., the externally encoded video stream). The digital video compression format may be H.264 or H.265.
In this embodiment, the original images undergo format conversion, inter-frame coding and similar operations, and the coded data is encapsulated into an FLV-format video stream. The corresponding step in solution 1 of the Background is to transmit the data of each frame of original image to an intermediate server as binary network packets and then fetch the data from that server. Because the FLV-format video stream sent in this embodiment has inter-frame coded the original images, the complete original images need not be transmitted, so bandwidth consumption during transmission is reduced without reducing the number of transmitted image frames, giving staff at the server side a better monitoring experience.
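Putting the pieces together, a compact sketch of this external encoding pipeline might look as follows, assuming PyAV for the H.264 encoding and FLV muxing (the patent fixes the formats, not the library); RTMP_URL is a hypothetical push endpoint.

```python
# Sketch of the external encoding pipeline: RGB original image -> YUV ->
# H.264 -> FLV push. PyAV is an assumed library choice; RTMP_URL is a
# hypothetical endpoint.
import av
import numpy as np

RTMP_URL = "rtmp://example.com/live/monitor"

container = av.open(RTMP_URL, mode="w", format="flv")  # FLV streaming-media container
stream = container.add_stream("libx264", rate=25)      # H.264 compression
stream.width, stream.height, stream.pix_fmt = 1280, 720, "yuv420p"
stream.options = {"bf": "0", "g": "15", "tune": "zerolatency"}

def push_frame(rgb: np.ndarray) -> None:
    """Encode one RGB original image and mux it into the FLV stream."""
    frame = av.VideoFrame.from_ndarray(rgb, format="rgb24")
    frame = frame.reformat(format="yuv420p")  # RGB -> YUV color-space conversion
    for packet in stream.encode(frame):       # inter-frame (I/P only) encoding
        container.mux(packet)
```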
In one embodiment, referring again to fig. 1, the autonomous vehicle 10 may further be provided with one or more sensing cameras 103, wherein the sensing cameras 103 are configured to collect sensing images, and the sensing images participate in an autonomous driving algorithm, that is, the autonomous driving algorithm performs an operation related to autonomous driving based on the sensing images and generates control instructions for autonomous driving.
A sensing camera 103 is disposed on the autonomous vehicle 10; a sensing module on the vehicle acquires the original images collected by the sensing camera 103 and runs the autonomous driving algorithm based on those images and on one or more sensing devices 104 disposed on the vehicle (the sensing devices 104 may include a gravity sensor, an ultrasonic sensor, an accelerometer and the like), thereby realizing autonomous driving.
When the autonomous vehicle 10 is used to perform other tasks (e.g., cleaning a road surface, etc.), one or more task cameras 101 may be mounted on the autonomous vehicle, and the raw images acquired by the task cameras 101 are not involved in the autonomous algorithm, but are used to monitor the operational condition of the autonomous vehicle 10 or to record the monitored vehicle driving condition.
In one embodiment, referring again to fig. 2, after the original image acquired by the job camera is acquired in step S201, the method may further include: generating a job message according to the original image; the job message is published to enable one or more nodes to obtain the job message.
Optionally, the job message includes each frame of original image and time thereof, and the time of each frame of image may include the acquisition time of the frame of image and/or the time when the computing device receives the frame of image.
Further, the data format of the job message includes two parts: header information carrying the time of each frame of original image, and a data body carrying the original image itself. Alternatively, the job message may be the original image carrying a timestamp that records the frame's time, the timestamp being attached to each original frame by the computing device. Further, the header information may include the data size of the original image, its width and height, and similar information.
Optionally, after the sensing camera collects the sensing image, the sensing image is transmitted to the computing device, the computing device converts the sensing image into a sensing message, the data format of the sensing message is consistent with the data format of the job message, and the data format of the sensing message can refer to the related description of the data format of the job message. For example, the data format of the perceived message includes two parts, data header information for carrying the time of each frame of perceived image and a data body for carrying the perceived image.
Optionally, after each sensing device collects sensing information (such as acceleration, gravity, etc. of the vehicle) of the automatic driving vehicle, the collected sensing information may also be converted into a data message consistent with the data format of the sensing message.
The computing device converts the acquired frames of original images into the job message format to facilitate subsequent data management. Further, the computing device converts the original images, perceived images and sensing information collected by the operation camera, the sensing camera and the various sensing devices into a unified data format (namely job messages, perception messages and data messages), so that all the data can be managed under the same data management scheme, which facilitates data analysis.
Further, in this embodiment, the collected original images, perceived images, sensing information and similar data are managed under the same data management scheme. In the prior art, an operation camera (such as a commercial surveillance camera) generally offers only limited encoded video output formats (such as RTSP), which makes it inconvenient for its data to participate in joint computation with other types of data or to stay synchronized with other data. With the scheme of this embodiment, the data collected by the operation camera (i.e., the raw data) can be jointly computed with other data, for example by participating in the unmanned driving algorithm, and can be synchronized with other data.
Optionally, the computing device publishes one or more of the job message, the perception message, and the data message, so that each node deployed inside the computing device or each node accessing the unified network with the computing device can obtain the published message.
In a specific embodiment, the autonomous vehicle is configured with a Robot Operating System (ROS), and the step of publishing the job message may include: publishing the job message to the ROS network so that the one or more nodes obtain the job message by subscription.
Specifically, ROS is a highly flexible software architecture for writing robot software programs. The computing device may publish one or more of the job message, the perception message and the data message on the ROS network by publishing to topics, and each node may obtain the messages published on the ROS network by subscribing to those topics.
It should be noted that, in the embodiment of the present invention, when the autonomous vehicle is used to carry out other jobs, the computing device on the vehicle converts the original images collected by the operation camera into the ROS image message format to obtain job messages and publishes them to the ROS network; relying on the publish/subscribe mechanism of ROS, any node that needs a job message can obtain it by subscribing to the topic.
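A minimal sketch of publishing a job message follows, assuming ROS 1 with rospy; the node and topic names are illustrative. The header stamp is the time field that later allows synchronization with other messages.

```python
# Sketch of publishing the job message on the ROS network (ROS 1 / rospy
# assumed; node and topic names are illustrative).
import rospy
from sensor_msgs.msg import Image

rospy.init_node("raw_image_acquisition")
pub = rospy.Publisher("/job_camera/image", Image, queue_size=1)

def publish_job_message(raw_bytes: bytes, width: int, height: int) -> None:
    msg = Image()
    msg.header.stamp = rospy.Time.now()  # time field: enables sync with other messages
    msg.width, msg.height = width, height
    msg.encoding = "rgb8"
    msg.step = width * 3                 # bytes per row for rgb8
    msg.data = raw_bytes                 # data body: the original image
    pub.publish(msg)                     # any subscribing node now receives the job message
```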
Further, after each node obtains the message issued in the ROS, different data processing may be performed on the data in the obtained message based on different analysis requirements, so as to meet the requirements.
In a specific embodiment, referring again to fig. 2, the computing device side may include an original image acquisition node and a remote monitoring node. The original image acquisition node acquires the original images collected by the operation camera, generates job messages from them, and publishes the messages to the ROS network. The remote monitoring node subscribes to the corresponding topic to obtain the job messages, extracts the original images from them (i.e., step S201), and performs steps S202 and S203.
Further, the computing device side may also include a perception image acquisition node and/or a perception information acquisition node; the former acquires the perceived images collected by the sensing camera and converts them into perception messages published to the ROS network, while the latter acquires the sensing information collected by the various sensing devices and converts it into data messages sent to the ROS network.
In one embodiment, after the issuing of the job message, the method may further include: acquiring the job message through a local monitoring node, and generating a job image stream based on the job message; and sending the operation image stream to a playing device for playing through the local monitoring node so as to enable a safety officer on the automatic driving vehicle to realize local monitoring.
The local monitoring node is the node that implements local monitoring on the autonomous vehicle; through the ROS topic subscription mechanism, it can subscribe to the corresponding topic and obtain the job messages on the ROS network, thereby obtaining the corresponding original images. The job image stream is an image stream (hereinafter an MJPEG image stream) that the computing device generates from the multiple frames of original images extracted from the job messages; playing it on the playback device achieves local monitoring.
Optionally, the playback device may be any device capable of playing the job image stream, such as a display screen or a driving recorder. The playback device may run an application for playing the image stream, such as a web page or any of various video players.
Optionally, after obtaining the job messages, the local monitoring node converts them into individual original frames in Joint Photographic Experts Group (JPEG) format, then encapsulates the frames into a Motion JPEG (MJPEG) image stream that web pages can play. An on-board safety officer can use a browser to open a web page deployed on the autonomous vehicle and read and play the MJPEG image stream, satisfying the monitoring needs of the vehicle-local monitoring scenario. Each frame of the MJPEG image stream is a complete JPEG image, so converting multiple JPEG frames into an MJPEG stream consumes no video encoding computation; moreover, the original image data in the local monitoring scenario always stays local to the vehicle and requires no network transmission, so network traffic and bandwidth consumption need not be considered; the delay is therefore very low and the playback frame rate high.
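A sketch of how the local monitoring node might serve the MJPEG stream to a browser follows, assuming Flask; multipart/x-mixed-replace is the standard MJPEG-over-HTTP technique. The get_latest_original_frame() helper is hypothetical and would read the newest frame from memory.

```python
# Sketch of serving the MJPEG job image stream to a browser (Flask assumed;
# multipart/x-mixed-replace is the standard MJPEG-over-HTTP technique).
# get_latest_original_frame() is a hypothetical helper that reads the newest
# frame from memory, never from disk.
import cv2
from flask import Flask, Response

app = Flask(__name__)

def mjpeg_generator():
    while True:
        frame = get_latest_original_frame()    # hypothetical: newest frame, in memory
        ok, jpg = cv2.imencode(".jpg", frame)  # each part is a complete JPEG image
        if not ok:
            continue
        yield (b"--frame\r\nContent-Type: image/jpeg\r\n\r\n"
               + jpg.tobytes() + b"\r\n")

@app.route("/local_monitor")
def local_monitor():
    return Response(mjpeg_generator(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")
```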
In addition, in this embodiment, the local monitoring node obtains the job messages from the ROS network through the topic subscription mechanism and converts them into job images (i.e., JPEG images); each frame of job image is read directly from memory and played, rather than being written to disk and read back for playback as in the existing vehicle-local monitoring scenario. The scheme of this embodiment therefore effectively relieves the pressure on the disk's input/output interface, eliminates the corrupted-picture and incomplete-display problems, and improves the safety officer's local monitoring experience.
Optionally, the image stream of the perceived image acquired by the perceived camera may be played in the local monitoring scene, and the description of the local monitoring node acquiring the perceived image and playing the corresponding image stream may refer to the above description related to the generation and playing of the job image stream, which is not repeated herein.
In one embodiment, referring to fig. 3, the method for monitoring an autonomous vehicle may further include the steps of:
Step S301, when a take-over event occurs on the autonomous vehicle, determining the occurrence time of the take-over event;
Step S302, acquiring perceived images within a period of time before the occurrence time;
Step S303, acquiring original images within a period of time before the occurrence time;
Step S304, sending the take-over event and the perceived and original images from the period before the occurrence time to the server.
A take-over event is an event of manual operation on the autonomous vehicle; it can be triggered when the safety officer judges that a driving risk, an operational irregularity or the like has occurred. Optionally, the take-over event may be a manually triggered emergency brake, or a manually triggered left turn, right turn, reversal or similar maneuver.
When the computing device detects a take-over event on the autonomous vehicle, it records the occurrence time and collects the perceived images and original images from a period of time before that moment; the original images for the period may be obtained before, after, or simultaneously with the perceived images for the period.
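A sketch of the rolling window this implies follows, using bounded deques so old frames fall off automatically; the window length, frame rate and upload helper are illustrative assumptions.

```python
# Sketch of the rolling pre-event window: bounded deques keep only the most
# recent frames, so the "period of time before the occurrence time" is always
# available. WINDOW_SECONDS, FPS and the upload helper are illustrative.
import time
from collections import deque

WINDOW_SECONDS = 30
FPS = 25
original_buffer = deque(maxlen=WINDOW_SECONDS * FPS)   # work-camera frames
perceived_buffer = deque(maxlen=WINDOW_SECONDS * FPS)  # sensing-camera frames

def on_frame(buffer, frame):
    buffer.append((time.time(), frame))  # timestamped; old frames fall off automatically

def on_takeover_event(event):
    t0 = time.time()  # step S301: record the occurrence time
    perceived = [f for ts, f in perceived_buffer if ts >= t0 - WINDOW_SECONDS]  # step S302
    originals = [f for ts, f in original_buffer if ts >= t0 - WINDOW_SECONDS]   # step S303
    # step S304: hand the event and both image sets to the (lower-priority) upload path
    # upload_to_server(event, perceived, originals)  # hypothetical sender
```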
For step S304, the computing device sends the obtained perceived images, original images and the corresponding take-over event to the server, so that the server can determine the cause of the take-over event, or judge whether the autonomous driving algorithm configured on the vehicle needs improvement, in combination with the vehicle's job or driving condition before the event (obtainable by analyzing the original images) and its autonomous driving condition (obtainable by analyzing the perceived images).
Further, in step S304, when sending the perceived images and/or original images to the server, the computing device may perform the video stream encoding operation on them and send the encoded video stream; for details, refer to the description of external encoding in the remote monitoring scenario, which is not repeated here.
It should be noted that when the video stream corresponding to a take-over event is sent to the server, the delay requirement is low; therefore, when a video stream for remote monitoring must be sent at the same time, the take-over video stream is given lower priority than the remote monitoring video stream.
Optionally, when the perceived images and/or original images corresponding to the take-over event are video-stream encoded, B-frame encoding may be performed, and/or the GOP size may be greater than or equal to the GOP size used in the external encoding. This improves the quality of the encoded video stream, preserving more video and image detail.
Optionally, when the video stream corresponding to the take-over event is sent, the buffer used for sending it is greater than or equal to the buffer used for sending the video stream in the remote monitoring scenario, and/or the take-over video stream is sent via the TCP protocol.
In one embodiment, please refer to fig. 4, which is a schematic diagram of another autonomous vehicle according to an embodiment of the present invention. The original images collected by the operation camera 101 and the perceived images collected by the sensing camera 103 are both JPEG images; they are format-converted into ROS image messages (namely job messages and perception messages), and the ROS image messages corresponding to multiple frames form an ROS image message stream. The ROS image message stream publishes the ROS image messages uniformly to every node, ensuring that the image messages obtained by the various nodes are consistent across all usage scenarios. An ROS image message comprises a header part and a data body part; the header carries a time field that allows the image message to be time-synchronized with other messages in ROS (such as data messages).
The ROS image message stream can be acquired by a sensing module and participate in the unmanned algorithm together with sensing information acquired by other sensing devices.
The ROS image message stream can also be obtained by the local monitoring node, which generates an MJPEG image stream through format conversion and plays it locally on the autonomous vehicle to realize local monitoring.
The ROS image message stream may also be obtained by the remote monitoring node, which encodes the original images in the stream into a video stream (such as the FLV stream described above) and pushes it to the content delivery network (CDN) 30. When a server 20 wants to play the video (i.e., pull the stream), it obtains and plays the video stream from the CDN 30, implementing remote monitoring. Optionally, there may be one or more servers 20, each of which can obtain and play the video stream from the CDN 30.
The CDN 30 is an intelligent virtual network built on top of the existing network. Relying on edge servers deployed in various places, together with the load balancing, content distribution and scheduling modules of a central platform, it allows users to obtain the required content from a nearby node, which reduces network congestion and improves access response speed and hit rate. Because the CDN 30 fetches resources from edge nodes distributed around the world, it reduces the risk of network jitter, lowering delay while improving the stability and fluency of the whole link. Moreover, the CDN 30 can transcode the video stream into different formats to support multiple transport protocols and video playback protocols.
In addition, the ROS image message stream may also be acquired by the offline analysis node and stored for subsequent data analysis. Alternatively, the offline analysis node may package the images in the acquired ROS image message stream into a video format (e.g., MP4 format) for storage.
Optionally, the offline analysis node may also acquire other data (such as the above data messages, or the running log of the unmanned-driving algorithm, vehicle positioning, the planned path, the vehicle's own state, etc.) and store it in correspondence with the data in the ROS image message stream (such as the MP4-formatted video). Further, the offline analysis node may package the acquired data into an ROS file package (i.e., the ROS-BAG file in the figure) for storage, and write the ROS file package to disk so that it can be retrieved for data analysis when needed.
Optionally, before these data are stored, the different data may be aligned on the time axis based on their respective timestamps; the time of an ROS image message can be obtained from its header information.
Referring to fig. 5, an embodiment of the present invention further provides a monitoring device 50 for an autonomous vehicle, where the autonomous vehicle is provided with a working camera configured to output a coded image, the coded image is obtained by a coding operation on an acquired original image, and the coding operation has a first delay. The monitoring device 50 includes: a job image acquisition module 501, configured to acquire the original image acquired by the working camera; an encoding module 502, configured to perform a video stream encoding operation on the original image to obtain an encoded video stream, where the video stream encoding operation has a second delay smaller than the first delay; and a sending module 503, configured to send the video stream to a server for remote monitoring.
Optionally, in the video stream encoding operation, B-frame encoding is canceled and/or the number of image frames in a single group of pictures is less than a first threshold.
Optionally, when the video stream is sent, the size of a buffer area used for sending the video stream is smaller than a second threshold value, and/or the video stream is sent through a UDP protocol.
In one embodiment, after the original image acquired by the working camera is acquired, the monitoring device 50 of the autonomous vehicle may further include: a job message generation module, used for generating a job message according to the original image; and a job message issuing module, used for issuing the job message so that one or more nodes can acquire the job message.
In one embodiment, the autonomous vehicle is configured with a robot operating system, and the job message publishing module may be further configured to publish the job message into the robot operating system network so that the one or more nodes obtain the job message from the network by subscription.
In one embodiment, after the issuing of the job message, the monitoring apparatus 50 of the autonomous vehicle may further include: the job image stream generation module is used for acquiring the job message through the local monitoring node and generating a job image stream based on the job message; and the local playing module is used for sending the operation image stream to playing equipment for playing through the local monitoring node so as to enable a safety officer on the automatic driving vehicle to realize local monitoring.
In one embodiment, the autonomous vehicle is further provided with a perception camera for capturing a perception image, the perception image being involved in an autonomous algorithm.
In one embodiment, the monitoring device 50 of the autonomous vehicle may further include: the triggering takeover module is used for determining the occurrence time of the takeover event when the automatic driving vehicle has the takeover event; the perceived image acquisition module is used for acquiring perceived images within a period of time before the occurrence time; the original image acquisition module is used for acquiring an original image in a period of time before the occurrence time; and the takeover data transmission module is used for transmitting the takeover event, the perceived image and the original image which are in a period of time before the occurrence time to the server.
For more details of the operation principle and the operation manner of the monitoring device 50 for an autonomous vehicle, reference may be made to the above description of the monitoring method for an autonomous vehicle in fig. 1 to 4, and the description thereof will not be repeated here.
Further, an embodiment of the invention also discloses a storage medium on which a computer program is stored; when executed by a processor, the computer program performs the technical scheme of the monitoring method of the autonomous vehicle shown in figures 1 to 4.
Further, the embodiment of the invention also discloses a computing device, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the technical scheme of the monitoring method of the automatic driving vehicle in the figures 1 to 4 when running the computer program.
Specifically, in the embodiment of the present invention, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be appreciated that the memory in embodiments of the present application may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It should be understood that the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In this context, the character "/" indicates an "or" relationship between the associated objects.
The term "plurality" as used in the embodiments herein refers to two or more.
The descriptions "first", "second", etc. in the embodiments of the present application are used only to illustrate and distinguish the described objects; they imply no order and place no particular limitation on the number of devices in the embodiments, and should not be construed as limiting the embodiments of the present application.
The "connection" in the embodiments of the present application refers to various connection manners such as direct connection or indirect connection, so as to implement communication between devices, which is not limited in any way in the embodiments of the present application.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be subject to the appended claims.

Claims (8)

1. A method of monitoring an autonomous vehicle, the autonomous vehicle being provided with a working camera and one or more perception cameras, the working camera being configured to output a coded image, the perception cameras being configured to acquire the perceived image, the coded image being obtained by performing a coding operation on an acquired original image, the coding operation having a first delay, the method comprising:
acquiring an original image acquired by the operation camera and the perceived image;
performing a video stream encoding operation on the original image to obtain an encoded video stream, wherein the video stream encoding operation has a second delay, the second delay being smaller than the first delay, and the encoding operation comprises the following steps: performing color gamut space conversion on each frame of RGB image to obtain YUV color coded images, converting multiple frames of continuous YUV color coded images into a digital video compression format, and packaging the digital video compression format into an adaptive video in a streaming media format;
sending the video stream to a server for remote monitoring;
generating a job message according to the original image;
converting the perceived image into a perception message;
issuing one or more of the job message, the perception message and the data message, wherein the data message comprises perception information of the autonomous vehicle, and the data formats of the perception message and the data message are consistent with the data format of the job message;
acquiring, by a local monitoring node, the job message from a robot operating system (ROS) network through a topic subscription mechanism, and generating a job image stream based on the job message;
and sending the operation image stream to a playing device for playing through the local monitoring node.
2. The method of claim 1, wherein B-frame encoding is cancelled and/or the number of image frames in a single group of pictures is less than a first threshold in the video stream encoding operation.
3. The method according to claim 1, wherein, when the video stream is sent, the size of the buffer for sending the video stream is smaller than a second threshold and/or the video stream is sent via the UDP protocol.
4. The method according to claim 3, wherein the autonomous vehicle is configured with a robot operating system, and issuing the job message comprises:
publishing the job message to the robot operating system network so that the one or more nodes acquire the job message from the network by subscription.
5. The method according to claim 1, wherein the method further comprises:
when the automatic driving vehicle generates a take-over event, determining the occurrence time of the take-over event;
acquiring a perceived image within a period of time before the occurrence time;
acquiring an original image within a period of time before the occurrence time;
and sending the take-over event, the perceived image and the original image which are in a period of time before the occurrence time to the server.
6. A monitoring device for an autonomous vehicle, the autonomous vehicle being provided with a working camera and one or more perception cameras, the working camera being adapted to output a coded image, the perception cameras being adapted to acquire the perceived image, the coded image being obtained by performing a coding operation on an acquired original image, the coding operation having a first delay, the device comprising:
the operation image acquisition module is used for acquiring an original image acquired by the operation camera and the perceived image;
the encoding module, used for performing a video stream encoding operation on the original image to obtain an encoded video stream, the video stream encoding operation having a second delay smaller than the first delay, wherein the encoding operation comprises the following steps: performing color gamut space conversion on each frame of RGB image to obtain YUV color coded images, converting multiple frames of continuous YUV color coded images into a digital video compression format, and packaging the digital video compression format into an adaptive video in a streaming media format;
the sending module is used for sending the video stream to a server for remote monitoring;
the job message generation module is used for generating a job message according to the original image;
and for converting the perceived image into a perception message;
the job message issuing module is used for issuing one or more of the job message, the perception message and the data message, wherein the data message comprises perception information of an automatic driving vehicle, and the data formats of the perception message and the data message are consistent with the data format of the job message;
the operation image stream generation module is used for acquiring the operation message from the ROS network of the robot operation system through a topic subscription mechanism by a local monitoring node and generating an operation image stream based on the operation message;
and the local playing module, used for sending the operation image stream to the playing equipment for playing through the local monitoring node.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
8. A computing device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
CN202210205051.2A 2022-03-02 2022-03-02 Method and device for monitoring automatic driving vehicle, storage medium and computer equipment Active CN114786036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210205051.2A CN114786036B (en) 2022-03-02 2022-03-02 Method and device for monitoring automatic driving vehicle, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN114786036A CN114786036A (en) 2022-07-22
CN114786036B true CN114786036B (en) 2024-03-22

Family

ID=82423225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210205051.2A Active CN114786036B (en) 2022-03-02 2022-03-02 Method and device for monitoring automatic driving vehicle, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN114786036B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115484369A (en) * 2022-08-04 2022-12-16 新石器慧通(北京)科技有限公司 Video frame delay time determination method, device, medium, and remote driving system

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103888818A (en) * 2014-03-24 2014-06-25 青岛海信电器股份有限公司 Method, equipment and system for playing television programs
CN104735470A (en) * 2015-02-11 2015-06-24 海信集团有限公司 Streaming media data transmission method and device
CN107819974A (en) * 2016-09-13 2018-03-20 北京百度网讯科技有限公司 Data capture method and device for automatic driving vehicle
CN108182817A (en) * 2018-01-11 2018-06-19 北京图森未来科技有限公司 Automatic Pilot auxiliary system, trackside end auxiliary system and vehicle-mounted end auxiliary system
CN108377400A (en) * 2018-03-07 2018-08-07 广州图普网络科技有限公司 A kind of image transmitting optimization method, system and its apparatus
CN108748161A (en) * 2018-07-04 2018-11-06 山东农业大学 A kind of service robot visual remote control system based on ROS networks
CN109739236A (en) * 2019-01-04 2019-05-10 腾讯科技(深圳)有限公司 Processing method, device, computer-readable medium and the electronic equipment of information of vehicles
CN111586344A (en) * 2019-02-18 2020-08-25 浙江宇视科技有限公司 Message sending method and device of network camera
CN110032176A (en) * 2019-05-16 2019-07-19 广州文远知行科技有限公司 Long-range adapting method, device, equipment and the storage medium of pilotless automobile
CN110505381A (en) * 2019-08-26 2019-11-26 山东浪潮人工智能研究院有限公司 It is a kind of to realize vehicle-mounted camera video acquisition, coding and the method and system of transmission
CN110606070A (en) * 2019-08-30 2019-12-24 驭势科技(北京)有限公司 Intelligent driving vehicle and braking method thereof, vehicle-mounted equipment and storage medium
CN111624894A (en) * 2020-04-28 2020-09-04 东风汽车集团有限公司 Simulation test method and system for parallel driving
CN112039844A (en) * 2020-07-24 2020-12-04 南斗六星系统集成有限公司 Method and system for monitoring video self-adaptive code rate by vehicle-mounted terminal
CN112073683A (en) * 2020-08-14 2020-12-11 开沃新能源汽车集团股份有限公司 Remote driving device based on 5G
CN112714281A (en) * 2020-12-19 2021-04-27 西南交通大学 Unmanned aerial vehicle carries VR video acquisition transmission device based on 5G network
CN112822447A (en) * 2021-01-07 2021-05-18 云南电网有限责任公司电力科学研究院 Robot remote monitoring video transmission method and system based on 5G network
CN113079347A (en) * 2021-03-15 2021-07-06 中移智行网络科技有限公司 Implementation method and implementation device for remote driving
CN113366531A (en) * 2021-03-31 2021-09-07 华为技术有限公司 Method and device for determining image processing mode
CN113626317A (en) * 2021-07-28 2021-11-09 的卢技术有限公司 Automatic driving software debugging system, method, medium and equipment
CN113805509A (en) * 2021-09-09 2021-12-17 东风悦享科技有限公司 Remote driving system and method based on V2X
CN113794903A (en) * 2021-09-16 2021-12-14 广州虎牙科技有限公司 Video image processing method and device and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Mingxiang. Digital Video Shooting and Editing. Wuhan University Press, 2020 (1st edition), p. 6. *

Also Published As

Publication number Publication date
CN114786036A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
US10720188B2 (en) Systems and methods of thumbnail generation
JP7221957B2 (en) Game engine application for video encoder rendering
US10972519B2 (en) Real-time video streaming to client video element
US8831108B2 (en) Low latency rate control system and method
CN110784740A (en) Video processing method, device, server and readable storage medium
US8434119B2 (en) Communication apparatus and communication method
US11647217B2 (en) Transmission device, communication system, transmission method, and computer program product
CN111726657A (en) Live video playing processing method and device and server
CN114786036B (en) Method and device for monitoring automatic driving vehicle, storage medium and computer equipment
US20150109436A1 (en) Smart Dual-View High-Definition Video Surveillance System
US20150131715A1 (en) Image transmission apparatus, image transmission method, and recording medium
US9467691B2 (en) Video system for displaying image data, method and computer program
TWI731579B (en) Transmission device, communication system, transmission method, and computer program product
EP3843415A1 (en) Video image-based media stream bandwidth reduction
JP2005184254A (en) Special video data processing method, special video data processing apparatus, and special video data processing system
WO2009122925A1 (en) Dynamic image conversion device, dynamic image delivery system, method for converting dynamic image and program
US10951887B2 (en) Imaging apparatus, processing method for imaging apparatus, and storage medium
JP2007324722A (en) Moving picture data distribution apparatus and moving picture data communication system
KR102291293B1 (en) Transmission device, comunication system, transmission method, and non-transitory computer readable recording medium
JP2007158778A (en) Forming method and device of trick reproducing content, transmitting method and device of trick reproducing compressed moving picture data, and trick reproducing content forming program
KR102602898B1 (en) Server device, information processing method, and non-transitory computer readable recording medium
WO2020048617A1 (en) Latency efficient streaming of video frames for machine vision over an ip network
CN114513658B (en) Video loading method, device, equipment and medium
KR102613872B1 (en) Server device, communication system, and non-transitory computer readable recording medium
CN116634195A (en) Server device, transmission method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant