CN114786036A - Monitoring method and device for automatic driving vehicle, storage medium and computer equipment

Info

Publication number: CN114786036A
Application number: CN202210205051.2A
Authority: CN (China)
Granted publication: CN114786036B (English)
Inventors: 黄超 (Huang Chao), 柴桢亮 (Chai Zhenliang)
Assignee: Shanghai Xiantu Intelligent Technology Co Ltd
Priority: CN202210205051.2A
Legal status: Granted; Active
Prior art keywords: image, video stream, delay, camera, original image

Classifications

    • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/44: Client-side processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008: Client-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04L 67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L 67/125: Protocols specially adapted for proprietary or special-purpose networking environments, involving control of end-device applications over a network

Abstract

A monitoring method and device for an automatic driving vehicle, a storage medium and computer equipment are provided. The automatic driving vehicle is provided with a working camera configured to output an encoded image, the encoded image being obtained by applying an encoding operation to a captured original image, the encoding operation having a first delay. The method comprises the following steps: acquiring an original image captured by the working camera; performing a video stream encoding operation on the original image to obtain an encoded video stream, wherein the video stream encoding operation has a second delay, and the second delay is smaller than the first delay; and sending the video stream to a server for remote monitoring. In this way, the monitoring delay based on the commercial surveillance camera arranged on the unmanned vehicle can be effectively reduced; in particular, the influence of delay during remote monitoring can be reduced.

Description

Monitoring method and device for automatic driving vehicle, storage medium and computer equipment
Technical Field
The invention relates to the field of automatic driving, in particular to a monitoring method and device for an automatic driving vehicle, a storage medium and computer equipment.
Background
An unmanned vehicle relies on sensors to perceive road conditions and its surrounding environment, and cameras are important visual sensors on an unmanned vehicle. The cameras on an unmanned vehicle mainly comprise perception cameras and commercial surveillance cameras (also called operation cameras or working cameras). A perception camera is mainly used to collect image data that participates in the unmanned driving algorithm; a commercial surveillance camera is mainly used to collect image data for recording and monitoring the driving condition and work completion of the vehicle.
The images collected and output by the commercial surveillance camera are usually also used for remote monitoring. In the prior art, the commonly used remote monitoring solutions mainly include the following two:
1. Acquire each frame of original image from the commercial surveillance camera, transmit the data of each frame as binary network packets to an intermediate server through the websocket protocol (a protocol for full-duplex communication over a single TCP connection), and encapsulate the data into picture-format files (such as jpg files) on the intermediate server. The server performing remote monitoring obtains the per-frame image files from the intermediate server by accessing them, and plays the consecutive original images in a manner similar to an animated picture.
2. Acquire the video stream output by the commercial surveillance camera (which may be in RTSP format), transcode it into a Real-Time Messaging Protocol (RTMP) stream, and push it to the video processing center of an intermediate server. After requesting the intermediate server, the server performing remote monitoring pulls the RTMP stream from the intermediate server for video playback.
However, the two remote monitoring solutions described above each have drawbacks. Solution 1 performs no inter-frame coding on the original images and must transmit every complete original image, which results in greater transmission bandwidth consumption at the same transmission frame rate. Once bandwidth consumption grows large enough, the bandwidth becomes congested, and the resulting avalanche effect causes large delays. If the number of transmitted images is reduced to relieve the pressure on transmission bandwidth and transmission delay, the frame rate of remote playback drops.
In summary, with both existing remote monitoring solutions the delay of remote monitoring is often large, which degrades the remote monitoring effect.
Disclosure of Invention
The technical problem solved by the invention is how to effectively reduce the remote monitoring delay of the commercial monitoring camera on the unmanned vehicle.
To solve the above technical problem, an embodiment of the present invention provides a monitoring method for an autonomous vehicle, where the autonomous vehicle is provided with a working camera, the working camera is configured to output a coded image, the coded image is obtained by subjecting a captured original image to a coding operation, and the coding operation has a first delay, and the method includes: acquiring an original image acquired by the operation camera; performing a video stream coding operation on the original image to obtain a coded video stream, wherein the video stream coding operation has a second delay, and the second delay is smaller than the first delay; and sending the video stream to a server for remote monitoring.
Optionally, in the video stream encoding operation, B-frame encoding is cancelled, and/or the number of image frames in a single group of pictures is less than a first threshold.
Optionally, when the video stream is sent, the size of the buffer used for sending the video stream is smaller than a second threshold, and/or the video stream is sent through a UDP protocol.
Optionally, after the acquiring the original image acquired by the working camera, the method further includes: generating an operation message according to the original image; and issuing the job message so that one or more nodes can acquire the job message.
Optionally, the autonomous vehicle is configured with a robot operating system, and the publishing of the job message comprises: publishing the job message to the robot operating system network, so that the one or more nodes obtain the job message from the network by way of subscription.
Optionally, after the publishing of the job message, the method further includes: acquiring the job message through a local monitoring node, and generating a job image stream based on the job message; and sending the job image stream to a playback device through the local monitoring node for playing, so that a safety officer on the autonomous vehicle can perform local monitoring.
Optionally, the autonomous driving vehicle is further provided with a perception camera, the perception camera is used for collecting a perception image, and the perception image participates in an autonomous driving algorithm.
Optionally, the method further includes: when a take-over event occurs on the autonomous vehicle, determining the occurrence time of the take-over event; acquiring perception images within a period of time before the occurrence time; acquiring original images within the period of time before the occurrence time; and sending the take-over event, and the perception images and original images within the period of time before the occurrence time, to the server.
An embodiment of the present invention further provides a monitoring apparatus for an autonomous vehicle, where the autonomous vehicle is provided with a working camera, the working camera is configured to output a coded image, the coded image is obtained by performing a coding operation on an acquired original image, and the coding operation has a first delay, and the apparatus includes: the operation image acquisition module is used for acquiring an original image acquired by the operation camera; the encoding module is used for executing video stream encoding operation on the original image to obtain an encoded video stream, wherein the video stream encoding operation has a second delay, and the second delay is smaller than the first delay; and the sending module is used for sending the video stream to a server for remote monitoring.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any of the methods for monitoring an autonomous vehicle.
The embodiment of the invention also provides computer equipment which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of any monitoring method of the automatic driving vehicle when executing the computer program.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides a monitoring method for an autonomous vehicle, wherein the autonomous vehicle is provided with a working camera used to output an encoded image, the encoded image is obtained by applying an encoding operation to a captured original image, and the encoding operation has a first delay. The method includes: acquiring an original image captured by the working camera; performing a video stream encoding operation on the original image to obtain an encoded video stream, wherein the video stream encoding operation has a second delay smaller than the first delay; and sending the video stream to a server for remote monitoring. Compared with the prior art, with the method of this embodiment the autonomous vehicle obtains the original image of the operation camera without internal encoding, externally encodes the original image into a video stream, and pushes that stream to the server. Therefore, when the autonomous vehicle sends the surveillance video stream collected by the operation camera to the server, the delay of remote monitoring can be effectively reduced.
Further, after acquiring the original image, the vehicle locally performs inter-frame encoding on it and encapsulates the encoded data into an FLV stream before sending it to the server, so the original image is compressed. Compared with the prior art, this reduces bandwidth consumption during transmission without reducing the number of transmitted image frames, giving staff at the server side a better monitoring experience. In addition, the delay introduced by this inter-frame encoding is smaller than the delay of the operation camera's internal encoding, so the delay of the remote monitoring scenario can be effectively reduced.
Further, the original images, perception images and perception information collected by the operation camera, the perception cameras and the various perception devices are converted into a unified data format. The data can therefore be managed under the same data management scheme, which facilitates data analysis.
Further, in the vehicle-local monitoring scenario, the autonomous vehicle side plays a job image stream (such as an MJPEG image stream) generated from the original images collected by the operation camera. Converting the original images into the job image stream involves no video-encoding computation, and in this scenario the original image data always stays local at the vehicle end without network transmission, so network traffic and bandwidth consumption need not be considered; the delay is therefore very low and the playback frame rate is high.
Drawings
FIG. 1 is an architecture diagram of a method for monitoring an autonomous vehicle in accordance with an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for monitoring an autonomous vehicle in accordance with an embodiment of the present invention;
FIG. 3 is a schematic flow diagram illustrating a portion of a method for monitoring an autonomous vehicle in accordance with an embodiment of the invention;
FIG. 4 is an architecture diagram of a method of monitoring an autonomous vehicle in accordance with an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a monitoring apparatus of an autonomous vehicle according to an embodiment of the present invention.
Detailed Description
As described in the background, remote monitoring through a commercial surveillance camera in the prior art often suffers from large delay. The inventors of the present application noticed that the main reason is that a commercial surveillance camera generally outputs a Real Time Streaming Protocol (RTSP) video stream only after acquiring the original images and internally encoding them. This internal encoding operation carries a certain delay (usually about 2 seconds), so when the unmanned vehicle is monitored, the monitoring effect is affected by that delay.
In addition, unmanned vehicles are currently generally staffed with a safety officer. The safety officer needs to locally watch the videos collected by the cameras (including the perception cameras and the operation camera) and the planned predicted trajectory that controls the autonomous vehicle, so as to understand the vehicle's upcoming driving behavior; the safety officer also judges from the on-site environment of the vehicle whether a driving risk exists, so as to be ready to intervene and take control of the vehicle in advance.
The scenario in which a safety officer watches the video collected by the cameras on vehicle-end equipment is called the vehicle-local monitoring scenario. The conventional video playback steps in this scenario are: directly acquire the RTSP video stream output by the operation camera after internal encoding, store the video stream (comprising multiple image frames) on a local disk of the vehicle, and have the local playback device read the stored video stream from the disk for playing. Alternatively, the images collected by the perception camera are stored on a local disk of the vehicle, and the vehicle then locally reads the stored images from the disk and plays them frame by frame.
However, the inventors found that the existing vehicle-local monitoring scenario requires each image frame of the video stream to be frequently written to and read from the disk, which puts stress on the disk's input/output (I/O) interface. In addition, if a frame is read while the same frame is still being written, phenomena such as a corrupted picture or an incompletely displayed image appear in local monitoring, degrading the safety officer's local monitoring experience.
To solve these problems, an embodiment of the present invention provides a monitoring method for an autonomous vehicle, where the autonomous vehicle is provided with a working camera used to output an encoded image, the encoded image is obtained by applying an encoding operation to a captured original image, and the encoding operation has a first delay. The method includes: acquiring an original image captured by the working camera; performing a video stream encoding operation on the original image to obtain an encoded video stream, wherein the video stream encoding operation has a second delay smaller than the first delay; and sending the video stream to a server for remote monitoring. With the method of this embodiment, the monitoring delay based on the commercial surveillance camera arranged on the unmanned vehicle can be effectively reduced; in particular, the influence of delay on remote monitoring can be reduced.
In order to make the aforementioned objects, features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
In one embodiment, referring to fig. 1, fig. 1 is an architecture diagram for a monitoring method of an autonomous vehicle 10. The autonomous vehicle 10 includes one or more operation cameras 101, which are configured to output encoded images; an encoded image is obtained by applying an encoding operation to a captured original image, and the encoding operation has a first delay.
Optionally, the operation camera 101 may be mounted outside the autonomous vehicle 10, such as on the roof, front, rear, or side wall of the autonomous vehicle 10, and is used to capture images of the environment in which the autonomous vehicle 10 is located. For example, the operation camera 101 monitors the relative position between the driving route of the autonomous vehicle 10 and the road edge, or monitors the work condition of the autonomous vehicle 10, or records the vehicle's driving condition.
Optionally, the work condition depends on the work the autonomous vehicle 10 is configured to perform. For example, if that work is road cleaning, i.e., the autonomous vehicle 10 is an unmanned sweeper, the work condition may refer to how the unmanned sweeper cleans the road. In this case, the operation camera 101 may be installed on the roof or rear of the vehicle to monitor the road surface cleaned by the unmanned sweeper. Since an unmanned sweeper generally needs to travel close to the road edge when sweeping, the operation camera 101 may also be disposed on the roof, the rear, or a side wall of the vehicle body to monitor the relative position between the sweeper's travel path and the road edge, so as to determine whether the vehicle is traveling close to the edge.
It should be noted that the work the autonomous vehicle 10 is configured to perform includes, but is not limited to, road cleaning; when the autonomous vehicle 10 is configured to perform other work, the position of the operation camera 101 and the monitored work condition change accordingly, and the details are omitted here.
Optionally, the operation camera 101 may be implemented with any of various commercial surveillance cameras (surveillance cameras for short). Under conventional use, a common surveillance camera encodes the original image after acquiring it, obtains an encoded image, and only then outputs it. This encoding may be called internal encoding, and it introduces a certain time delay, denoted the first delay. For monitoring scenarios with high real-time requirements, the first delay may prevent the monitoring requirement from being met.
In one embodiment, after acquiring multiple original images (for example, true-color images, i.e., RGB images), a commonly used commercial surveillance camera internally encodes them to obtain an RTSP video stream and outputs that stream. The RTSP video stream comprises multiple consecutive encoded image frames and satisfies the user's need to watch the monitoring picture. RTSP is an application-layer protocol in the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite.
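For illustration, such a camera-encoded RTSP stream is conventionally consumed as sketched below. This is a minimal example of the prior-art flow, and the camera address is a hypothetical placeholder:

```python
import cv2  # OpenCV, commonly used to pull RTSP streams

# Hypothetical address of a commercial surveillance camera on the vehicle network.
cap = cv2.VideoCapture("rtsp://192.168.1.64:554/stream1")

while cap.isOpened():
    ok, frame = cap.read()  # every frame arriving here has already paid the
    if not ok:              # camera's internal-encoding (first) delay
        break
    cv2.imshow("rtsp monitor", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```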
The autonomous vehicle 10 may also be provided with a computing device 102, the computing device 102 being configured to perform a method of monitoring an autonomous vehicle in accordance with an embodiment of the invention. The computing device 102 may be located on the autonomous vehicle 10, such as integrated into an autonomous system. The computing device 102 may also be another device communicatively coupled with the autonomous vehicle 10.
Referring to fig. 2, fig. 2 is a flow diagram of a method for monitoring an autonomous vehicle, which may be performed by a computing device on the autonomous vehicle. The method may include the following steps S201 to S203, and detailed descriptions of the respective steps are as follows.
Step S201, acquiring an original image acquired by the working camera.
Optionally, the operation camera may be implemented with an existing surveillance camera. In that case, secondary development may be performed on the surveillance camera to obtain the original images it collects; such original images have not been internally encoded, so compared with obtaining encoded images, obtaining the original images avoids the first delay. Further, the secondary development may use the Software Development Kit (SDK) provided with the surveillance camera. Alternatively, a dedicated operation camera may be customized, which can output encoded images obtained by internally encoding the original images, and can also provide the un-encoded original images in a suitable form (e.g., through a customized hardware or software interface).
Optionally, the original image may be the image inside the surveillance camera before internal encoding (e.g., RTSP encoding). That is, at the image-acquisition stage of the surveillance camera, the original image data prior to internal encoding is used, thereby avoiding the first delay caused by the camera's internal encoding.
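A minimal sketch of such raw-image acquisition, under the assumption that secondary development exposes a raw-frame hook; the camera_sdk module and its calls are hypothetical stand-ins for a vendor SDK:

```python
import queue

import camera_sdk  # hypothetical vendor SDK wrapper obtained via secondary development

raw_frames = queue.Queue(maxsize=64)  # hands un-encoded frames to the encoder thread

def on_raw_frame(image, capture_time):
    """Called with the original image before any internal (RTSP) encoding."""
    try:
        raw_frames.put_nowait((image, capture_time))
    except queue.Full:
        pass  # drop rather than queue up frames, keeping monitoring delay low

camera = camera_sdk.open(device_id=0)  # hypothetical call
camera.on_raw_frame = on_raw_frame     # hypothetical raw-image hook
camera.start()
```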
Step S202, performing a video stream coding operation on the original image to obtain a coded video stream, where the video stream coding operation has a second delay, and the second delay is smaller than the first delay.
In step S202, the computing device performs a video stream encoding operation on the acquired original image. This encoding may be called external encoding; it introduces a second delay, the encoding algorithms of external and internal encoding differ, and the second delay is smaller than the first delay. Optionally, the external encoding has a lower compression ratio than the internal encoding, which is why the second delay is smaller than the first delay. In particular, the external encoding is performed by a computing device in the autonomous vehicle (e.g., computing device 102 in fig. 1). On the one hand, delay can be reduced by optimizing the encoding algorithm; on the other hand, the computing power of the vehicle's computing device is often better than that of the operation camera, which also helps reduce encoding delay; in addition, the computing device's hardware can be optimized, for example by using a device with more computing power, further reducing the encoding delay.
And step S203, sending the video stream to a server for remote monitoring.
Specifically, in step S203, the computing device sends the externally encoded video stream to the server, and staff at the server side can play the encoded video stream to monitor the work condition or the driving condition of the autonomous vehicle. Monitoring the work or driving condition of the autonomous vehicle at the server side may be called the remote monitoring scenario. Optionally, when staff at the server side find a work problem or a driving risk, the autonomous vehicle may be remotely controlled through the server, such as applying emergency braking to the autonomous vehicle or changing its driving route.
In addition, after the server receives the video stream sent by the computing device, the server can also monitor the working condition or the vehicle running condition of the automatic driving vehicle by performing image recognition on each frame image in the video stream.
In the method, the computing device on the autonomous vehicle side obtains the original image of the operation camera without internal encoding, externally encodes the original image into a video stream, and pushes the video stream to the server. Therefore, when the autonomous vehicle sends the surveillance video stream collected by the operation camera to the server, the delay can be effectively reduced; the reduction equals the difference between the first delay and the second delay.
Further, in this embodiment, the original image obtained in the remote monitoring scenario is not internally encoded, so the delay caused by internal encoding (the first delay) is never introduced, and the delay caused by external encoding (the second delay) is smaller than the surveillance camera's internal-encoding delay (the first delay). Since the common solution 2 described above does introduce the internal-encoding delay, the method of this embodiment can effectively reduce the delay of the remote monitoring scenario.
It should be noted that, in a remote monitoring scenario, if the computing device directly sent the images internally encoded by the operation camera to the server, two stages of delay would be present: the delay caused by internal encoding (the first delay) and the delay of data transmission between the computing device and the server.
In one embodiment, for step S202 in fig. 2, in order to minimize the delay caused by the video stream encoding (i.e., external encoding) operation, one or more of the following measures may be adopted: cancel Bi-directional interpolated prediction frame (B-frame) encoding, and configure the number of image frames in a single Group of Pictures (GOP) to be less than a first threshold.
The number of image frames contained in a single GOP may also be called the size of the GOP. Each GOP contains multiple consecutive image frames, and the first frame of each GOP must be an Intra-coded Frame (I-frame, also called a Key Frame) so that the GOP can be encoded and decoded independently, without reference to other images; the remaining frames may be B-frames or Predicted Frames (P-frames).
Further, an I-frame retains the complete picture: encoding/decoding an I-frame requires only that frame's own image data. A P-frame represents the difference between the current frame and the preceding I-frame (or P-frame); during decoding, the difference defined by the current frame is superimposed on the previously buffered picture to generate the final picture. A P-frame thus holds no complete picture data, only the difference from the preceding frame's picture. A B-frame holds the difference data between the current image frame (the current frame for short) and both the preceding and following frames; to encode/decode a B-frame, the buffered preceding picture is obtained and the following picture is decoded, and both are superimposed with the current frame's difference data to obtain the final picture.
In this embodiment, in the remote monitoring scenario, a video stream encoding operation is performed on the original images and the encoded video stream is then sent to the server. In this encoding operation, the originally configured number of image frames in a single GOP (the GOP size) is reduced, specifically to below the first threshold. Reducing the GOP size reduces the number of image frames between adjacent I-frames of the video stream and weakens the coupling between adjacent frames during encoding (or decoding), which reduces the delay of the encoding operation on the computing device side and the delay of the corresponding decoding operation on the server side. Most preferably, the GOP size is reduced to 1, that is, every frame in the encoded video stream is an I-frame, so each frame can be encoded and decoded independently and codec delay is reduced to the greatest extent. However, the smaller the GOP size, the higher the proportion of I-frames in the video stream, the larger the amount of data to transmit, and the higher the demand on transmission bandwidth. Optionally, the value of the first threshold may be determined from the transmission bandwidth currently available between the computing device and the server and from the acceptable delay of the remote monitoring scenario.
In addition, this embodiment cancels B-frame encoding in the encoding operation, whereas the common prior-art encoding operation performs B-frame encoding. B-frame encoding achieves a high compression rate, but encoding and decoding it introduces a delay of at least three frames (the current frame, the preceding frame and the following frame) and occupies considerable resources during encoding. Cancelling B-frame encoding therefore effectively reduces delay and resource occupation in the encoding and decoding processes.
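Both measures map directly onto standard x264/FFmpeg encoder options, as in the sketch below; the GOP value of 1 is the extreme case discussed above, and the concrete values are illustrative, not fixed by this embodiment:

```python
# Standard libx264 option names; the values shown are illustrative.
X264_LOW_DELAY_OPTIONS = {
    "bf": "0",              # cancel B-frame encoding: no frame waits on a following frame
    "g": "1",               # GOP size of 1: every frame is an independently decodable I-frame
    "tune": "zerolatency",  # also disables encoder-side lookahead buffering
}
```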
In one embodiment, for step S203 in fig. 2, when the computing device sends the video stream to the server, the buffer used for sending the video stream, which may also be called the output buffer or send buffer, is smaller than a second threshold. In addition, the computing device may send the video stream via the User Datagram Protocol (UDP) to further reduce delay.
When the computing device sends the video stream to the server, data is output to the server once the send buffer is full. Shrinking the send buffer therefore shortens the buffering duration of video playback at the server side and reduces delay; some fluency of the server-side video playback must be sacrificed at this point to meet the real-time requirement of the remote monitoring scenario. Reducing the buffer used for sending the video stream also reduces the time of each data copy, further shortening the delay of transmitting the video stream. Further, the size of the buffer used for sending the video stream is set smaller than the second threshold.
In addition, compared with the TCP protocol, the UDP protocol has a faster transmission speed and a smaller transmission delay. Optionally, the embodiment of the present invention may be implemented on top of a commonly used live-video protocol built over UDP; for example, the ARTC protocol used in the low-delay live-broadcast solution provided by Alibaba Cloud may be used to transmit the video stream between the computing device and the server and to play the live video at the server side.
Generally speaking, when ARTC is used to transmit the video stream between the computing device and the server, the send buffer is a value preset by the third party (e.g., Alibaba Cloud).
In a specific embodiment, in the video stream encoding operation, B-frame encoding is cancelled and the number of image frames in a single group of pictures is less than the first threshold; when the video stream is sent, the size of the send buffer is smaller than the second threshold and the video stream is sent via the UDP protocol. Delay is thereby minimized at each step of video encoding, transmission and decoding, achieving low-delay real-time live monitoring at the server side.
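A minimal sketch of the transmission-side tuning using a plain UDP socket; the second-threshold value and server endpoint are illustrative assumptions, and a production system would more likely use a UDP-based live protocol such as ARTC rather than raw datagrams:

```python
import socket

SECOND_THRESHOLD = 64 * 1024  # illustrative upper bound on the send buffer, in bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP: no retransmission stalls
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SECOND_THRESHOLD)

def send_stream_chunk(chunk: bytes) -> None:
    # A small send buffer is flushed quickly, shortening each data copy and
    # trading some playback smoothness for lower end-to-end monitoring delay.
    sock.sendto(chunk, ("monitor.example.com", 9000))  # hypothetical server endpoint
```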
In a particular embodiment, the external encoding operation may include the following steps. First, perform color space conversion on each frame of RGB image (RGB being a common format for original images) to obtain a YUV color-coded image, where "Y" represents luminance (luma, a gray-scale value) and "U" and "V" represent chrominance (chroma), describing the color and saturation of the image and specifying pixel colors. Then convert the multiple consecutive YUV color-coded images into a digital video compression format, such as H.264 or H.265, and encapsulate them into a video stream in the Flash Video (FLV) streaming-media format (i.e., the externally encoded video stream).
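These steps can be sketched with PyAV (Python bindings for FFmpeg); the resolution, frame rate and push URL are illustrative assumptions, and the encoder options reuse the low-delay settings shown earlier:

```python
import av          # PyAV, Python bindings for FFmpeg
import numpy as np

# FLV container pushed to a hypothetical ingest URL.
output = av.open("rtmp://push.example.com/live/vehicle01", mode="w", format="flv")
stream = output.add_stream("h264", rate=25, options={"bf": "0", "g": "1"})
stream.width, stream.height = 1280, 720
stream.pix_fmt = "yuv420p"

def encode_and_push(rgb_image: np.ndarray) -> None:
    frame = av.VideoFrame.from_ndarray(rgb_image, format="rgb24")
    frame = frame.reformat(format="yuv420p")  # color space conversion: RGB -> YUV
    for packet in stream.encode(frame):       # H.264 inter-frame compression
        output.mux(packet)                    # encapsulated into the FLV stream
```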
In this embodiment, the original images undergo format conversion, inter-frame encoding and so on, and the encoded data is encapsulated into an FLV video stream; by contrast, in solution 1 described in the background, each frame of original image is transmitted as binary network packets to an intermediate server and then fetched by the server. Since the FLV video stream sent in this embodiment already inter-frame encodes the original images, complete original images need not be transmitted, so bandwidth consumption during transmission is reduced without reducing the number of transmitted image frames, and server-side staff get a better monitoring experience.
In one embodiment, referring again to fig. 1, the autonomous vehicle 10 may also be provided with one or more perception cameras 103, the perception cameras 103 being configured to capture perception images that participate in an autonomous driving algorithm, i.e., the autonomous driving algorithm performs autonomous driving related operations based on the perception images and generates control instructions for autonomous driving.
A perception camera 103 is arranged on the autonomous vehicle 10. The original images collected by the perception camera 103 may be acquired by a perception module on the autonomous vehicle 10, and the perception module runs the autonomous driving algorithm based on those images and on one or more perception devices 104 arranged on the vehicle (the perception devices 104 may include a gravity sensor, an ultrasonic sensor, an accelerometer, etc.), thereby achieving autonomous driving of the vehicle.
When the autonomous vehicle 10 is used to perform other work (e.g., cleaning a road surface), one or more operation cameras 101 may be installed on it; the original images collected by the operation cameras 101 are not used in the autonomous driving algorithm, but are used to monitor the work condition of the autonomous vehicle 10 or record its driving condition.
In an embodiment, referring again to fig. 2, after the original image captured by the operation camera is acquired in step S201, the method may further include: generating a job message according to the original image; and publishing the job message so that one or more nodes can obtain it.
Optionally, the job message includes each frame of original image and its time; the time of each frame may include the capture time of that frame and/or the time when the computing device received it.
Further, the data format of the job message comprises two parts, data header information and a data body: the header information carries the time of each frame of original image, and the data body carries the original image itself. Alternatively, the data format of the job message is an original image carrying a time tag, where the time tag contains each frame's time and is attached to each frame of original image by the computing device. Further, the header information may also include the data size of the original image, the image's length/width, and similar information.
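The layout just described can be sketched as follows; the field names are illustrative, not mandated by this embodiment:

```python
from dataclasses import dataclass

@dataclass
class JobMessageHeader:
    """Data header information: per-frame timing plus basic image metadata."""
    capture_time: float  # when the operation camera captured this frame
    receive_time: float  # when the computing device received this frame
    data_size: int       # size of the original image data, in bytes
    width: int           # image width in pixels
    height: int          # image height in pixels

@dataclass
class JobMessage:
    header: JobMessageHeader  # data header information
    body: bytes               # data body: the un-encoded original image
```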
Optionally, after the sensing camera collects the sensing image, the sensing image is transmitted to the computing device, the computing device converts the sensing image into the sensing message, the data format of the sensing message is consistent with the data format of the job message, and the data format of the sensing message may refer to the relevant description of the data format of the job message. For example, the data format of the perceptual message includes two parts, namely data header information and a data body, wherein the data header information is used for carrying the time of each frame of perceptual image, and the data body is used for carrying the perceptual image.
Optionally, after each sensing device collects sensing information of the autonomous vehicle (such as acceleration, gravity, etc. of the vehicle), the various sensing devices may also convert the collected sensing information into a data message in accordance with the data format of the sensing message.
The computing device converts each acquired frame of original image into the job message format to facilitate subsequent data management. Further, the computing device converts the original images, perception images and perception information collected by the operation camera, the perception cameras and the various perception devices into a unified data format (namely, job messages, perception messages and data messages), so that all the data can be managed under the same data management scheme, which facilitates data analysis.
Further, in this embodiment, the collected original images, perception images, perception information and other data are managed under the same data management scheme. In the existing solution, by contrast, the operation camera (e.g., a commercial surveillance camera) generally offers a limited set of video stream output formats (e.g., the RTSP format), which is inconvenient for joint computation with other types of data and for information synchronization with other data. The solution in this embodiment therefore enables the data collected by the operation camera (i.e., the original images) to participate in joint computation with other data, such as the unmanned driving algorithm, and to be synchronized with other data.
Optionally, the computing device publishes one or more of the job messages, perception messages and data messages, so that each node deployed on the computing device, or on the same network as the computing device, can obtain the published messages.
In one embodiment, the autonomous vehicle is configured with a Robot Operating System (ROS), and the step of publishing the job message may include: publishing the job message to the ROS network, so that the one or more nodes obtain the job message from the network by way of subscription.
Specifically, ROS is a highly flexible software architecture for writing robot software programs. The computing device can publish one or more of the job messages, perception messages and data messages to the ROS network by publishing topics, and each node can obtain the messages published in the ROS network by subscribing to topics.
It should be noted that, in the embodiment of the present invention, when the autonomous vehicle performs other work, the computing device on the vehicle converts the original images collected by the operation camera into the ROS image message format to obtain job messages and publishes them into the ROS network; relying on ROS's message publish/subscribe mechanism, any node in need can obtain the job messages by subscribing to the topic.
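Under ROS 1, publishing a job message as a standard image message might look like the following sketch; the node and topic names are illustrative assumptions:

```python
import rospy
from cv_bridge import CvBridge          # converts OpenCV/numpy images to ROS messages
from sensor_msgs.msg import Image

rospy.init_node("raw_image_acquisition_node")          # illustrative node name
pub = rospy.Publisher("/work_camera/raw_image", Image, queue_size=10)
bridge = CvBridge()

def publish_job_message(rgb_image) -> None:
    msg = bridge.cv2_to_imgmsg(rgb_image, encoding="rgb8")
    msg.header.stamp = rospy.Time.now()  # time field used later for synchronization
    pub.publish(msg)                     # any node subscribed to the topic obtains it
```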
Further, after each node acquires the message issued in the ROS, different data processing can be performed on the data in the acquired message based on different respective analysis requirements to meet the requirements.
In a specific embodiment, referring to fig. 2 again, the computing device side may include an original-image acquisition node and a remote monitoring node. The original-image acquisition node acquires the original images collected by the operation camera, generates job messages from them, and publishes the job messages in the ROS network. The remote monitoring node subscribes to the corresponding topic to obtain the job messages, extracts the original images from them (i.e., step S201), and performs steps S202 and S203.
Furthermore, the computing device side may also include a perception-image acquisition node and/or a perception-information acquisition node: the former acquires the perception images collected by the perception camera and converts them into perception messages published to the ROS network; the latter acquires the perception information collected by each perception device and converts it into data messages sent to the ROS network.
In one embodiment, after the job message is published, the method may further include: acquiring the job message through a local monitoring node, and generating a job image stream based on the job message; and sending the job image stream through the local monitoring node to a playback device for playing, so that the safety officer on the autonomous vehicle can perform local monitoring.
The local monitoring node implements local monitoring of the autonomous vehicle: through the ROS topic-subscription mechanism it subscribes to the corresponding topic and obtains the job messages in the ROS network, thereby obtaining the corresponding original images. The job image stream is an image stream (such as the MJPEG image stream below) that the computing device generates from the multiple frames of original images obtained from the job messages. Local monitoring is achieved when the playback device plays the job image stream.
Optionally, the playback device may be any device capable of playing the job image stream, such as a display screen or a driving recorder. The playback device may run an application for playing the image stream, such as a web page or any of various video players.
Optionally, after the local monitoring node obtains the job messages, it converts them into individual frames of original image in Joint Photographic Experts Group (JPEG) format and packages them into a Motion JPEG (MJPEG) image stream that a web page can play. The on-board safety officer can open the web page deployed on the autonomous vehicle to read and play the frames of the MJPEG image stream, meeting the monitoring need of the vehicle-local monitoring scenario. Each frame in the MJPEG image stream is a complete JPEG image, so converting multiple JPEG frames into an MJPEG stream involves no video-encoding computation; moreover, in the local monitoring scenario the original image data always stays at the vehicle end with no network transmission, so network traffic and bandwidth consumption need not be considered. The delay is therefore very low and the playback frame rate is high.
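A sketch of serving such an MJPEG stream to the on-vehicle web page, assuming frames from the subscribed job messages arrive through an in-memory queue; Flask and the route name are illustrative choices, not specified by this embodiment:

```python
import queue

import cv2
from flask import Flask, Response

app = Flask(__name__)
frame_queue = queue.Queue(maxsize=8)  # filled by the node's job-message subscriber

def mjpeg_generator():
    while True:
        image = frame_queue.get()               # original image from memory, not disk
        ok, jpeg = cv2.imencode(".jpg", image)  # one complete JPEG per frame
        if not ok:
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpeg.tobytes() + b"\r\n")

@app.route("/local_monitor")
def local_monitor():
    # multipart/x-mixed-replace is the MJPEG framing that browsers play natively
    return Response(mjpeg_generator(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")
```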
In addition, in this embodiment, the local monitoring node obtains the job messages from the ROS network through the topic-subscription mechanism and converts them into job images (i.e., JPEG images); each frame of job image is fetched directly from memory and then played. There is no need, as in the existing vehicle-local monitoring scheme, to store the data to disk and then read it back for playback. The scheme of this embodiment therefore effectively relieves the pressure on the disk's input/output interface, solves the problems of corrupted pictures and incompletely displayed images, and improves the safety officer's local monitoring experience.
Optionally, an image stream of a sensing image acquired by the sensing camera may also be played in the local monitoring scene, and the description of the local monitoring node acquiring the sensing image and playing the corresponding image stream may refer to the above description about the generation and playing of the job image stream, which is not described herein again.
In one embodiment, referring to fig. 3, the monitoring method of the autonomous vehicle may further include the steps of:
step S301, when a take-over event occurs in the automatic driving vehicle, determining the occurrence time of the take-over event;
step S302, obtaining a perception image in a period of time before the occurrence time;
step S303, acquiring an original image in a period of time before the occurrence time;
step S304, sending the takeover event, the perception image and the original image in a period of time before the occurrence time to the server.
The take-over event is an event of manual operation of the autonomous vehicle, and may be triggered by the safety officer upon judging that there is a driving risk, non-standard work, or the like. Optionally, the take-over event may be manually triggered emergency braking, or a manually triggered left turn, right turn, reversing, or similar event.
When the computing device detects that a take-over event has occurred on the autonomous vehicle, it records the occurrence time of the event, and collects the perception images and original images from a period of time before the occurrence time; the original images for that period may be acquired before, after, or at the same time as the perception images.
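Steps S301 to S303 can be sketched with a rolling time window of recent frames; the 30-second window and the uplink helper are illustrative assumptions:

```python
import collections
import time

WINDOW_SECONDS = 30.0  # illustrative "period of time before the occurrence time"

recent_raw = collections.deque()         # (timestamp, original image) pairs
recent_perception = collections.deque()  # (timestamp, perception image) pairs

def remember(buffer, image):
    now = time.time()
    buffer.append((now, image))
    while buffer and buffer[0][0] < now - WINDOW_SECONDS:
        buffer.popleft()  # discard frames older than the rolling window

def on_takeover_event(event):
    occurred_at = time.time()  # step S301: occurrence time of the take-over event
    perception = [img for ts, img in recent_perception if ts <= occurred_at]  # S302
    originals = [img for ts, img in recent_raw if ts <= occurred_at]          # S303
    upload_to_server(event, perception, originals)  # step S304; hypothetical uplink helper
```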
With respect to step S304, the computing device sends the acquired perception images and original images, together with the corresponding take-over event, to the server, so that the server can determine the cause of the take-over event, or judge whether the autonomous driving algorithm configured on the vehicle needs improvement, by combining the vehicle's work or driving condition before the event (which can be analyzed from the original images) with its autonomous driving condition (which can be analyzed from the perception images).
Further, in step S304, when sending the perception images and/or original images to the server, the computing device may perform a video stream encoding operation on them and send the encoded video stream; for details, refer to the description of external encoding in the remote monitoring scenario above, which is not repeated here.
It should be noted that the delay requirement is low when the video stream corresponding to a take-over event is sent to the server; therefore, when a video stream for remote monitoring must be sent to the server at the same time, the video stream corresponding to the take-over event is given lower priority than the video stream for remote monitoring.
Optionally, when the perception images and/or original images corresponding to the take-over event are encoded into a video stream, B-frame encoding may be performed and/or the GOP size may be greater than or equal to the externally encoded GOP size. This improves the quality of the encoded video stream and preserves more video and image detail.
Optionally, when the video stream corresponding to the take-over event is sent, the buffer for sending that video stream is greater than or equal to the buffer for sending the video stream in the remote monitoring scenario, and/or the video stream corresponding to the take-over event is sent via the TCP protocol.
In one embodiment, referring to fig. 4, fig. 4 is a schematic diagram of another autonomous vehicle according to an embodiment of the invention. The original images collected by the operation camera 101 and the perception images collected by the perception camera 103 are both JPEG images; their format is converted to obtain ROS image messages (i.e., job messages and perception messages), and the ROS image messages corresponding to multiple frames form an ROS image message stream. The ROS image message stream publishes the ROS image messages uniformly for all nodes, ensuring that every node obtains consistent image messages across all the different usage scenarios. An ROS image message comprises two parts, header information and a data body; the header information carries a time field, which allows the ROS image message to be synchronized in time with other messages (such as data messages).
The ROS image message flow can be acquired by a sensing module and participates in the unmanned driving algorithm together with the sensing information collected by other sensing equipment.
The ROS image message stream can be obtained by the local monitoring node, which generates an MJPEG image stream through format conversion and plays it locally on the autonomous vehicle to achieve local monitoring.
The ROS image message stream may also be obtained by the remote monitoring node, which encodes the original images in it into a video stream (such as the FLV stream above) and pushes the stream to a Content Delivery Network (CDN) 30. When a server 20 needs to play the video (i.e., to pull the stream), the server 20 obtains the video stream from the CDN 30 and plays it, achieving remote monitoring. Optionally, there may be one or more servers 20, each of which can obtain the video stream from the CDN 30 and play it.
The CDN 30 is an intelligent virtual network built on top of the existing network: relying on edge servers deployed in various regions and on the central platform's load-balancing, content-distribution and scheduling modules, it lets users obtain the required content nearby, reducing network congestion and improving users' access response speed and hit rate. Because the CDN 30 serves resources from edge nodes located close to users everywhere, it reduces the risk of network jitter and lowers delay while improving the stability and smoothness of the full link. Moreover, the CDN 30 can transcode the video stream into different formats to support multiple transport protocols and video playback protocols.
In addition, the ROS image message stream can be acquired by an offline analysis node and stored for subsequent data analysis. Optionally, the offline analysis node may package the images in the acquired ROS image message stream into a video format (e.g., the MP4 format) for storage.
Optionally, the offline analysis node may also obtain other data (such as the data messages described above, or data on the operation log of the unmanned driving algorithm, vehicle positioning, the planned route, the vehicle's own state, and the like) and store it in correspondence with the data from the ROS image message stream (such as the MP4-format video). Further, the offline analysis node may package the acquired data into an ROS file package (i.e., the ROS-BAG file in the figure) for storage, and may write the ROS file package to disk so that it can be retrieved for data analysis when needed.
Optionally, before storing the data, the different kinds of data may be aligned on a time axis based on their respective times, where the time of an ROS image message is obtained from its header information.
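By way of illustration only, a minimal sketch of such time-aligned storage, assuming the rosbag Python API: each message is written under the time taken from its own header, so that images, positioning data, and logs line up on one time axis when the bag is replayed. The file name and message source are hypothetical.

```python
import rosbag

collected_messages = []  # assumed list of (topic, message) pairs gathered online

with rosbag.Bag("vehicle_data.bag", "w") as bag:
    for topic, msg in collected_messages:
        # Index each message by the time in its own header, so images,
        # positioning and logs share one time axis when the bag is replayed.
        bag.write(topic, msg, t=msg.header.stamp)
```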
Referring to fig. 5, an embodiment of the present invention further provides a monitoring apparatus 50 for an autonomous vehicle, where the autonomous vehicle is provided with a working camera, the working camera is used to output a coded image, the coded image is obtained by subjecting a captured original image to a coding operation, and the coding operation has a first delay. The monitoring apparatus 50 for an autonomous vehicle includes: a job image obtaining module 501, configured to obtain an original image collected by the working camera; an encoding module 502, configured to perform a video stream encoding operation on the original image to obtain an encoded video stream, where the video stream encoding operation has a second delay, and the second delay is smaller than the first delay; and a sending module 503, configured to send the video stream to a server for remote monitoring.
Optionally, in the video stream encoding operation, B-frame encoding is disabled, and/or the number of image frames in a single group of pictures (GOP) is less than a first threshold.
Optionally, when the video stream is sent, the size of the buffer used for sending the video stream is smaller than a second threshold, and/or the video stream is sent through the UDP protocol.
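By way of illustration only, a minimal sketch of a low-latency transmit path over UDP: the socket send buffer is capped below a hypothetical second threshold so that stale data is dropped rather than queued. The address and sizes are assumptions.

```python
import socket

SECOND_THRESHOLD = 64 * 1024  # hypothetical cap on the send buffer, in bytes

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Keep the send buffer below the threshold so packets are not queued
# for long; freshness is preferred over reliability here.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, SECOND_THRESHOLD)

def send_packet(payload: bytes) -> None:
    sock.sendto(payload, ("monitor.example.com", 9000))  # assumed endpoint
```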
In one embodiment, after acquiring the raw image captured by the operation camera, the monitoring apparatus 50 for an autonomous driving vehicle may further include: the job message generating module is used for generating a job message according to the original image; and the operation message publishing module is used for publishing the operation message so as to enable one or more nodes to obtain the operation message.
In one embodiment, the autonomous vehicle is configured with a robot operating system, and the work message publishing module may be further configured to publish the work message into the robot operating system network, such that the one or more nodes obtain the work message from the network by way of subscription.
In one embodiment, after issuing the work message, the monitoring apparatus 50 of the autonomous vehicle may further include: the job image stream generation module is used for acquiring the job message through a local monitoring node and generating a job image stream based on the job message; and the local playing module is used for sending the operation image stream to playing equipment through the local monitoring node for playing so as to enable a security officer on the automatic driving vehicle to realize local monitoring.
In one embodiment, the autonomous vehicle is further provided with a perception camera for collecting a perception image, the perception image participating in an autonomous driving algorithm.
In one embodiment, the monitoring device 50 of the autonomous vehicle may further include: the trigger takeover module is used for determining the occurrence time of the takeover event when the automatic driving vehicle has the takeover event; the perception image acquisition module is used for acquiring a perception image in a period of time before the occurrence time; the original image acquisition module is used for acquiring an original image in a period of time before the occurrence time; and the takeover data sending module is used for sending the takeover event, the perception image and the original image in a period of time before the occurrence time to the server.
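By way of illustration only, a minimal sketch of the look-back capture around a takeover event, assuming frames arrive continuously with timestamps: a bounded deque retains roughly the last N seconds of images, and on a takeover event all frames captured before the occurrence time are snapshotted for upload. The window length is a hypothetical parameter.

```python
import time
from collections import deque

WINDOW_SECONDS = 30  # hypothetical look-back window before a takeover event

frames = deque()  # entries of (timestamp, camera_id, image_bytes)

def on_frame(camera_id: str, image_bytes: bytes) -> None:
    now = time.time()
    frames.append((now, camera_id, image_bytes))
    # Drop frames that have fallen out of the look-back window.
    while frames and frames[0][0] < now - WINDOW_SECONDS:
        frames.popleft()

def on_takeover_event() -> list:
    occurrence = time.time()  # occurrence time of the takeover event
    # Snapshot every buffered perception/original image captured before it.
    return [f for f in frames if f[0] <= occurrence]
```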
For more details of the operation principle and operation mode of the monitoring device 50 for an autonomous vehicle, reference may be made to the above description of the monitoring method for an autonomous vehicle in fig. 1 to 4, and details are not repeated here.
Further, the embodiment of the present invention also discloses a storage medium, on which a computer program is stored, and the computer program is executed by a processor to implement the technical solution of the monitoring method for an autonomous vehicle in fig. 1 to 4.
Further, the embodiment of the present invention also discloses a computing device, which includes a memory and a processor, where the memory stores a computer program capable of running on the processor, and the processor executes the technical solution of the monitoring method for an autonomous vehicle in fig. 1 to 4 when running the computer program.
Specifically, in the embodiment of the present invention, the processor may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein indicates that the associated objects before and after it are in an "or" relationship.
The "plurality" appearing in the embodiments of the present application means two or more.
The descriptions of "first", "second", and the like appearing in the embodiments of the present application are only for illustrating and distinguishing the described objects; they do not represent any particular limitation on the number of devices and do not constitute any limitation on the embodiments of the present application.
The term "connection" in the embodiment of the present application refers to various connection manners such as direct connection or indirect connection, so as to implement communication between devices, which is not limited in this embodiment of the present application.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected by one skilled in the art without departing from the spirit and scope of the invention, as defined in the appended claims.

Claims (11)

1. A method for monitoring an autonomous vehicle, wherein the autonomous vehicle is provided with a working camera for outputting a coded image, the coded image being obtained by subjecting a captured raw image to a coding operation having a first delay, the method comprising: acquiring an original image acquired by the working camera;
performing a video stream encoding operation on the original image to obtain an encoded video stream, wherein the video stream encoding operation has a second delay, and the second delay is smaller than the first delay;
and sending the video stream to a server for remote monitoring.
2. The method of claim 1, wherein in the video stream coding operation, B-frame coding is disabled and/or the number of image frames in a single group of pictures is less than a first threshold.
3. The method according to claim 1, wherein, when the video stream is transmitted, the size of the buffer for transmitting the video stream is smaller than a second threshold, and/or the video stream is transmitted via the UDP protocol.
4. The method according to any one of claims 1 to 3, wherein after acquiring the raw image captured by the working camera, further comprising:
generating an operation message according to the original image;
and issuing the job message to enable one or more nodes to acquire the job message.
5. The method of claim 4, wherein the autonomous vehicle is configured with a robotic operating system, the issuing the work message comprising:
publishing the job message to the robot operating system network so that the one or more nodes acquire the job message from the network by means of subscription.
6. The method of claim 5, wherein after said issuing said job message, further comprising:
acquiring the operation information through a local monitoring node, and generating an operation image stream based on the operation information;
and sending the operation image stream to a playing device through the local monitoring node for playing so as to enable a security officer on the automatic driving vehicle to realize local monitoring.
7. The method of any one of claims 1 to 3, wherein the autonomous vehicle is further provided with a perception camera for capturing perception images, the perception images participating in an autonomous driving algorithm.
8. The method of claim 7, further comprising:
determining an occurrence time of a take-over event when the autonomous vehicle has the take-over event;
acquiring a perception image in a period of time before the occurrence time;
acquiring an original image in a period of time before the occurrence time;
and sending the takeover event, the perception image in a period of time before the occurrence time and the original image to the server.
9. A monitoring apparatus for an autonomous vehicle, wherein the autonomous vehicle is provided with a working camera for outputting a coded image, the coded image being obtained by subjecting a captured raw image to a coding operation having a first delay, the apparatus comprising: the operation image acquisition module is used for acquiring an original image acquired by the working camera;
the encoding module is used for executing video stream encoding operation on the original image to obtain an encoded video stream, wherein the video stream encoding operation has a second delay, and the second delay is smaller than the first delay; and the sending module is used for sending the video stream to a server for remote monitoring.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
11. A computing device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 8 when executing the computer program.
CN202210205051.2A 2022-03-02 2022-03-02 Method and device for monitoring automatic driving vehicle, storage medium and computer equipment Active CN114786036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210205051.2A CN114786036B (en) 2022-03-02 2022-03-02 Method and device for monitoring automatic driving vehicle, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210205051.2A CN114786036B (en) 2022-03-02 2022-03-02 Method and device for monitoring automatic driving vehicle, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN114786036A true CN114786036A (en) 2022-07-22
CN114786036B CN114786036B (en) 2024-03-22

Family

ID=82423225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210205051.2A Active CN114786036B (en) 2022-03-02 2022-03-02 Method and device for monitoring automatic driving vehicle, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN114786036B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115484369A (en) * 2022-08-04 2022-12-16 新石器慧通(北京)科技有限公司 Video frame delay time determination method, device, medium, and remote driving system

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103888818A (en) * 2014-03-24 2014-06-25 青岛海信电器股份有限公司 Method, equipment and system for playing television programs
CN104735470A (en) * 2015-02-11 2015-06-24 海信集团有限公司 Streaming media data transmission method and device
CN107819974A (en) * 2016-09-13 2018-03-20 北京百度网讯科技有限公司 Data capture method and device for automatic driving vehicle
CN108182817A (en) * 2018-01-11 2018-06-19 北京图森未来科技有限公司 Automatic Pilot auxiliary system, trackside end auxiliary system and vehicle-mounted end auxiliary system
CN108377400A (en) * 2018-03-07 2018-08-07 广州图普网络科技有限公司 A kind of image transmitting optimization method, system and its apparatus
CN108748161A (en) * 2018-07-04 2018-11-06 山东农业大学 A kind of service robot visual remote control system based on ROS networks
CN109739236A (en) * 2019-01-04 2019-05-10 腾讯科技(深圳)有限公司 Processing method, device, computer-readable medium and the electronic equipment of information of vehicles
CN111586344A (en) * 2019-02-18 2020-08-25 浙江宇视科技有限公司 Message sending method and device of network camera
CN110032176A (en) * 2019-05-16 2019-07-19 广州文远知行科技有限公司 Long-range adapting method, device, equipment and the storage medium of pilotless automobile
CN110505381A (en) * 2019-08-26 2019-11-26 山东浪潮人工智能研究院有限公司 It is a kind of to realize vehicle-mounted camera video acquisition, coding and the method and system of transmission
CN110606070A (en) * 2019-08-30 2019-12-24 驭势科技(北京)有限公司 Intelligent driving vehicle and braking method thereof, vehicle-mounted equipment and storage medium
CN111624894A (en) * 2020-04-28 2020-09-04 东风汽车集团有限公司 Simulation test method and system for parallel driving
CN112039844A (en) * 2020-07-24 2020-12-04 南斗六星系统集成有限公司 Method and system for monitoring video self-adaptive code rate by vehicle-mounted terminal
CN112073683A (en) * 2020-08-14 2020-12-11 开沃新能源汽车集团股份有限公司 Remote driving device based on 5G
CN112714281A (en) * 2020-12-19 2021-04-27 西南交通大学 Unmanned aerial vehicle carries VR video acquisition transmission device based on 5G network
CN112822447A (en) * 2021-01-07 2021-05-18 云南电网有限责任公司电力科学研究院 Robot remote monitoring video transmission method and system based on 5G network
CN113079347A (en) * 2021-03-15 2021-07-06 中移智行网络科技有限公司 Implementation method and implementation device for remote driving
CN113366531A (en) * 2021-03-31 2021-09-07 华为技术有限公司 Method and device for determining image processing mode
CN113626317A (en) * 2021-07-28 2021-11-09 的卢技术有限公司 Automatic driving software debugging system, method, medium and equipment
CN113805509A (en) * 2021-09-09 2021-12-17 东风悦享科技有限公司 Remote driving system and method based on V2X
CN113794903A (en) * 2021-09-16 2021-12-14 广州虎牙科技有限公司 Video image processing method and device and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
汪明香: "《数字视频拍摄于编辑》", vol. 1, 武汉大学出版社, pages: 6 *

Also Published As

Publication number Publication date
CN114786036B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
US10972519B2 (en) Real-time video streaming to client video element
KR102280134B1 (en) Video playback methods, devices and systems
KR102077556B1 (en) System and method for encoding video content using virtual intra-frames
JP2008311831A (en) Moving image communication equipment, moving image communication system, and semiconductor integrated circuit for moving image communication
US8434119B2 (en) Communication apparatus and communication method
US20230224487A1 (en) Transmission device, communication system, transmission method, and computer program product
CN111726657A (en) Live video playing processing method and device and server
CN114786036B (en) Method and device for monitoring automatic driving vehicle, storage medium and computer equipment
CN113676404A (en) Data transmission method, device, apparatus, storage medium, and program
US20150131715A1 (en) Image transmission apparatus, image transmission method, and recording medium
US11463651B2 (en) Video frame-based media stream bandwidth reduction
CN110351576B (en) Method and system for rapidly displaying real-time video stream in industrial scene
US9467691B2 (en) Video system for displaying image data, method and computer program
EP4152747B1 (en) Methods and devices for controlling a transmission of a video stream
TWI731579B (en) Transmission device, communication system, transmission method, and computer program product
JP2022075532A (en) Image processing method and video processing device
JP2003259375A (en) Moving picture transmission/reception system, moving picture transmission/reception method, and program for them
JP3535672B2 (en) Video data transmission device and video data transmission method
KR102291293B1 (en) Transmission device, comunication system, transmission method, and non-transitory computer readable recording medium
JP7419151B2 (en) Server device, information processing method and program
US11665220B2 (en) Monitoring and surveillance system arranged for processing video data associated with a vehicle, as well as corresponding devices and method
Wan et al. Research and Implementation of Low-Latency Streaming Media Transmission System
CN114567789A (en) Video live broadcast method based on double buffer queues and video frame congestion control
XU et al. A Hybrid Real-time Video Streaming Remote Surveillance System Based on Mobile Robot
JP2003284072A (en) Moving image transmitter and moving image communication system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant