WO2019047649A1 - Method and apparatus for determining driving behavior of an unmanned vehicle - Google Patents

Method and apparatus for determining driving behavior of an unmanned vehicle

Info

Publication number
WO2019047649A1
WO2019047649A1 (PCT/CN2018/098982)
Authority
WO
WIPO (PCT)
Prior art keywords
driving behavior
model
sequence
instruction sequence
video frame
Prior art date
Application number
PCT/CN2018/098982
Other languages
English (en)
French (fr)
Inventor
郁浩
闫泳杉
郑超
唐坤
张云飞
姜雨
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司
Publication of WO2019047649A1 publication Critical patent/WO2019047649A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present application relates to the field of computer technologies, specifically to the field of computer network technologies, and in particular to a method and apparatus for determining the driving behavior of an unmanned vehicle.
  • an image sensor is required to capture images around the driverless car to determine the environment in which the driverless car is located.
  • the mapping model takes a single-frame static picture as its input (the content the model "sees") and outputs the steering wheel angle (or the reciprocal of the turning radius) corresponding to that single-frame static picture.
  • the purpose of the present application is to propose an improved method and apparatus for determining the driving behavior of an unmanned vehicle to solve the technical problems mentioned in the background section above.
  • an embodiment of the present application provides a method for determining driving behavior of an unmanned vehicle, the method comprising: acquiring a sequence of video frames with timing information within a predetermined time period; and inputting the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
  • inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence output by the end-to-end driving behavior model includes: implementing the driving behavior model with an RNN model; or synthesizing the driving behavior model from a CNN model and an LSTM model architecture.
  • when the driving behavior model is synthesized from the CNN model and the LSTM model architecture, inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence further includes: extracting, based on the CNN model, the features of each video frame in the video frame sequence; inputting, according to the timing information, the features of each video frame into the LSTM model to obtain a context vector output by the LSTM model; and decoding the context vector with the LSTM model to obtain a driving behavior instruction sequence.
  • when the driving behavior model is synthesized from the CNN model and the LSTM model architecture, inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence may instead further include: extracting, based on the CNN model, the features of each video frame in the video frame sequence; inputting, according to the timing information, the features of each video frame into a first LSTM model to obtain a context vector output by the first LSTM model; and decoding the context vector with a second LSTM model to obtain a driving behavior instruction sequence.
  • the end-to-end driving behavior model is trained based on a sequence of actual driving behaviors acquired by the collecting vehicle and a sequence of video frames acquired by the image sensor of the collecting vehicle.
  • the driving behavior commands include a steering wheel control command, a throttle command, and a brake command.
  • the output driving behavior instruction sequence consists of driving behavior instructions to which a constraint term has been added.
  • the embodiment of the present application provides an apparatus for determining driving behavior of an unmanned vehicle, the apparatus comprising: an image sequence acquiring unit, configured to acquire a sequence of video frames with timing information within a predetermined time period; and an instruction sequence determining unit, configured to input the video frame sequence into an end-to-end driving behavior model and obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
  • the instruction sequence determining unit comprises: an RNN model unit for implementing the driving behavior model in the instruction sequence determining unit with an RNN model; or an architecture synthesis model unit for synthesizing the driving behavior model in the instruction sequence determining unit from a CNN model and an LSTM model architecture.
  • when the instruction sequence determining unit includes the architecture synthesis model unit, the instruction sequence determining unit further includes: a feature extraction unit, configured to extract, based on the CNN model, the features of each video frame in the video frame sequence; a vector determining unit, configured to input, according to the timing information, the features of each video frame into the LSTM model to obtain a context vector output by the LSTM model; and a vector decoding unit, configured to decode the context vector with the LSTM model to obtain a driving behavior instruction sequence.
  • when the instruction sequence determining unit includes the architecture synthesis model unit, the instruction sequence determining unit may instead further include: an extracting feature unit for extracting, based on the CNN model, the features of each video frame in the video frame sequence; a determining vector unit for inputting, according to the timing information, the features of each video frame into a first LSTM model to obtain a context vector output by the first LSTM model; and a decoding vector unit for decoding the context vector with a second LSTM model to obtain a driving behavior instruction sequence.
  • the end-to-end driving behavior model in the command sequence determining unit is trained based on a sequence of actual driving behaviors acquired by the collecting vehicle and a sequence of video frames acquired by the image sensor of the collecting vehicle.
  • the driving behavior instructions in the command sequence determining unit include a steering wheel control command, a throttle command, and a brake command.
  • the output driving behavior instruction sequence in the instruction sequence determination unit is a driving behavior instruction to which a constraint item has been added.
  • an embodiment of the present application provides an apparatus, including: one or more processors; and a storage device configured to store one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement any of the above methods for determining the driving behavior of an unmanned vehicle.
  • an embodiment of the present application provides a computer readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements any of the above methods for determining the driving behavior of an unmanned vehicle.
  • the method and apparatus for determining the driving behavior of an unmanned vehicle provided by the embodiment of the present application firstly acquire a video frame sequence with time series information within a predetermined time period; and then input the video frame sequence into an end-to-end driving behavior model.
  • the input video frame sequence enables the end-to-end driving behavior model to capture continuously changing environmental information.
  • the driving behavior command sequence can ensure the continuity of driving behavior and can meet the driving needs over a period of time without high-frequency output of commands, which greatly saves computing resources and also allows the method to run normally on devices with weak computing power.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a schematic flow diagram of one embodiment of a method for determining driving behavior of an unmanned vehicle in accordance with the present application
  • FIG. 3 is a schematic flow chart of still another embodiment of a method for determining driving behavior of an unmanned vehicle according to the present application
  • FIG. 4 is a schematic flowchart of an application scenario of a method for determining driving behavior of an unmanned vehicle according to an embodiment of the present application
  • FIG. 5 is an exemplary structural diagram of one embodiment of an apparatus for determining driving behavior of an unmanned vehicle according to the present application
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server of an embodiment of the present application.
  • FIG. 1 shows an exemplary system architecture of an embodiment of a method for determining driving behavior of an unmanned vehicle or a device for determining driving behavior of an unmanned vehicle to which the present application can be applied.
  • system architecture 100 can include terminal devices 101, 102, 103, network 104, and servers 105, 106.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the servers 105, 106.
  • Network 104 may include various types of connections, such as wired, wireless communication links, fiber optic cables, and the like.
  • the user can interact with the servers 105, 106 over the network 104 using the terminal devices 101, 102, 103 to receive or send messages and the like.
  • Various communication client applications such as a web browser application, a search application, an instant communication tool, a mailbox client, a social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
  • the terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
  • the servers 105, 106 may be servers that provide various services, such as a back-end web server that provides support for web pages displayed on the terminal devices 101, 102, 103.
  • the back-end web server may perform processing such as analyzing the received web page request, and feed the processing result (for example, web page data) back to the terminal device.
  • the method for determining the driving behavior of the unmanned vehicle is generally performed by the terminal devices 101, 102, 103 or the servers 105, 106; accordingly, the apparatus for determining the driving behavior of the unmanned vehicle is generally provided in the terminal devices 101, 102, 103 or the servers 105, 106.
  • the numbers of terminal devices, networks, and servers in Figure 1 are merely illustrative. Depending on the implementation needs, there can be any number of terminal devices, networks, and servers.
  • referring to FIG. 2, there is shown a flow diagram of one embodiment of a method for determining driving behavior of an unmanned vehicle in accordance with the present application.
  • the method 200 for determining the driving behavior of an unmanned vehicle includes:
  • in step 210, a sequence of video frames with timing information within a predetermined time period is acquired.
  • an electronic device (such as the terminal or server shown in FIG. 1) that runs a method for determining the driving behavior of an unmanned vehicle can acquire video within a predetermined time period acquired by the image sensor.
  • the video here includes a sequence of video frames formed by video frames, and each video frame is accompanied by timing information.
  • in step 220, the sequence of video frames is input into an end-to-end driving behavior model to obtain a driving behavior command sequence output by the end-to-end driving behavior model.
  • the end-to-end driving behavior model is a predetermined model that derives an unmanned-vehicle driving behavior instruction sequence from an input image sequence; it represents a mapping from the collected image sequence to the driving behavior sequence of the unmanned vehicle, and it can be constructed by a technician based on collected historical data or set manually by a technician.
  • the end-to-end driving behavior model can be implemented using Recurrent Neural Networks (RNNs).
  • RNNs can process sequence data; they are called recurrent neural networks because the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes previous information and applies it to the computation of the current output: the nodes in the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
  • in theory, RNNs can process sequence data of any length, but in practice, to reduce complexity, it is often assumed that the current state is related only to the previous few states. Here, an RNN can use its internal memory to process input sequences of arbitrary timing; as a feedback dynamical system it reflects process dynamics in its computation, and it has stronger dynamic behavior and computational capability than a feedforward neural network.
  • the end-to-end driving behavior model can be synthesized from a CNN model and an LSTM model, where the CNN model refers to a convolutional neural network model and the LSTM model refers to a long short-term memory (LSTM) model.
  • the CNN model here is used as a feature extractor; suppose the feature dimension extracted by the CNN is N (generally this feature is the output of the network's last fully connected layer). Then, for K video frames, an N-dimensional feature sequence of temporal length K is formed. This feature sequence is then used as the input to the LSTM, and the resulting LSTM output is still a sequence of length K (its dimension should be the number of action categories). The results of this sequence are then averaged to give the final result.
  • by synthesizing the end-to-end driving behavior model from the CNN model and the LSTM model, the input image sequence can be processed, thereby improving the continuity of driving behavior.
  • the end-to-end driving behavior model is trained based on the actual driving behavior collected by the collecting vehicle and the video frames acquired by sensors provided on the collecting vehicle.
  • because the captured video frames and the actual driving behavior are both based on actual road segments, the training samples better fit real conditions, so the accuracy of the prediction results of the end-to-end driving behavior model can be improved.
  • the driving behavior commands in this embodiment may be behavior instructions for driving a vehicle in the prior art or in technologies developed in the future, which is not limited in this application.
  • the driving behavior instructions herein may include lateral control commands and longitudinal control commands, where the lateral control commands can control the lateral displacement of the vehicle, such as lane merging and turning, and the longitudinal control commands can control the longitudinal displacement of the vehicle, such as moving forward, stopping, and reversing.
  • the driving behavior commands include a steering wheel control command, a throttle command, and a brake command.
  • the steering wheel control command is a lateral control command, and the throttle and brake commands are longitudinal control commands. By outputting the steering wheel control command, the throttle command, and the brake command, the end-to-end driving behavior model can accurately control the vehicle's lane merging, turning, moving forward, stopping, reversing, and so on.
  • since the driving behavior commands output by the end-to-end driving behavior model are continuous behavior commands, a constraint term may be added to the output of the driving behavior model in order to ensure that the output driving behavior command sequence satisfies the requirements of vehicle travel.
  • the constraint term here can be set according to the vehicle-travel requirements that the output driving behavior command sequence needs to satisfy. For example, in order to ensure the continuity and smoothness of the running of the vehicle, the constraint may be set such that the coordinate difference between adjacent lateral coordinates is less than a predetermined threshold and the coordinate difference between adjacent longitudinal coordinates is less than a preset threshold, thereby improving the accuracy of the output driving behavior command sequence.
  • because the method for determining the driving behavior of an unmanned vehicle uses a video frame sequence as input, it takes timing information into account and lets the neural network capture continuously changing information; the output is a continuous sequence of driving behavior instructions, which can realize path planning, improves the accuracy and precision of the unmanned vehicle's driving behavior, and can meet the driving needs over a period of time without high-frequency output of commands, greatly saving computing resources and allowing the system to run normally on devices with weak computing power.
  • FIG. 3 shows a flow of still another embodiment of a method for determining driving behavior of an unmanned vehicle according to the present application.
  • a method 300 for determining driving behavior of an unmanned vehicle includes:
  • in step 310, a sequence of video frames with timing information within a predetermined time period is acquired.
  • an electronic device (such as the terminal or server shown in FIG. 1) that runs a method for determining the driving behavior of an unmanned vehicle can acquire video within a predetermined time period acquired by the image sensor.
  • the video here includes a sequence of video frames formed by video frames, and each video frame is accompanied by timing information.
  • in step 320, the features of each video frame in the sequence of video frames are extracted based on the CNN model.
  • the CNN model refers to a convolutional neural network model.
  • the CNN model here is again a feature extractor; suppose the feature dimension extracted by the CNN is N (generally this feature is the output of the network's last fully connected layer). Then, for K video frames, an N-dimensional feature sequence of temporal length K is formed. The higher the extracted dimension, the more information it contains; the lower the extracted dimension, the faster the computation.
  • by adopting a convolutional neural network model similar to a biological neural network, the complexity of the network model is reduced and the number of weights is decreased, which reduces the number of parameters the neural network needs to train and makes the neural network structure simpler and more adaptable.
  • in step 330, the features of the respective video frames are input into the LSTM model according to the timing information to obtain a context vector output by the LSTM model.
  • the feature of each video frame can be input into the LSTM model according to the time series information, and the context vector output by the LSTM model is obtained.
  • in step 340, the LSTM model is used to decode the context vector to obtain a driving behavior command sequence.
  • the LSTM model used in step 340 and the LSTM model used in step 330 may be the same LSTM model; this LSTM model may implement two functions, one being to take features as input and output a context vector, and the other being to take the context vector as input and output the driving behavior command sequence.
  • the CNN model and the LSTM model are trained based on the actual driving behavior sequence collected by the collecting vehicle and the video frame sequence acquired by the image sensor of the collecting vehicle.
  • it should be understood that the method 300 for determining the driving behavior of an unmanned vehicle shown in FIG. 3 is merely an exemplary description of the present application and does not represent a limitation of the present application.
  • the LSTM model employed in step 340 of FIG. 3 and the LSTM model employed in step 330 may also be two LSTM models, where the first LSTM model implements the function of taking features as input and outputting a context vector, and the second LSTM model implements the function of taking the context vector as input and outputting the driving behavior instruction sequence.
  • the first LSTM model and the second LSTM model here represent only two different LSTM models and do not represent a particular limitation on the LSTM model.
  • because the method for determining the driving behavior of an unmanned vehicle synthesizes the end-to-end driving behavior model from the CNN model and the LSTM model architecture, it can extract features from the high-dimensional images in the video frames based on the CNN model; the LSTM model is then used to obtain the context vector from the extracted features, and an LSTM is then used to obtain the driving behavior instruction sequence from the context vector. This process reduces the dimensionality of the computation, thus improving calculation speed and greatly saving computing resources.
  • FIG. 4 shows a schematic flow chart of an application scenario of a method for determining driving behavior of an unmanned vehicle according to an embodiment of the present application.
  • the method 400 for determining the driving behavior of an unmanned vehicle of the embodiment of the present application runs in the electronic device 420.
  • first, the input is a video frame sequence 401 from the past 3-second period, 30 frames in total (one frame captured every 0.1 seconds); then, for each of the 30 frames, the CNN model 402 is used to extract the features 403 of the video frame; then, according to the timing information, the features 403 of each video frame are input into the first LSTM model 404 to obtain a context vector 405 output by the LSTM model; then the second LSTM model 406 is used to decode the context vector 405, outputting in turn a continuous driving behavior command sequence 407, which includes the driving commands for the next 0.5 seconds, 25 groups in total, with adjacent groups separated by 0.02 seconds; finally, the continuous driving behavior command sequence 407 is executed in chronological order by the control module of the unmanned vehicle.
  • the input enables the neural network to capture continuously changing environmental information, the output is continuous behavior, and constraint terms can be added to ensure the continuity of driving behavior. Because the output is continuous behavior, it can meet the driving demand over a period of time without high-frequency output of instructions, which greatly saves computing resources and also allows the method to run normally on devices with weak computing power.
  • as an implementation of the above methods, an embodiment of the present application provides an embodiment of an apparatus for determining driving behavior of an unmanned vehicle; this apparatus embodiment corresponds to the method embodiments shown in FIGS. 1 to 4, so the operations and features described above for the method for determining the driving behavior of an unmanned vehicle in FIGS. 1 to 4 are equally applicable to the apparatus 500 for determining the driving behavior of an unmanned vehicle and the units contained therein, and are not repeated here.
  • the apparatus 500 for determining the driving behavior of an unmanned vehicle includes: an image sequence acquiring unit 510, configured to acquire a sequence of video frames with timing information within a predetermined time period; and an instruction sequence determining unit 520, configured to input the video frame sequence into an end-to-end driving behavior model and obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
  • the instruction sequence determining unit includes: an RNN model unit 521, configured to implement the driving behavior model in the instruction sequence determining unit 520 with an RNN model; or an architecture synthesis model unit 522, configured to synthesize the driving behavior model in the instruction sequence determining unit 520 from a CNN model and an LSTM model architecture.
  • when the instruction sequence determining unit includes the architecture synthesis model unit, the instruction sequence determining unit further includes (not shown in the figure): a feature extraction unit, configured to extract, based on the CNN model, the features of each video frame in the video frame sequence; a vector determining unit, configured to input, according to the timing information, the features of each video frame into an LSTM model to obtain a context vector output by the LSTM model; and a vector decoding unit, configured to decode the context vector with the LSTM model to obtain a driving behavior instruction sequence.
  • when the instruction sequence determining unit includes the architecture synthesis model unit, the instruction sequence determining unit may instead further include: an extracting feature unit, configured to extract, based on the CNN model, the features of each video frame in the video frame sequence; a determining vector unit, configured to input, according to the timing information, the features of each video frame into a first LSTM model to obtain a context vector output by the first LSTM model; and a decoding vector unit, configured to decode the context vector with a second LSTM model to obtain a driving behavior instruction sequence.
  • the end-to-end driving behavior model in the command sequence determining unit is trained based on the actual driving behavior sequence collected by the collecting vehicle and the video frame sequence acquired by the image sensor of the collecting vehicle.
  • the driving behavior commands in the command sequence determining unit include a steering wheel control command, a throttle command, and a brake command.
  • the output driving behavior instruction sequence in the instruction sequence determining unit is a driving behavior instruction to which a constraint item has been added.
  • the application also provides an embodiment of a device comprising: one or more processors; and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for determining the driving behavior of an unmanned vehicle as described in any of the above.
  • the present application also provides an embodiment of a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the method for determining driving behavior of an unmanned vehicle as described in any of the above.
  • FIG. 6 a block diagram of a computer system 600 suitable for use in implementing a terminal device or server of an embodiment of the present application is shown.
  • the terminal device shown in FIG. 6 is merely an example, and should not impose any limitation on the function and scope of use of the embodiments of the present application.
  • computer system 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in read only memory (ROM) 602 or a program loaded from the storage portion 608 into random access memory (RAM) 603.
  • in the RAM 603, various programs and data required for the operation of the system 600 are also stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • Driver 610 is also coupled to I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage portion 608 as needed.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer readable medium, the computer program comprising program code for executing the method illustrated by the flowchart.
  • the computer program can be downloaded and installed from the network via communication portion 609, and/or installed from removable media 611.
  • when the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.
  • the computer readable medium described herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
  • each block of the flowchart or block diagrams can represent a unit, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks can occur in a different order than that illustrated in the drawings. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts can be implemented in a dedicated hardware-based system that performs the specified function or operation. Or it can be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present application may be implemented by software or by hardware.
  • the described unit may also be provided in the processor, for example, as a processor comprising an image sequence acquisition unit and an instruction sequence determination unit.
  • the names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the image sequence acquisition unit may also be described as "a unit that acquires a sequence of video frames with timing information within a predetermined time period".
  • the present application further provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus described in the foregoing embodiments, or may be a non-volatile computer storage medium that exists separately and is not assembled into the terminal.
  • the non-volatile computer storage medium stores one or more programs; when the one or more programs are executed by a device, the device is caused to: acquire a sequence of video frames with timing information within a predetermined time period; and input the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A method and apparatus for determining the driving behavior of an unmanned vehicle. One specific embodiment of the method includes: acquiring a video frame sequence with timing information within a predetermined time period (210); and inputting the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model (220). In this embodiment, the input video frame sequence lets the end-to-end driving behavior model capture continuously changing environmental information; the driving behavior instruction sequence ensures the continuity of driving behavior and can satisfy the driving needs over a period of time without outputting instructions at high frequency, which greatly saves computing resources and also allows the method to run normally on devices with weak computing power.

Description

Method and Apparatus for Determining Driving Behavior of an Unmanned Vehicle
This patent application claims priority to the Chinese patent application filed on September 5, 2017 under Application No. 201710790586.X by the applicant 百度在线网络技术(北京)有限公司, entitled "Method and Apparatus for Determining Driving Behavior of an Unmanned Vehicle", the entirety of which is incorporated into the present application by reference.
Technical Field
The present application relates to the field of computer technology, specifically to the field of computer network technology, and in particular to a method and apparatus for determining the driving behavior of an unmanned vehicle.
Background
In an unmanned driving system, images around the unmanned vehicle need to be captured by an image sensor in order to determine the environment in which the unmanned vehicle is located.
At present, determining the driving behavior of an unmanned vehicle from the input of the image sensor is usually implemented with a pre-built mapping model. Such a mapping model takes a single-frame static picture as its input (the content the model "sees") and outputs the steering wheel angle (or the reciprocal of the turning radius) corresponding to that single-frame static picture.
However, in current schemes for determining the driving behavior of an unmanned vehicle from image sensor input, the input is a single static frame, so the static and dynamic content of the surroundings cannot be distinguished; a single static frame also cannot perceive the state of the unmanned vehicle itself, so no effective prediction can be made in combination with that state; and since the output is a single steering wheel angle, an instantaneous reaction to the environment described by the single static frame, complex actions such as path planning cannot be realized.
Summary
The purpose of the present application is to propose an improved method and apparatus for determining the driving behavior of an unmanned vehicle, so as to solve the technical problems mentioned in the Background section above.
In a first aspect, an embodiment of the present application provides a method for determining the driving behavior of an unmanned vehicle, the method including: acquiring a video frame sequence with timing information within a predetermined time period; and inputting the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
In some embodiments, inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence output by the end-to-end driving behavior model includes: implementing the driving behavior model with an RNN model; or synthesizing the driving behavior model from a CNN model and an LSTM model architecture.
In some embodiments, when the driving behavior model is synthesized from a CNN model and an LSTM model architecture, inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence output by the end-to-end driving behavior model further includes: extracting, based on the CNN model, the features of each video frame in the video frame sequence; inputting, according to the timing information, the features of each video frame into an LSTM model to obtain a context vector output by the LSTM model; and decoding the context vector with the LSTM model to obtain a driving behavior instruction sequence.
In some embodiments, when the driving behavior model is synthesized from a CNN model and an LSTM model architecture, inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence output by the end-to-end driving behavior model further includes: extracting, based on the CNN model, the features of each video frame in the video frame sequence; inputting, according to the timing information, the features of each video frame into a first LSTM model to obtain a context vector output by the first LSTM model; and decoding the context vector with a second LSTM model to obtain a driving behavior instruction sequence.
In some embodiments, the end-to-end driving behavior model is trained on the actual driving behavior sequence collected by a collecting vehicle and the video frame sequence acquired by the image sensor of the collecting vehicle.
In some embodiments, the driving behavior instructions include steering wheel control instructions, throttle instructions, and brake instructions.
In some embodiments, the output driving behavior instruction sequence consists of driving behavior instructions to which a constraint term has been added.
In a second aspect, an embodiment of the present application provides an apparatus for determining the driving behavior of an unmanned vehicle, the apparatus including: an image sequence acquisition unit for acquiring a video frame sequence with timing information within a predetermined time period; and an instruction sequence determination unit for inputting the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
In some embodiments, the instruction sequence determination unit includes: an RNN model unit for implementing the driving behavior model in the instruction sequence determination unit with an RNN model; or an architecture synthesis model unit for synthesizing the driving behavior model in the instruction sequence determination unit from a CNN model and an LSTM model architecture.
In some embodiments, when the instruction sequence determination unit includes the architecture synthesis model unit, the instruction sequence determination unit further includes: a feature extraction unit for extracting, based on the CNN model, the features of each video frame in the video frame sequence; a vector determination unit for inputting, according to the timing information, the features of each video frame into an LSTM model to obtain a context vector output by the LSTM model; and a vector decoding unit for decoding the context vector with the LSTM model to obtain a driving behavior instruction sequence.
In some embodiments, when the instruction sequence determination unit includes the architecture synthesis model unit, the instruction sequence determination unit further includes: an extracting feature unit for extracting, based on the CNN model, the features of each video frame in the video frame sequence; a determining vector unit for inputting, according to the timing information, the features of each video frame into a first LSTM model to obtain a context vector output by the first LSTM model; and a decoding vector unit for decoding the context vector with a second LSTM model to obtain a driving behavior instruction sequence.
In some embodiments, the end-to-end driving behavior model in the instruction sequence determination unit is trained on the actual driving behavior sequence collected by a collecting vehicle and the video frame sequence acquired by the image sensor of the collecting vehicle.
In some embodiments, the driving behavior instructions in the instruction sequence determination unit include steering wheel control instructions, throttle instructions, and brake instructions.
In some embodiments, the output driving behavior instruction sequence in the instruction sequence determination unit consists of driving behavior instructions to which a constraint term has been added.
In a third aspect, an embodiment of the present application provides a device, including: one or more processors; and a storage apparatus for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement any of the above methods for determining the driving behavior of an unmanned vehicle.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements any of the above methods for determining the driving behavior of an unmanned vehicle.
In the method and apparatus for determining the driving behavior of an unmanned vehicle provided by the embodiments of the present application, a video frame sequence with timing information within a predetermined time period is first acquired; the video frame sequence is then input into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model. In this process, the input video frame sequence lets the end-to-end driving behavior model capture continuously changing environmental information; the driving behavior instruction sequence ensures the continuity of driving behavior and can satisfy the driving needs over a period of time without outputting instructions at high frequency, which greatly saves computing resources and also allows the method to run normally on devices with weak computing power.
Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a schematic flowchart of one embodiment of a method for determining the driving behavior of an unmanned vehicle according to the present application;
FIG. 3 is a schematic flowchart of yet another embodiment of a method for determining the driving behavior of an unmanned vehicle according to the present application;
FIG. 4 is a schematic flowchart of an application scenario of a method for determining the driving behavior of an unmanned vehicle according to an embodiment of the present application;
FIG. 5 is an exemplary structural diagram of one embodiment of an apparatus for determining the driving behavior of an unmanned vehicle according to the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present application.
Detailed Description
The present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, where there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with one another. The present application will now be described in detail with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture of an embodiment of a method for determining the driving behavior of an unmanned vehicle or an apparatus for determining the driving behavior of an unmanned vehicle to which the present application can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and servers 105, 106. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the servers 105, 106. The network 104 may include various connection types, such as wired or wireless communication links, fiber-optic cables, and the like.
A user may use the terminal devices 101, 102, 103 to interact with the servers 105, 106 over the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The servers 105, 106 may be servers providing various services, for example back-end web servers that provide support for the web pages displayed on the terminal devices 101, 102, 103. A back-end web server may analyze and otherwise process received data such as web page requests, and feed the processing results (for example, web page data) back to the terminal devices.
It should be noted that the method for determining the driving behavior of an unmanned vehicle provided by the embodiments of the present application is generally performed by the terminal devices 101, 102, 103 or the servers 105, 106; accordingly, the apparatus for determining the driving behavior of an unmanned vehicle is generally provided in the terminal devices 101, 102, 103 or the servers 105, 106.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
Please refer to FIG. 2, which shows the flow of one embodiment of a method for determining the driving behavior of an unmanned vehicle according to the present application.
As shown in FIG. 2, the method 200 for determining the driving behavior of an unmanned vehicle includes:
In step 210, a video frame sequence with timing information within a predetermined time period is acquired.
In this embodiment, an electronic device (for example, the terminal or server shown in FIG. 1) running the method for determining the driving behavior of an unmanned vehicle may acquire video captured by an image sensor within a predetermined time period. The video here includes a video frame sequence formed of video frames, and each video frame carries timing information.
In step 220, the video frame sequence is input into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
In this embodiment, the end-to-end driving behavior model is a predetermined model that derives an unmanned-vehicle driving behavior instruction sequence from an input image sequence; it represents a mapping from the collected image sequence to the driving behavior sequence of the unmanned vehicle, and may be constructed by a technician based on collected historical data or set manually by a technician.
In one specific example, the end-to-end driving behavior model may be implemented with recurrent neural networks (RNNs). RNNs can process sequence data; they are called recurrent neural networks because the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes previous information and applies it to the computation of the current output: the nodes in the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, RNNs can process sequence data of any length, but in practice, to reduce complexity, it is often assumed that the current state is related only to the previous few states. Here, an RNN can use its internal memory to process input sequences of arbitrary timing; as a feedback dynamical system it reflects process dynamics in its computation and, compared with feedforward neural networks, has stronger dynamic behavior and computational capability.
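By way of a non-limiting illustration, a minimal sketch of such an RNN-based end-to-end model in PyTorch might look as follows; the layer sizes, the use of precomputed per-frame features, and all names are assumptions made for the example, not details from the application:

```python
import torch
import torch.nn as nn

class RNNDrivingModel(nn.Module):
    """Illustrative RNN mapping a per-frame feature sequence to a command sequence."""
    def __init__(self, feat_dim=128, hidden_dim=256, n_commands=3):
        super().__init__()
        # n_commands = 3 for (steering wheel, throttle, brake)
        self.rnn = nn.RNN(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_commands)

    def forward(self, frame_features):
        # frame_features: (batch, K, feat_dim), one row per video frame,
        # ordered by the timing information attached to each frame
        hidden_states, _ = self.rnn(frame_features)
        # one command per time step: the hidden state carries the memory of
        # previous steps, so the current output depends on previous outputs
        return self.head(hidden_states)  # (batch, K, n_commands)
```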
In another specific example, the end-to-end driving behavior model may be synthesized from a CNN model and an LSTM model, where the CNN model refers to a convolutional neural network model and the LSTM model refers to a long short-term memory (LSTM) model. The CNN model here serves as a feature extractor; suppose the feature dimension extracted by the CNN is N (generally this feature is the output of the network's last fully connected layer). Then, for K video frames, an N-dimensional feature sequence of temporal length K is formed. This feature sequence is used as the input to the LSTM, and the resulting LSTM output is still a sequence of length K (its dimension should be the number of action categories). The results of this sequence are then averaged to obtain the final result.
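Read as a pipeline, the above corresponds to the following sketch (an illustrative PyTorch rendering, not the applicant's implementation; the small convolutional backbone and the dimension N=128 are assumptions, while the averaging over the K outputs follows the description above):

```python
import torch
import torch.nn as nn

class CNNLSTMModel(nn.Module):
    """CNN as feature extractor (N dims per frame); LSTM over the K-frame sequence."""
    def __init__(self, feat_dim=128, hidden_dim=256, n_outputs=3):
        super().__init__()
        self.cnn = nn.Sequential(                      # stands in for any backbone
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),                   # "last fully connected layer" -> N dims
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_outputs)

    def forward(self, frames):
        # frames: (batch, K, 3, H, W), video frames in timestamp order
        b, k = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, k, -1)  # (b, K, N) feature sequence
        outs, _ = self.lstm(feats)                             # length-K output sequence
        return self.head(outs).mean(dim=1)                     # average the K results
```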
Here, by arranging for the end-to-end driving behavior model to be synthesized from the CNN model and the LSTM model, the input picture sequence can be processed, thereby improving the continuity of driving behavior.
In some optional implementations of this embodiment, the end-to-end driving behavior model is trained on the actual driving behavior collected by a collecting vehicle and the video frames acquired by sensors provided on the collecting vehicle.
In this implementation, since the collected video frames and the actual driving behavior are both based on actual road segments, the training samples better fit real conditions, so the accuracy of the prediction results of the end-to-end driving behavior model can be improved.
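For illustration only, a supervised training step over such collected data could look like the following sketch; the mean-squared-error loss, optimizer interface, and data layout are assumptions, not details given in the application:

```python
import torch.nn.functional as F

def train_step(model, optimizer, frames, actual_commands):
    """One behavior-cloning step: frames come from the collecting vehicle's image
    sensor, actual_commands are the driving behaviors recorded on the same road."""
    optimizer.zero_grad()
    predicted = model(frames)                        # predicted command sequence
    loss = F.mse_loss(predicted, actual_commands)    # match the recorded behavior
    loss.backward()
    optimizer.step()
    return loss.item()
```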
The driving behavior instructions in this embodiment may be behavior instructions for driving a vehicle in the prior art or in technologies developed in the future; the present application does not limit this. For example, the driving behavior instructions here may include lateral control instructions and longitudinal control instructions. The lateral control instructions can control the lateral displacement of the vehicle, for example lane merging, turning, and so on; the longitudinal control instructions can control the longitudinal displacement of the vehicle, for example moving forward, stopping, reversing, and so on.
In some optional implementations of this embodiment, the driving behavior instructions include steering wheel control instructions, throttle instructions, and brake instructions.
In this implementation, the steering wheel control instructions are lateral control instructions, and the throttle and brake instructions are longitudinal control instructions. By outputting steering wheel control instructions, throttle instructions, and brake instructions, the end-to-end driving behavior model can accurately control the vehicle's lane merging, turning, moving forward, stopping, reversing, and so on.
Since the driving behavior instructions output by the end-to-end driving behavior model in this embodiment are continuous behavior instructions, a constraint term may be added to the output of the driving behavior model in order to ensure that the output driving behavior instruction sequence satisfies the requirements of vehicle travel.
The constraint term here can be set according to the vehicle-travel requirements that the output driving behavior instruction sequence needs to satisfy. For example, to ensure the continuity and smoothness of vehicle travel, the constraint term may be set so that the coordinate difference between adjacent lateral coordinates is less than a predetermined threshold and the coordinate difference between adjacent longitudinal coordinates is less than a preset threshold, thereby improving the accuracy of the output driving behavior instruction sequence.
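One possible realization of such a constraint term (a hedged sketch; the threshold values and the clipping strategy are illustrative assumptions) is to post-process the output sequence so that adjacent lateral and longitudinal coordinates never differ by more than the preset thresholds:

```python
def apply_smoothness_constraint(commands, max_lat_step=0.05, max_lon_step=0.5):
    """commands: list of (lateral, longitudinal) coordinates in time order.
    Clamps each step so adjacent coordinate differences stay within thresholds."""
    constrained = [commands[0]]
    for lat, lon in commands[1:]:
        prev_lat, prev_lon = constrained[-1]
        lat = prev_lat + max(-max_lat_step, min(max_lat_step, lat - prev_lat))
        lon = prev_lon + max(-max_lon_step, min(max_lon_step, lon - prev_lon))
        constrained.append((lat, lon))
    return constrained
```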
In the method for determining the driving behavior of an unmanned vehicle provided by the above embodiment of the present application, because a video frame sequence is used as input, timing information is taken into account, which lets the neural network capture continuously changing information; the output is a continuous driving behavior instruction sequence, which can realize path planning and improves the accuracy and precision of the output driving behavior of the unmanned vehicle; it can also satisfy the driving needs over a period of time without outputting instructions at high frequency, which greatly saves computing resources and allows the system to run normally even on devices with weak computing power.
Further, please refer to FIG. 3, which shows the flow of yet another embodiment of a method for determining the driving behavior of an unmanned vehicle according to the present application.
As shown in FIG. 3, the method 300 for determining the driving behavior of an unmanned vehicle includes:
In step 310, a video frame sequence with timing information within a predetermined time period is acquired.
In this embodiment, an electronic device (for example, the terminal or server shown in FIG. 1) running the method for determining the driving behavior of an unmanned vehicle may acquire video captured by an image sensor within a predetermined time period. The video here includes a video frame sequence formed of video frames, and each video frame carries timing information.
In step 320, the features of each video frame in the video frame sequence are extracted based on a CNN model.
In this embodiment, the CNN model refers to a convolutional neural network model. The CNN model here is again a feature extractor; suppose the feature dimension extracted by the CNN is N (generally this feature is the output of the network's last fully connected layer). Then, for K video frames, an N-dimensional feature sequence of temporal length K is formed. The higher the extracted dimension, the more information it contains; the lower the extracted dimension, the faster the computation.
Here, by adopting a convolutional neural network model similar to a biological neural network, the complexity of the network model is reduced and the number of weights is decreased, which reduces the number of parameters the neural network needs to train and makes the neural network structure simpler and more adaptable.
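A back-of-the-envelope comparison (illustrative numbers, not figures from the application) makes the weight-sharing point concrete:

```python
# Dense layer from a 100x100 single-channel input to 100 units: every pixel
# connects to every unit. A conv layer shares ten 5x5 kernels across positions.
fc_weights = (100 * 100) * 100    # 1,000,000 trainable weights (biases ignored)
conv_weights = 10 * (5 * 5)       # 250 shared weights, independent of input size
print(fc_weights, conv_weights)
```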
In step 330, the features of each video frame are input into an LSTM model according to the timing information to obtain a context vector output by the LSTM model.
In this embodiment, based on the feature sequence obtained in step 320, the features of each video frame can be input into the LSTM model according to the timing information to obtain the context vector output by the LSTM model.
In step 340, the context vector is decoded with the LSTM model to obtain a driving behavior instruction sequence.
In this embodiment, the LSTM model used in step 340 and the LSTM model used in step 330 may be the same LSTM model, which then implements two functions: one is to take features as input and output a context vector; the other is to take the context vector as input and output the driving behavior instruction sequence.
Here, the CNN model and the LSTM model are trained on the actual driving behavior sequence collected by a collecting vehicle and the video frame sequence acquired by the image sensor of the collecting vehicle.
It should be understood that the method 300 for determining the driving behavior of an unmanned vehicle shown in FIG. 3 is merely an exemplary description of the present application and does not limit it. For example, the LSTM model used in step 340 and the LSTM model used in step 330 may also be two LSTM models, where the first LSTM model implements the function of taking features as input and outputting a context vector, and the second LSTM model implements the function of taking the context vector as input and outputting the driving behavior instruction sequence. The first LSTM model and the second LSTM model here merely denote two different LSTM models and do not represent any special limitation on the LSTM models.
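A minimal sketch of this two-LSTM variant (assumed PyTorch; the fixed output length and the greedy step-by-step decoding are illustrative choices, not details from the application): the first LSTM encodes the feature sequence into a context vector, and the second LSTM decodes that vector into the command sequence.

```python
import torch
import torch.nn as nn

class TwoLSTMHead(nn.Module):
    """Encoder LSTM -> context vector -> decoder LSTM -> command sequence."""
    def __init__(self, feat_dim=128, hidden_dim=256, n_commands=3, out_len=25):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)     # first LSTM
        self.decoder = nn.LSTM(n_commands, hidden_dim, batch_first=True)   # second LSTM
        self.head = nn.Linear(hidden_dim, n_commands)
        self.out_len = out_len

    def forward(self, feats):
        # feats: (batch, K, feat_dim); the final hidden state is the context vector
        _, (context, cell) = self.encoder(feats)
        cmd = feats.new_zeros(feats.size(0), 1, self.head.out_features)
        state, commands = (context, cell), []
        for _ in range(self.out_len):            # unroll one command group at a time
            out, state = self.decoder(cmd, state)
            cmd = self.head(out)                 # (batch, 1, n_commands)
            commands.append(cmd)
        return torch.cat(commands, dim=1)        # (batch, out_len, n_commands)
```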
In the method for determining the driving behavior of an unmanned vehicle provided by the above embodiment of the present application, because the end-to-end driving behavior model is synthesized from the CNN model and the LSTM model architecture, features can be extracted from the high-dimensional images in the video frames based on the CNN model; an LSTM model is then used to obtain a context vector from the extracted features, and an LSTM is then used to obtain the driving behavior instruction sequence from the context vector. This process lowers the dimensionality of the computation, which improves computation speed and greatly saves computing resources.
An exemplary application scenario of the method for determining the driving behavior of an unmanned vehicle of an embodiment of the present application is described below with reference to FIG. 4.
As shown in FIG. 4, FIG. 4 shows a schematic flowchart of an application scenario of a method for determining the driving behavior of an unmanned vehicle according to an embodiment of the present application.
As shown in FIG. 4, the method 400 for determining the driving behavior of an unmanned vehicle of the embodiment of the present application runs in an electronic device 420.
First, the input is a video frame sequence 401 from the past 3-second period, 30 frames in total (one frame captured every 0.1 seconds); then, for each of the 30 frames, a CNN model 402 is used to extract the features of the video frame; then, according to the timing information, the features 403 of each video frame are input into a first LSTM model 404 to obtain a context vector 405 output by the LSTM model; then a second LSTM model 406 is used to decode this context vector 405, outputting in turn a continuous driving behavior instruction sequence 407, which includes the driving behavior instructions for the next 0.5 seconds, 25 groups in total, with adjacent groups 0.02 seconds apart; finally, the continuous driving behavior instruction sequence 407 is executed in chronological order by the control module of the unmanned vehicle.
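For illustration, the numbers in this scenario pin down the tensor shapes end to end; this snippet reuses the TwoLSTMHead sketch above (with its assumed feature dimension N=128):

```python
import torch

# Past 3 s sampled every 0.1 s -> 30 frames in; the output covers the next
# 0.5 s at 0.02 s spacing -> 25 command groups of (steering, throttle, brake).
frame_features = torch.randn(1, 30, 128)    # per-frame CNN features 403 (assumed N=128)
model = TwoLSTMHead(feat_dim=128, hidden_dim=256, n_commands=3, out_len=25)
commands = model(frame_features)             # (1, 25, 3) command sequence 407
timestamps = [0.02 * (i + 1) for i in range(25)]   # execution schedule for the controller
```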
In the method for determining the driving behavior of an unmanned vehicle provided in the above application scenario of the present application, the input lets the neural network capture continuously changing environmental information, the output is continuous behavior, and a constraint term can be added to ensure the continuity of driving behavior; and because the output is continuous behavior, it can satisfy the driving needs over a period of time without outputting instructions at high frequency, which greatly saves computing resources and also allows the method to run normally on devices with weak computing power.
With further reference to FIG. 5, as an implementation of the above methods, an embodiment of the present application provides an embodiment of an apparatus for determining the driving behavior of an unmanned vehicle. This apparatus embodiment corresponds to the method embodiments shown in FIGS. 1 to 4; therefore, the operations and features described above for the method for determining the driving behavior of an unmanned vehicle in FIGS. 1 to 4 are equally applicable to the apparatus 500 for determining the driving behavior of an unmanned vehicle and the units contained therein, and are not repeated here.
As shown in FIG. 5, the apparatus 500 for determining the driving behavior of an unmanned vehicle includes: an image sequence acquisition unit 510 for acquiring a video frame sequence with timing information within a predetermined time period; and an instruction sequence determination unit 520 for inputting the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
In some optional implementations of this embodiment, the instruction sequence determination unit includes: an RNN model unit 521 for implementing the driving behavior model in the instruction sequence determination unit 520 with an RNN model; or an architecture synthesis model unit 522 for synthesizing the driving behavior model in the instruction sequence determination unit 520 from a CNN model and an LSTM model architecture.
In some optional implementations of this embodiment, when the instruction sequence determination unit includes the architecture synthesis model unit, the instruction sequence determination unit further includes (not shown in the figure): a feature extraction unit for extracting, based on the CNN model, the features of each video frame in the video frame sequence; a vector determination unit for inputting, according to the timing information, the features of each video frame into an LSTM model to obtain a context vector output by the LSTM model; and a vector decoding unit for decoding the context vector with the LSTM model to obtain a driving behavior instruction sequence.
In some optional implementations of this embodiment, when the instruction sequence determination unit includes the architecture synthesis model unit, the instruction sequence determination unit further includes: an extracting feature unit for extracting, based on the CNN model, the features of each video frame in the video frame sequence; a determining vector unit for inputting, according to the timing information, the features of each video frame into a first LSTM model to obtain a context vector output by the first LSTM model; and a decoding vector unit for decoding the context vector with a second LSTM model to obtain a driving behavior instruction sequence.
In some optional implementations of this embodiment, the end-to-end driving behavior model in the instruction sequence determination unit is trained on the actual driving behavior sequence collected by a collecting vehicle and the video frame sequence acquired by the image sensor of the collecting vehicle.
In some optional implementations of this embodiment, the driving behavior instructions in the instruction sequence determination unit include steering wheel control instructions, throttle instructions, and brake instructions.
In some optional implementations of this embodiment, the output driving behavior instruction sequence in the instruction sequence determination unit consists of driving behavior instructions to which a constraint term has been added.
The present application also provides an embodiment of a device, including: one or more processors; and a storage apparatus for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for determining the driving behavior of an unmanned vehicle as described in any of the above.
The present application also provides an embodiment of a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the method for determining the driving behavior of an unmanned vehicle as described in any of the above.
Referring now to FIG. 6, it shows a schematic structural diagram of a computer system 600 suitable for implementing a terminal device or server of an embodiment of the present application. The terminal device shown in FIG. 6 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it is installed into the storage portion 608 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.
It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a unit, a program segment, or a portion of code, and the unit, program segment, or portion of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented with a dedicated hardware-based system that performs the specified function or operation, or may be implemented with a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an image sequence acquisition unit and an instruction sequence determination unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the image sequence acquisition unit may also be described as "a unit that acquires a video frame sequence with timing information within a predetermined time period".
As another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the apparatus described in the above embodiments, or may be a non-volatile computer storage medium that exists separately and is not fitted into a terminal. The above non-volatile computer storage medium stores one or more programs; when the one or more programs are executed by a device, the device is caused to: acquire a video frame sequence with timing information within a predetermined time period; and input the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
The above description is merely a preferred embodiment of the present application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (16)

  1. A method for determining the driving behavior of an unmanned vehicle, characterized in that the method comprises:
    acquiring a video frame sequence with timing information within a predetermined time period;
    inputting the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
  2. The method according to claim 1, characterized in that inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence output by the end-to-end driving behavior model comprises:
    implementing the driving behavior model with an RNN model; or
    synthesizing the driving behavior model from a CNN model and an LSTM model architecture.
  3. The method according to claim 2, characterized in that, when the driving behavior model is synthesized from a CNN model and an LSTM model architecture, inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence output by the end-to-end driving behavior model further comprises:
    extracting, based on the CNN model, the features of each video frame in the video frame sequence;
    inputting, according to the timing information, the features of each video frame into an LSTM model to obtain a context vector output by the LSTM model;
    decoding the context vector with the LSTM model to obtain a driving behavior instruction sequence.
  4. The method according to claim 2, characterized in that, when the driving behavior model is synthesized from a CNN model and an LSTM model architecture, inputting the video frame sequence into the end-to-end driving behavior model to obtain the driving behavior instruction sequence output by the end-to-end driving behavior model further comprises:
    extracting, based on the CNN model, the features of each video frame in the video frame sequence;
    inputting, according to the timing information, the features of each video frame into a first LSTM model to obtain a context vector output by the first LSTM model;
    decoding the context vector with a second LSTM model to obtain a driving behavior instruction sequence.
  5. The method according to claim 1, characterized in that the end-to-end driving behavior model is trained on the actual driving behavior sequence collected by a collecting vehicle and the video frame sequence acquired by the image sensor of the collecting vehicle.
  6. The method according to claim 1, characterized in that the driving behavior instructions comprise steering wheel control instructions, throttle instructions, and brake instructions.
  7. The method according to claim 1, characterized in that the output driving behavior instruction sequence consists of driving behavior instructions to which a constraint term has been added.
  8. An apparatus for determining the driving behavior of an unmanned vehicle, characterized in that the apparatus comprises:
    an image sequence acquisition unit for acquiring a video frame sequence with timing information within a predetermined time period;
    an instruction sequence determination unit for inputting the video frame sequence into an end-to-end driving behavior model to obtain a driving behavior instruction sequence output by the end-to-end driving behavior model.
  9. The apparatus according to claim 8, characterized in that the instruction sequence determination unit comprises:
    an RNN model unit for implementing the driving behavior model in the instruction sequence determination unit with an RNN model; or
    an architecture synthesis model unit for synthesizing the driving behavior model in the instruction sequence determination unit from a CNN model and an LSTM model architecture.
  10. The apparatus according to claim 9, characterized in that, when the instruction sequence determination unit comprises the architecture synthesis model unit, the instruction sequence determination unit further comprises:
    a feature extraction unit for extracting, based on a CNN model, the features of each video frame in the video frame sequence;
    a vector determination unit for inputting, according to the timing information, the features of each video frame into an LSTM model to obtain a context vector output by the LSTM model; and
    a vector decoding unit for decoding the context vector with the LSTM model to obtain a driving behavior instruction sequence.
  11. The apparatus according to claim 9, characterized in that, when the instruction sequence determination unit comprises the architecture synthesis model unit, the instruction sequence determination unit further comprises:
    an extracting feature unit for extracting, based on a CNN model, the features of each video frame in the video frame sequence;
    a determining vector unit for inputting, according to the timing information, the features of each video frame into a first LSTM model to obtain a context vector output by the first LSTM model; and
    a decoding vector unit for decoding the context vector with a second LSTM model to obtain a driving behavior instruction sequence.
  12. The apparatus according to claim 8, characterized in that the end-to-end driving behavior model in the instruction sequence determination unit is trained on the actual driving behavior sequence collected by a collecting vehicle and the video frame sequence acquired by the image sensor of the collecting vehicle.
  13. The apparatus according to claim 8, characterized in that the driving behavior instructions in the instruction sequence determination unit comprise steering wheel control instructions, throttle instructions, and brake instructions.
  14. The apparatus according to claim 8, characterized in that the output driving behavior instruction sequence in the instruction sequence determination unit consists of driving behavior instructions to which a constraint term has been added.
  15. A device, characterized by comprising:
    one or more processors;
    a storage apparatus for storing one or more programs;
    when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for determining the driving behavior of an unmanned vehicle according to any one of claims 1-7.
  16. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method for determining the driving behavior of an unmanned vehicle according to any one of claims 1-7.
PCT/CN2018/098982 2017-09-05 2018-08-06 Method and apparatus for determining driving behavior of an unmanned vehicle WO2019047649A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710790586.XA CN107563332A (zh) 2017-09-05 2017-09-05 Method and apparatus for determining driving behavior of an unmanned vehicle
CN201710790586.X 2017-09-05

Publications (1)

Publication Number Publication Date
WO2019047649A1 true WO2019047649A1 (zh) 2019-03-14

Family

ID=60979280

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/098982 WO2019047649A1 (zh) 2017-09-05 2018-08-06 用于确定无人车的驾驶行为的方法和装置

Country Status (2)

Country Link
CN (1) CN107563332A (zh)
WO (1) WO2019047649A1 (zh)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563332A (zh) 2017-09-05 2018-01-09 百度在线网络技术(北京)有限公司 Method and apparatus for determining driving behavior of an unmanned vehicle
CN108470460B (zh) * 2018-04-11 2020-08-28 江苏大学 Surrounding vehicle behavior recognition method based on a smartphone and an RNN
US20190354836A1 (en) * 2018-05-17 2019-11-21 International Business Machines Corporation Dynamic discovery of dependencies among time series data using neural networks
CN108764470B (zh) * 2018-05-18 2021-08-31 中国科学院计算技术研究所 Processing method for artificial neural network operations
CN108764465B (zh) * 2018-05-18 2021-09-24 中国科学院计算技术研究所 Processing device for performing neural network operations
CN108897313A (zh) * 2018-05-23 2018-11-27 清华大学 Method for constructing a hierarchical end-to-end vehicle automatic driving system
CN108710865B (zh) * 2018-05-28 2022-04-22 电子科技大学 Neural-network-based method for detecting abnormal driver behavior
CN110633718B (zh) * 2018-06-21 2022-06-07 北京京东尚科信息技术有限公司 Method and apparatus for determining the driving area in an environment image
CN110633598B (zh) * 2018-06-21 2022-01-07 北京京东尚科信息技术有限公司 Method and apparatus for determining the driving area in an environment image
CN110866427A (zh) * 2018-08-28 2020-03-06 杭州海康威视数字技术股份有限公司 Vehicle behavior detection method and device
CN109711349B (zh) * 2018-12-28 2022-06-28 百度在线网络技术(北京)有限公司 Method and apparatus for generating control instructions
CN109858369A (zh) 2018-12-29 2019-06-07 百度在线网络技术(北京)有限公司 Automatic driving method and device
CN110008317A (zh) * 2019-01-23 2019-07-12 艾肯特公司 Natural expression processing method, response method, device, and system for natural intelligence
CN110019688A (zh) * 2019-01-23 2019-07-16 艾肯特公司 Method for training a robot
CN109739245A (zh) * 2019-02-19 2019-05-10 东软睿驰汽车技术(沈阳)有限公司 End-to-end model evaluation method and device based on unmanned driving
CN111738037B (zh) * 2019-03-25 2024-03-08 广州汽车集团股份有限公司 Automatic driving method and system, and vehicle
CN110188683B (zh) * 2019-05-30 2020-06-16 北京理工大学 CNN-LSTM-based automatic driving control method
CN110221611B (zh) * 2019-06-11 2020-09-04 北京三快在线科技有限公司 Trajectory tracking control method and device, and unmanned vehicle
CN110488821B (zh) * 2019-08-12 2020-12-29 北京三快在线科技有限公司 Method and device for determining the motion strategy of an unmanned vehicle
CN110533944A (zh) * 2019-08-21 2019-12-03 西安华运天成通讯科技有限公司 5G-based communication method and system for unmanned vehicles
CN110782029B (zh) * 2019-10-25 2022-11-22 阿波罗智能技术(北京)有限公司 Neural network prediction method and apparatus, electronic device, and automatic driving system
US11681914B2 (en) 2020-05-08 2023-06-20 International Business Machines Corporation Determining multivariate time series data dependencies
CN113741459B (zh) * 2021-09-03 2024-06-21 阿波罗智能技术(北京)有限公司 Method for determining training samples, and method and apparatus for training an automatic driving model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279759A (zh) * 2013-06-09 2013-09-04 大连理工大学 Method for analyzing the drivable area ahead of a vehicle based on a convolutional neural network
US20170076196A1 (en) * 2015-06-05 2017-03-16 Google Inc. Compressed recurrent neural network models
CN106709461A (zh) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video-based behavior recognition method and device
CN106845411A (zh) * 2017-01-19 2017-06-13 清华大学 Video description generation method based on deep learning and probabilistic graphical models
CN106873566A (zh) * 2017-03-14 2017-06-20 东北大学 Unmanned logistics vehicle based on deep learning
CN107563332A (zh) * 2017-09-05 2018-01-09 百度在线网络技术(北京)有限公司 Method and apparatus for determining driving behavior of an unmanned vehicle

Also Published As

Publication number Publication date
CN107563332A (zh) 2018-01-09

Similar Documents

Publication Publication Date Title
WO2019047649A1 (zh) Method and apparatus for determining driving behavior of an unmanned vehicle
US11164573B2 (en) Method and apparatus for controlling page
WO2019242222A1 (zh) Method and apparatus for generating information
US10490184B2 (en) Voice recognition apparatus and method
WO2019047655A1 (zh) Method and apparatus for determining driving behavior of an unmanned vehicle
CN110622176A (zh) Video partitioning
EP3893125A1 (en) Method and apparatus for searching video segment, device, medium and computer program product
KR20210001859A (ko) 3차원 가상 인물 입모양 변화 제어 방법 및 장치
WO2020107625A1 (zh) Video classification method and apparatus, electronic device, and computer-readable storage medium
US11967150B2 (en) Parallel video processing systems
KR20190140801A (ko) 영상, 음성, 텍스트 정보를 기반으로 사용자의 감정, 나이, 성별을 인식하는 방법
CN112840313A (zh) 电子设备及其控制方法
CN113360683B (zh) Method for training a cross-modal retrieval model, and cross-modal retrieval method and apparatus
WO2020006962A1 (zh) Method and apparatus for processing pictures
CN114220163B (zh) Human pose estimation method and apparatus, electronic device, and storage medium
US20230076196A1 (en) Asynchronous multi-user real-time streaming of web-based image edits using generative adversarial network(s)
CN117033954A (zh) Data processing method and related products
US10910014B2 (en) Method and apparatus for generating video
CN110263743B (zh) Method and apparatus for recognizing images
CN112269942A (zh) Method, apparatus, system, and electronic device for recommending objects
CN114202728B (zh) Video detection method and apparatus, electronic device, and medium
CN111259697A (zh) Method and apparatus for sending information
US20240232637A9 (en) Method for Training Large Language Models to Perform Query Intent Classification
WO2024128517A1 (en) Machine learning-based approach for audio-driven avatar animation or other functions
US20240135187A1 (en) Method for Training Large Language Models to Perform Query Intent Classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18853821

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 05.08.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18853821

Country of ref document: EP

Kind code of ref document: A1