CN109685213B - Method and device for acquiring training sample data and terminal equipment


Info

Publication number
CN109685213B
Authority
CN
China
Prior art keywords
data
vehicle
task
information
data file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811638283.7A
Other languages
Chinese (zh)
Other versions
CN109685213A (en)
Inventor
闫泳杉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd filed Critical Baidu Online Network Technology Beijing Co Ltd
Priority to CN201811638283.7A
Publication of CN109685213A
Application granted
Publication of CN109685213B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The embodiment of the invention provides a method, a device, and a terminal device for acquiring training sample data. The method comprises the following steps: acquiring data collected in a vehicle usage scenario, wherein the collected data comprises at least one of video data and vehicle state data; performing data preprocessing on the collected data to obtain a data file; adding a task label to the data file; and using the data file with the task label added as training sample data of a driving model. The embodiment of the invention can improve the training efficiency of the driving model.

Description

Method and device for acquiring training sample data and terminal equipment
Technical Field
The invention relates to the technical field of intelligent driving, and in particular to a method and a device for acquiring training sample data and a terminal device.
Background
With the rapid development of deep learning technology and intensive research into artificial intelligence, vehicle driving is currently shifting from manual driving to automatic driving. Realizing automatic driving through end-to-end deep learning is a main research direction in the automatic driving field at present. However, the current end-to-end deep learning process mainly trains the driving model with raw collected data, which results in low training efficiency of the driving model.
Disclosure of Invention
The embodiment of the invention provides a method and a device for acquiring training sample data and terminal equipment, and aims to solve the problem of low training efficiency of a driving model.
The embodiment of the invention provides a method for acquiring training sample data, which comprises the following steps:
acquiring data collected in a vehicle usage scenario, wherein the collected data comprises at least one of video data and vehicle state data;
performing data preprocessing on the collected data to obtain a data file;
adding task labels to the data files;
and taking the data file added with the task label as training sample data of the driving model.
Optionally, the performing data preprocessing on the collected data to obtain a data file includes:
converting the collected data into a hypertext 5 (h5) format file to obtain the data file in h5 format.
Optionally, adding a task label to the data file includes:
adding a target task label to the data file, and classifying the data file added with the target task label into a directory of a target task in a training sample, wherein the data file is used for training the target task.
Optionally, before the acquiring of the data collected in the vehicle usage scenario, the method further includes:
setting an acquisition task of the vehicle usage scenario;
the acquiring of the data collected in the vehicle usage scenario includes:
acquiring the collected data of the vehicle usage scenario collected according to the acquisition task, and acquiring attribute information of the collected data;
wherein the name of the data file includes the attribute information.
Optionally, after the acquisition task of the vehicle usage scenario is set, the method further includes:
acquiring remark condition information of the acquisition task, and/or acquiring vehicle hardware change information of the acquisition task, wherein the remark condition information is used for describing an environment condition of an acquisition place of the acquisition task, and the vehicle hardware change information comprises: at least one of vehicle information, change time information, a vehicle hardware change item, changed state information, and changed identification information;
and adding the remark condition information and/or the vehicle hardware change information into the training sample data of the driving model.
An embodiment of the present invention further provides an apparatus for acquiring training sample data, including:
a first acquisition module, configured to acquire data collected in a vehicle usage scenario, wherein the collected data comprises at least one of video data and vehicle state data;
the preprocessing module is used for preprocessing the acquired data to obtain a data file;
the marking module is used for adding task marks to the data files;
and the determining module is used for taking the data file added with the task label as training sample data of the driving model.
Optionally, the preprocessing module is configured to convert the collected data into a file in an h5 format, so as to obtain the data file in the h5 format.
Optionally, the labeling module is configured to add a target task label to the data file, and classify the data file with the target task label added thereto into a directory of a target task in a training sample, where the data file is used for training the target task.
Optionally, the apparatus further comprises:
a setting module, configured to set an acquisition task of the vehicle usage scenario;
the first acquisition module is configured to acquire the collected data of the vehicle usage scenario collected according to the acquisition task, and to acquire attribute information of the collected data;
wherein the name of the data file includes the attribute information.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain remark condition information of the collection task, and/or obtain vehicle hardware change information of the collection task, where the remark condition information is used to describe an environment condition of a collection location of the collection task, and the vehicle hardware change information includes: at least one of vehicle information, change time information, a vehicle hardware change item, changed state information, and changed identification information;
and the adding module is used for adding the remark condition information and/or the vehicle hardware change information in the training sample data of the driving model.
The embodiment of the present invention further provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, and when the computer program is executed by the processor, the steps of the method for acquiring training sample data provided in the embodiment of the present invention are implemented.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for acquiring training sample data provided in the embodiment of the present invention are implemented.
In the embodiment of the invention, data collected in a vehicle usage scenario is acquired, wherein the collected data comprises at least one of video data and vehicle state data; data preprocessing is performed on the collected data to obtain a data file; a task label is added to the data file; and the data file with the task label added is used as training sample data of a driving model. Since the training sample data is preprocessed and labeled before training, the training efficiency of the driving model can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a flowchart of a method for acquiring training sample data according to an embodiment of the present invention;
fig. 2 is a flowchart of another method for acquiring training sample data according to an embodiment of the present invention;
fig. 3 is a schematic diagram of another method for acquiring training sample data according to an embodiment of the present invention;
fig. 4 is a structural diagram of an apparatus for acquiring training sample data according to an embodiment of the present invention;
fig. 5 is a structural diagram of another apparatus for acquiring training sample data according to an embodiment of the present invention;
fig. 6 is a structural diagram of another apparatus for acquiring training sample data according to an embodiment of the present invention;
fig. 7 is a structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprises," "comprising," or any other variation thereof, in the description and claims of this application, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Furthermore, the use of "and/or" in the specification and claims means that at least one of the connected objects, such as a and/or B, means that three cases, a alone, B alone, and both a and B, exist.
In the embodiments of the present invention, words such as "exemplary" or "for example" are used to indicate an example, illustration, or description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention is not to be construed as preferred or advantageous over other embodiments or designs. Rather, the words "exemplary" and "for example" are intended to present related concepts in a concrete fashion.
Referring to fig. 1, fig. 1 is a flowchart of a method for acquiring training sample data according to an embodiment of the present invention. The method may be applied to a terminal device and, as shown in fig. 1, includes the following steps:
Step 101, acquiring data collected in a vehicle usage scenario, wherein the collected data comprises at least one of video data and vehicle state data.
The terminal device may be a device or a data platform having a data processing function, such as a computer, a server, and a vehicle-mounted device.
The acquiring of the collected data in step 101 may be acquiring video data and/or vehicle state data collected by a vehicle (which may be referred to as a collection vehicle) in the vehicle usage scenario. The video data may be video data collected by the vehicle through a camera, and the vehicle state data may be vehicle body Controller Area Network (CAN) data, for example CAN data of the vehicle body posture, and specifically CAN data such as steering angle, speed, gear, turn-signal state, accelerator state, and brake, which can be used for model training.
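For illustration only, one possible in-memory representation of a single vehicle-state (CAN) sample is sketched below in Python; the field names, types, and units are assumptions for the example and are not prescribed by this embodiment.
```python
from dataclasses import dataclass

# A minimal sketch of one vehicle-state (CAN) sample; the field names and
# units are illustrative assumptions, not the data layout of this embodiment.
@dataclass
class VehicleStateSample:
    timestamp: float       # acquisition time, in seconds
    steering_angle: float  # steering angle, in degrees
    speed: float           # vehicle speed, in km/h
    gear: str              # gear position, e.g. "D", "R", "P"
    turn_signal: str       # turn-signal state, e.g. "left", "right", "off"
    throttle: float        # accelerator state, 0.0 to 1.0
    brake: float           # brake state, 0.0 to 1.0
```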
It should be noted that the video data and the vehicle state data may be collected simultaneously, for example, the vehicle state data at each moment in the video is collected together with the video data.
In addition, the collected video data may be multiple channels of video data, for example 5 channels of video data: vehicle front (front) video data, vehicle rear (rear) video data, vehicle left (left) video data, vehicle right (right) video data, and lane keeping assist system (lka) video data. The embodiment of the present invention is not limited thereto. In addition, in the embodiment of the present invention, each channel of video data may be processed separately, or multiple channels of video data may be processed in parallel.
Preferably, in the embodiment of the present invention, the vehicle usage scenario is a Valet Parking (VP) scenario. Of course, this is not a limitation; other scenarios in which automatic driving is performed through the model are also possible, such as park or open-road automatic driving.
And 102, carrying out data preprocessing on the acquired data to obtain a data file.
The data preprocessing may be performing format conversion on the collected data, for example converting image data or CAN data into a file in the hypertext 5 (h5) format. Of course, the embodiment of the present invention does not limit the data preprocessing to format conversion; other data preprocessing that can improve the efficiency of model training is also possible.
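For illustration only, the following minimal Python sketch shows one possible way to perform such a format conversion using the h5py library; the function name and the dataset names ("imgs", "attrs", which follow identifiers mentioned later in this description) are assumptions for the example and are not prescribed by this embodiment.
```python
import h5py
import numpy as np

def convert_to_h5(frames, can_records, out_path):
    # Write collected video frames and CAN records into one .h5 data file.
    # The dataset names "imgs" and "attrs" follow identifiers mentioned later
    # in this description; the exact layout is an assumption for illustration.
    with h5py.File(out_path, "w") as f:
        f.create_dataset("imgs", data=np.asarray(frames), compression="gzip")
        f.create_dataset("attrs", data=np.asarray(can_records))
```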
It should be noted that step 102 may obtain one or more data files from the collected data, for example a data file for each channel of video and a data file for the vehicle state data.
And 103, adding task labels to the data files.
The added task label may be a label of the task corresponding to the data file, that is, the task label can determine which tasks the data file is used for in training, for example an automatic driving task label, a parking task label, or a detection task label.
When a plurality of data files are obtained in step 102, step 103 may add a respective task label to each data file.
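For illustration only, the following minimal Python sketch shows one possible way to attach a task label to a data file, here as a file-level HDF5 attribute; the label values and the storage mechanism are assumptions for the example and are not prescribed by this embodiment.
```python
import h5py

# Task labels named as examples in this description; the string values are
# assumptions for illustration.
TASK_LABELS = {"autodrive", "parking", "detection"}

def add_task_label(h5_path, task_label):
    # Attach the task label to the data file as a file-level HDF5 attribute;
    # storing it as an attribute is one possible realization, not prescribed here.
    if task_label not in TASK_LABELS:
        raise ValueError(f"unknown task label: {task_label}")
    with h5py.File(h5_path, "a") as f:
        f.attrs["task"] = task_label
```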
And step 104, taking the data file added with the task label as training sample data of the driving model.
In this step, the data file obtained in step 103 is used as training sample data of the driving model, so that the driving model can be trained using the data.
It should be noted that the driving model may be a model that can be used by the vehicle during driving, for example a steering control model, a speed control model, a brake control model, a gear control model, a turn-signal control model, or a detection model, but is not limited thereto.
Since the training sample data is preprocessed and labeled before step 104, the training efficiency of the driving model can be improved. For example, the data pipeline runs through the whole process from collection to preprocessing to labeling: an acquisition task is formulated and usable output data is finally obtained, so that upstream modules are served more efficiently and TB-scale data can be managed efficiently and completely. The terminal device may therefore also be called a data pipeline processing platform.
In the embodiment of the invention, data collected in a vehicle usage scenario is acquired, wherein the collected data comprises at least one of video data and vehicle state data; data preprocessing is performed on the collected data to obtain a data file; a task label is added to the data file; and the data file with the task label added is used as training sample data of a driving model. Since the training sample data is preprocessed and labeled before training, the training efficiency of the driving model can be improved.
Referring to fig. 2, fig. 2 is a flowchart of another method for acquiring training sample data according to an embodiment of the present invention, where the method may be applied to a terminal device, as shown in fig. 2, and includes the following steps:
step 201, setting a collection task of a vehicle use scene.
The acquisition task set for the vehicle usage scenario may be an acquisition task defined according to a location or an area of the vehicle usage scenario. For example, for the VP application scenario, a residential community or a shopping mall containing a parking lot is defined as one acquisition task.
In addition, one or more acquisition tasks may be set. Further, an acquisition task table may be defined for all acquisition tasks, for example place-info(place_id, place, place_info, driver), where place_id denotes the identifier of an acquisition task and each acquisition task corresponds to a unique place_id, place is the acquisition location (or task location), for example a parking lot, and place_info denotes special-case remarks, for example that the light at the entrance of the parking lot is too dark, which can be used for reference in subsequent field debugging or model training.
Further, hardware changes may occur during the acquisition process, for example changes of hardware such as the automatic driving chip (for example px2) or a camera. A hardware change directly influences the quality of the data and the strategy for using the data in subsequent model training, so a device change table may be defined, for example device-modification(car_id, date, modification, current_state, modification_tag), where car_id identifies the collection vehicle, date denotes the change date, modification denotes the changed item, current_state denotes the state after the change, and modification_tag denotes the identification information (for example, a number) after the change.
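For illustration only, the two tables defined above may be mirrored as plain data structures, for example as the following Python dataclasses; the field names follow the text, while the types and comments are assumptions for the example.
```python
from dataclasses import dataclass

@dataclass
class PlaceInfo:
    # place-info acquisition task table; field names follow the text above,
    # the types are assumptions for illustration.
    place_id: str       # unique identifier of the acquisition task
    place: str          # acquisition location, e.g. a parking lot
    place_info: str     # special-case remarks, e.g. "entrance light too dark"
    driver: str         # driver field defined in the table above

@dataclass
class DeviceModification:
    # device-modification hardware change table
    car_id: str            # identifies the collection vehicle
    date: str              # change date
    modification: str      # changed hardware item
    current_state: str     # state after the change
    modification_tag: str  # identification (e.g. a number) after the change
```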
Optionally, after step 201, the method further includes:
acquiring remark condition information of the acquisition task, and/or acquiring vehicle hardware change information of the acquisition task, wherein the remark condition information is used for describing an environment condition of an acquisition place of the acquisition task, and the vehicle hardware change information comprises: at least one of vehicle information, change time information, a vehicle hardware change item, changed state information, and changed identification information;
and adding the remark condition information and/or the vehicle hardware change information into the training sample data of the driving model.
The remark condition information is used to describe the environmental condition of the acquisition location of the acquisition task, which can be understood as the special-case remarks in the acquisition task table, for example the lighting condition at the entrance of the parking lot, the size of the parking spaces in the parking lot, and the like. Adding the remark condition information to the training sample data of the driving model makes the training sample data richer and the trained model more accurate.
The vehicle hardware change information may be the information in the device change table; adding the vehicle hardware change information to the training sample data makes the training sample data richer and the trained model more accurate. Preferably, the remark condition information and/or the vehicle hardware change information may be associated with the data collected by the acquisition task, so as to improve the training effect of the collected data.
Step 202, acquiring the acquired data of the vehicle using scene acquired according to the acquisition task, and acquiring attribute information of the acquired data, wherein the acquired data comprises at least one of video data and vehicle state data.
In this step, the number of video channels and the vehicle state data acquired by different acquisition tasks may be the same or different; for example, there may be 5 channels of video data.
The attribute information of the collected data may be information related to the collection vehicle, the acquisition location, the acquisition time, and the like, produced during the acquisition of the collected data.
Optionally, the attribute information includes at least one of:
vehicle information, vehicle hardware change items, location information, time information, acquisition mode information, video acquisition orientation information, and type information.
The video acquisition orientation information may also be understood as channel information of the video data. For example, if 5 channels of video data are collected, namely front video data, rear video data, left video data, right video data, and lka video data, the video acquisition orientation information may be front, rear, left, right, or lka, indicating the specific orientation of the vehicle to which the video data corresponds.
The type information may describe the type of the collected data, for example an attrs identifier or an imgs identifier, where the attrs identifier identifies vehicle state data, such as pose CAN data, and the imgs identifier identifies video data, such as image data.
Step 203, performing data preprocessing on the acquired data to obtain a data file, wherein the name of the data file includes the attribute information.
The name of the data file may include the attribute information by naming the data file according to the attribute information, for example naming the file car-modification-place-date-drivemode-video-content.h5, where car denotes the collection vehicle (i.e., the vehicle information mentioned above), date denotes the time information (for example the acquisition date), modification denotes the vehicle hardware change item, place denotes the acquisition location, drivemode denotes the acquisition mode, video denotes the video acquisition orientation information (for example front/rear/left/right/lka), and content denotes the type information, for example an attrs identifier or an imgs identifier, where the attrs identifier is used for pose CAN data and the imgs identifier for image data. A specific example is r1-m1-parkinglot-20180926-r201-front-attrs.h5.
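For illustration only, the naming rule described above may be implemented as the following minimal Python sketch; the function and argument names are assumptions taken from the attribute fields listed in the text.
```python
def build_data_file_name(car, modification, place, date, drivemode, video, content):
    # Compose the data file name from its attribute information following the
    # car-modification-place-date-drivemode-video-content.h5 rule above.
    return f"{car}-{modification}-{place}-{date}-{drivemode}-{video}-{content}.h5"

# For example:
# build_data_file_name("r1", "m1", "parkinglot", "20180926", "r201", "front", "attrs")
# returns "r1-m1-parkinglot-20180926-r201-front-attrs.h5"
```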
It should be noted that the attribute information of the collected data may be attribute information of video data and/or attribute information of vehicle state data, and further, may also be understood as attribute information of the data file, because the data file is converted from the collected data, and of course, if the collected data includes video data and vehicle state data, the video data and the vehicle state data are respectively generated into corresponding data files.
Because the name of the data file comprises the attribute information, the attribute of the data file can be confirmed directly through the name in the training process, and the model training efficiency can be further improved.
Optionally, the performing data preprocessing on the collected data to obtain a data file includes:
converting the collected data into an h5 format file to obtain the data file in h5 format.
If the collected data includes video data and vehicle state data, the video data and the vehicle state data are converted into corresponding h5 format files to obtain respective h5 format data files.
In the embodiment, the collected data are uniformly converted into the h5 format file, so that the model training is facilitated.
It should be noted that the embodiment of the present invention is not limited to conversion into the h5 format; for example, the collected data may also be converted into a file of a higher version of the hypertext format than h5.
And step 204, adding task labels to the data files.
Optionally, the adding task labels to the data files includes:
adding a target task label to the data file, and classifying the data file added with the target task label into a directory of a target task in a training sample, wherein the data file is used for training the target task.
It should be noted that, in practical applications, different data files may be used to train different tasks. For example, front video data of the vehicle may be used for detection tasks and automatic driving tasks, rear video data of the vehicle may be used for parking tasks, left and right video data of the vehicle may be used for steering tasks, and so on.
In this embodiment, data files with different task labels may be classified into different directories, so that when different tasks are trained, the corresponding data files can be read directly, which further improves model training efficiency. For example, the labeled data of each module is arranged into a corresponding labeling directory, where the modules may include an automatic driving task module, a parking task module, and a detection task module: data files labeled with the automatic driving task of the automatic driving task module are arranged into the automatic driving task labeling directory, data files labeled with the parking task of the parking task module into the parking task labeling directory, data files labeled with the detection task of the detection task module into the detection task labeling directory, and so on.
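For illustration only, the classification of labeled data files into task directories may be sketched in Python as follows; the directory layout (one sub-directory per task under the training sample root) is an assumption for the example.
```python
import shutil
from pathlib import Path

def classify_into_task_directory(h5_path, task_label, sample_root):
    # Move a labeled data file into the directory of its target task in the
    # training samples; the layout <sample_root>/<task_label>/ is an assumption.
    target_dir = Path(sample_root) / task_label
    target_dir.mkdir(parents=True, exist_ok=True)
    return shutil.move(str(h5_path), str(target_dir / Path(h5_path).name))
```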
And step 205, taking the data file added with the task label as training sample data of the driving model.
It should be noted that the steps of the method may be performed cyclically. For example, as shown in fig. 3, an acquisition task is first formulated, and then data acquisition, data preprocessing, and data labeling are performed; if the data for training the model is not sufficient, data acquisition continues until enough data is available for training the model, at which point the acquisition task is finished.
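For illustration only, the cyclic process shown in fig. 3 may be sketched as the following Python loop; the callables for collection, preprocessing, labeling, and sample counting are placeholders for the steps described in this embodiment, not actual interfaces.
```python
def run_acquisition_task(task, required_samples, collect, preprocess, label, count_samples):
    # Repeat data acquisition, preprocessing and labeling until enough training
    # data is available, then the acquisition task is finished (cf. fig. 3).
    while count_samples(task) < required_samples:
        raw = collect(task)        # data acquisition
        files = preprocess(raw)    # data preprocessing into data files
        label(files)               # data labeling
```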
In this embodiment, various optional embodiments are added to the embodiment shown in fig. 1, and all of them can further improve the training efficiency of the driving model.
Referring to fig. 4, fig. 4 is a structural diagram of an apparatus for acquiring training sample data according to an embodiment of the present invention, and as shown in fig. 4, the apparatus 400 for acquiring training sample data includes:
a first obtaining module 401, configured to obtain collected data collected in a vehicle usage scenario, where the collected data includes at least one of video data and vehicle status data;
a preprocessing module 402, configured to perform data preprocessing on the acquired data to obtain a data file;
a labeling module 403, configured to add task labels to the data files;
and the determining module 404 is configured to use the data file added with the task label as training sample data of the driving model.
Optionally, the preprocessing module 402 is configured to convert the collected data into a file in h5 format, so as to obtain the data file in h5 format.
Optionally, the labeling module 403 is configured to add a target task label to the data file, and classify the data file with the target task label into a directory of a target task in a training sample, where the data file is used for training the target task.
Optionally, as shown in fig. 5, the apparatus further includes:
a setting module 405, configured to set an acquisition task of the vehicle usage scenario;
the first obtaining module 401 is configured to obtain the collected data of the vehicle usage scenario collected according to the acquisition task, and to obtain attribute information of the collected data;
wherein the name of the data file includes the attribute information.
Optionally, the attribute information includes at least one of:
vehicle information, vehicle hardware change items, location information, time information, acquisition mode information, video acquisition orientation information, and type information.
Optionally, as shown in fig. 6, the apparatus further includes:
a second obtaining module 406, configured to obtain remark condition information of the acquisition task, and/or obtain vehicle hardware change information of the acquisition task, where the remark condition information is used to describe an environment condition of an acquisition location of the acquisition task, and the vehicle hardware change information includes: at least one of vehicle information, change time information, a vehicle hardware change item, changed state information, and changed identification information;
an adding module 407, configured to add the remark condition information and/or the vehicle hardware change information to the training sample data of the driving model.
The device provided by the embodiment of the present invention can implement each process implemented in the method embodiments of fig. 1 and fig. 2, and can achieve the same beneficial effects, and for avoiding repetition, the details are not described here again.
Referring to fig. 7, fig. 7 is a structural diagram of a terminal device according to an embodiment of the present invention, and as shown in fig. 7, a terminal device 700 includes a processor 701, a memory 702, and a computer program stored in the memory 702 and being executable on the processor.
Wherein the computer program when executed by the processor 701 implements the steps of:
acquiring data collected in a vehicle usage scenario, wherein the collected data comprises at least one of video data and vehicle state data;
carrying out data preprocessing on the acquired data to obtain a data file;
adding task labels to the data files;
and taking the data file added with the task label as training sample data of the driving model.
Optionally, the data preprocessing performed on the acquired data by the processor 701 to obtain a data file includes:
converting the collected data into a hypertext 5 (h5) format file to obtain the data file in h5 format.
Optionally, the adding of the task label to the data file by the processor 701 includes:
adding a target task label to the data file, and classifying the data file added with the target task label into a directory of a target task in a training sample, wherein the data file is used for training the target task.
Optionally, before acquiring the data collected in the vehicle usage scenario, the processor 701 is further configured to:
setting an acquisition task of the vehicle usage scenario;
the acquiring of the data collected in the vehicle usage scenario, performed by the processor 701, includes:
acquiring the collected data of the vehicle usage scenario collected according to the acquisition task, and acquiring attribute information of the collected data;
wherein the name of the data file includes the attribute information.
Optionally, the attribute information includes at least one of:
vehicle information, vehicle hardware change items, location information, time information, acquisition mode information, video acquisition orientation information, and type information.
Optionally, after the acquisition task of the vehicle usage scenario is set, the processor 701 is further configured to:
acquiring remark condition information of the acquisition task, and/or acquiring vehicle hardware change information of the acquisition task, wherein the remark condition information is used for describing an environment condition of an acquisition place of the acquisition task, and the vehicle hardware change information comprises: at least one of vehicle information, change time information, a vehicle hardware change item, changed state information, and changed identification information;
and adding the remark condition information and/or the vehicle hardware change information into the training sample data of the driving model.
The terminal device provided by the embodiment of the present invention can implement each process of the method embodiments of fig. 1 and fig. 2 and can achieve the same beneficial effects; to avoid repetition, the details are not repeated here.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for acquiring training sample data provided in the embodiment of the present invention are implemented.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A method for acquiring training sample data, characterized by comprising the following steps:
acquiring data collected in a vehicle usage scenario, wherein the collected data comprises at least one of video data and vehicle state data;
performing data preprocessing on the collected data to obtain a data file;
adding a target task label to the data file, wherein the target task label is used for determining that the data file is used for training a target task;
taking the data file added with the target task label as training sample data of a driving model;
wherein before the acquiring of the data collected in the vehicle usage scenario, the method further comprises:
setting an acquisition task of the vehicle usage scenario;
the acquiring of the data collected in the vehicle usage scenario comprises:
acquiring the collected data of the vehicle usage scenario collected according to the acquisition task, and acquiring attribute information of the collected data, wherein the name of the data file comprises the attribute information;
after the acquisition task of the vehicle usage scenario is set, the method further comprises:
acquiring remark condition information of the acquisition task and vehicle hardware change information of the acquisition task, wherein the remark condition information is used for describing an environment condition of an acquisition place of the acquisition task, and the vehicle hardware change information comprises: at least one of change time information, vehicle hardware change items, changed state information, and changed hardware identification information;
and adding the remark condition information and the vehicle hardware change information into the training sample data of the driving model.
2. The method of claim 1, wherein the data preprocessing the collected data to obtain a data file comprises:
converting the collected data into a hypertext 5 (h5) format file to obtain the data file in h5 format.
3. The method of claim 1, wherein the adding a target task label to the data file comprises:
adding a target task label to the data file, and classifying the data file added with the target task label into a directory of a target task in a training sample, wherein the data file is used for training the target task.
4. An apparatus for acquiring training sample data, comprising:
a first acquisition module, configured to acquire data collected in a vehicle usage scenario, wherein the collected data comprises at least one of video data and vehicle state data;
the preprocessing module is used for preprocessing the acquired data to obtain a data file;
the labeling module is used for adding a target task label to the data file, and the target task label is used for determining that the data file is used for training a target task;
the determining module is used for taking the data file added with the target task label as training sample data of a driving model;
wherein the apparatus further comprises:
a setting module, configured to set an acquisition task of the vehicle usage scenario;
the first acquisition module is configured to acquire the collected data of the vehicle usage scenario collected according to the acquisition task, and to acquire attribute information of the collected data, wherein the name of the data file comprises the attribute information;
a second obtaining module, configured to obtain remark condition information of the collection task and vehicle hardware change information of the collection task, where the remark condition information is used to describe an environment condition of a collection location of the collection task, and the vehicle hardware change information includes: at least one of change time information, vehicle hardware change items, changed state information, and changed hardware identification information;
and the adding module is used for adding the remark condition information and the vehicle hardware change information in the training sample data of the driving model.
5. The apparatus of claim 4, wherein the pre-processing module is to convert the collected data into a h5 formatted file to obtain the h5 formatted data file.
6. The apparatus of claim 4, wherein the labeling module is configured to add a target task label to the data file, and classify the data file with the target task label into a directory of target tasks in a training sample, wherein the data file is used for training the target task.
7. A terminal device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the method of acquiring training sample data according to any one of claims 1 to 3.
8. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of acquiring training sample data according to any one of claims 1 to 3.
CN201811638283.7A 2018-12-29 2018-12-29 Method and device for acquiring training sample data and terminal equipment Active CN109685213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811638283.7A CN109685213B (en) 2018-12-29 2018-12-29 Method and device for acquiring training sample data and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811638283.7A CN109685213B (en) 2018-12-29 2018-12-29 Method and device for acquiring training sample data and terminal equipment

Publications (2)

Publication Number Publication Date
CN109685213A CN109685213A (en) 2019-04-26
CN109685213B true CN109685213B (en) 2022-01-07

Family

ID=66190305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811638283.7A Active CN109685213B (en) 2018-12-29 2018-12-29 Method and device for acquiring training sample data and terminal equipment

Country Status (1)

Country Link
CN (1) CN109685213B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597388B (en) * 2020-07-27 2021-03-19 平安国际智慧城市科技股份有限公司 Sample collection method, device, equipment and medium based on distributed system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740906A (en) * 2016-01-29 2016-07-06 中国科学院重庆绿色智能技术研究院 Depth learning based vehicle multi-attribute federation analysis method
CN106080590A (en) * 2016-06-12 2016-11-09 百度在线网络技术(北京)有限公司 Control method for vehicle and device and the acquisition methods of decision model and device
CN107576960A (en) * 2017-09-04 2018-01-12 苏州驾驶宝智能科技有限公司 The object detection method and system of vision radar Spatial-temporal Information Fusion
CN107977671A (en) * 2017-10-27 2018-05-01 浙江工业大学 A kind of tongue picture sorting technique based on multitask convolutional neural networks
CN108564029A (en) * 2018-04-12 2018-09-21 厦门大学 Face character recognition methods based on cascade multi-task learning deep neural network
CN108734293A (en) * 2017-04-13 2018-11-02 北京京东尚科信息技术有限公司 Task management system, method and apparatus
CN108732589A (en) * 2017-04-24 2018-11-02 百度(美国)有限责任公司 The training data of Object identifying is used for using 3D LIDAR and positioning automatic collection

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170286826A1 (en) * 2016-03-30 2017-10-05 Nec Laboratories America, Inc. Real-time deep learning for danger prediction using heterogeneous time-series sensor data
CN106226050B (en) * 2016-07-15 2019-02-01 北京航空航天大学 A kind of TFDS fault picture automatic identifying method based on convolutional neural networks
CN106650767B (en) * 2016-09-20 2020-10-27 河海大学 Flood forecasting method based on cluster analysis and real-time correction
CN106951847B (en) * 2017-03-13 2020-09-29 百度在线网络技术(北京)有限公司 Obstacle detection method, apparatus, device and storage medium
CN108337358B (en) * 2017-09-30 2020-01-14 Oppo广东移动通信有限公司 Application cleaning method and device, storage medium and electronic equipment
CN107901909B (en) * 2017-10-31 2020-05-05 北京新能源汽车股份有限公司 Control method and device for automatic lane replacement and controller
CN108805282A (en) * 2018-04-28 2018-11-13 福建天晴在线互动科技有限公司 Deep learning data sharing method, storage medium based on block chain mode
CN108734303A (en) * 2018-05-29 2018-11-02 深圳市易成自动驾驶技术有限公司 Vehicle drive data predication method, equipment and computer readable storage medium
CN108985223A (en) * 2018-07-12 2018-12-11 天津艾思科尔科技有限公司 A kind of human motion recognition method

Also Published As

Publication number Publication date
CN109685213A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN106599773B (en) Deep learning image identification method and system for intelligent driving and terminal equipment
CN112069643B (en) Automatic driving simulation scene generation method and device
DE102012218390A1 (en) Optimizing the detection of objects in images
CN106469518A (en) The method of the intelligent parking guiding based on video tracking identification, server and system
CN111582189A (en) Traffic signal lamp identification method and device, vehicle-mounted control terminal and motor vehicle
CN111507226B (en) Road image recognition model modeling method, image recognition method and electronic equipment
CN109829395B (en) Data processing method, device and equipment based on unmanned vehicle and storage medium
US11521375B2 (en) Method and system for improved object marking in sensor data
CN114781635B (en) Model deployment method, device, equipment and medium
CN114418895A (en) Driving assistance method and device, vehicle-mounted device and storage medium
CN103426172A (en) Vision-based target tracking method and device
CN115497076A (en) High-precision and high-efficiency signal identification detection method, device and medium
CN109685213B (en) Method and device for acquiring training sample data and terminal equipment
CN112905849A (en) Vehicle data processing method and device
CN111415338A (en) Method and system for constructing target detection model
Manzari et al. Pyramid transformer for traffic sign detection
WO2023137864A1 (en) Autonomous driving-based perception and decision making model upgrading method and system, and electronic device
CN111210411B (en) Method for detecting vanishing points in image, method for training detection model and electronic equipment
CN111461056A (en) Sample data acquisition method and device
CN116205973A (en) Laser point cloud continuous frame data labeling method and system
CN116363619A (en) Attention mechanism lane line detection method, system, equipment and medium
CN115202479A (en) Man-machine co-driving simulation system of man-in-loop and application thereof
CN114708565A (en) Intelligent driving scene recognition model creating method, device, equipment and storage medium
CN114862907A (en) End-to-end-based multi-task recognition network training method
CN111126336B (en) Sample collection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant