CN116304834A - Method and device for generating data to be trained, electronic equipment and storage medium - Google Patents

Method and device for generating data to be trained, electronic equipment and storage medium

Info

Publication number
CN116304834A
CN116304834A (application CN202310336950.0A)
Authority
CN
China
Prior art keywords
data
driving
scene
metadata
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310336950.0A
Other languages
Chinese (zh)
Inventor
张霞
梁斯硕
Current Assignee
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date
Application filed by Chongqing Changan Automobile Co Ltd filed Critical Chongqing Changan Automobile Co Ltd
Priority to CN202310336950.0A priority Critical patent/CN116304834A/en
Publication of CN116304834A publication Critical patent/CN116304834A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W60/00: Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001: Planning or execution of driving tasks
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Abstract

According to the method, apparatus, electronic device, and storage medium for generating data to be trained, driving increment data and scene application types are obtained from the vehicle end, and the driving increment data is uploaded to the cloud; the driving increment data is classified based on the scene application types to obtain a plurality of scene-type data sets; metadata extraction is performed on these data sets to obtain driving scene metadata; and the driving scene metadata undergoes data format conversion at a pre-configured data processing node, after which the format-converted driving scene metadata is used to generate the data to be trained for the vehicle-end driving model. By uploading the data to the cloud and performing the corresponding data operations there, the scheme effectively avoids intermediate data wasting local storage resources; by extracting metadata from the scene data and performing overall labeling and management of the data based on that metadata, it effectively improves the efficiency of coordinated data management and facilitates generating the data to be trained required by the vehicle-end model.

Description

Method and device for generating data to be trained, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computers, and in particular, to a method and apparatus for generating data to be trained, an electronic device, and a storage medium.
Background
In the development of autonomous driving, vehicle perception, driving decision-making, and decision execution are prerequisites for achieving high-level autonomous driving capability. As perception technology and computing platforms increasingly mature and converge, autonomous driving depends on scientifically modeling cognitive logic: on the premise that the vehicle can perceive road conditions and the driving environment, an accurate data analysis model must be constructed for cognition. To improve cognitive accuracy, data from ever more new scenarios must be continuously collected, analyzed, and processed to feed and train the model, so that the strengths of the algorithm are fully exploited and the autonomous driving model is optimized.
However, in the related art, processing the volume of autonomous driving data required for model training involves operations such as centralized storage, classification, and data cleaning. Under this mode of operation, a large amount of storage space and a long processing cycle are required, and efficient, coordinated data management cannot be achieved.
Disclosure of Invention
The embodiments of the invention aim to provide a method, an apparatus, an electronic device, and a storage medium for generating data to be trained, so as to solve the problems that the prior art cannot support efficient, coordinated data operations and wastes local storage resources.
The invention provides a method for generating data to be trained, which comprises the following steps: acquiring driving increment data and scene application types of a vehicle end, and uploading the driving increment data to a cloud; classifying the driving increment data based on the scene application type to obtain a plurality of scene type data sets, wherein the scene type data sets comprise a plurality of driving increment data; extracting metadata from the scene type data sets to obtain driving scene metadata; and carrying out data format conversion on the driving scene metadata according to a pre-configured data processing node, and generating data to be trained for a vehicle-end driving model based on the driving scene metadata after format conversion.
In an embodiment of the present invention, obtaining driving increment data of a vehicle end includes: acquiring an initial driving data volume of the vehicle end, and monitoring the total amount of that initial driving data; if the total amount of data increases, extracting the initial incremental data and determining its structure type, wherein the structure type comprises unstructured data and structured data; if the incremental data is unstructured data, extracting driving parameters from the unstructured data according to a preset data storage format to obtain vehicle-end incremental parameters; and determining the vehicle-end incremental parameters and the structured data as the driving increment data, and storing the driving increment data in a preset incremental data storage area.
In an embodiment of the present invention, after determining the vehicle-end increment parameter as the driving increment data, the method for generating data to be trained further includes: acquiring data acquisition time of the driving incremental data, and performing time stamp marking on the driving incremental data according to the data acquisition time to obtain driving incremental data with time stamp marking; generating a storage time sequence according to a preset time interval, and carrying out data alignment on driving increment data with a time stamp according to the time stamp mark in the time sequence so as to store the driving increment data according to the time sequence.
In an embodiment of the present invention, performing data format conversion on the driving scenario metadata according to a pre-configured data processing node includes: determining the data dependence and the data index relation of the driving scene metadata based on the topological association relation of a vehicle-end preset driving module; generating a data dependency label and a data index label according to the data dependency and data index relation, and marking the driving scene metadata according to the data dependency label and the data index label to obtain marked driving scene metadata; and carrying out data format conversion on the marked driving scene metadata according to a preset training data format of the driving model to obtain driving scene metadata after format conversion, and generating data to be trained for the driving model of the vehicle end based on the driving scene metadata after format conversion.
In an embodiment of the present invention, before determining the data dependency and the data index relationship of the driving scene metadata based on the topological association relationship of the driving module preset at the vehicle end, the method for generating the data to be trained further includes: and performing data cleaning processing on the driving scene metadata, wherein the data cleaning processing comprises data deduplication, data fitting and data desensitization so as to perform invalid data elimination on the driving scene metadata.
In an embodiment of the present invention, generating the data to be trained required for the vehicle-end driving model based on the driving scene metadata after the format conversion includes: and dividing the driving scene metadata after format conversion into a training data set and a verification data set according to a preset dividing proportion, and storing the training data set and the verification data set into a preset training data storage area.
In an embodiment of the present invention, after metadata extraction is performed on the plurality of scene type data sets, the method for generating data to be trained further includes: acquiring key characteristics of a data scene of a vehicle-end driving model; and carrying out data retrieval in the extracted initial metadata based on the data scene key features so as to determine driving scene metadata.
The embodiment of the invention also provides a device for generating the data to be trained, which comprises: the vehicle end data acquisition module is used for acquiring driving increment data and scene application types of the vehicle end; the incremental data classification module is used for classifying the driving incremental data based on the scene application type to obtain a plurality of scene type data sets, wherein the scene type data sets comprise a plurality of driving incremental data; the metadata extraction module is used for extracting metadata from the scene type data sets to obtain driving scene metadata; and the metadata processing module is used for carrying out data format conversion on the driving scene metadata according to a preconfigured data processing node and generating data to be trained for a vehicle-end driving model based on the driving scene metadata after format conversion.
The embodiment of the invention also provides electronic equipment, which comprises: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement a method of generating data to be trained as in any of the embodiments described above.
The embodiment of the present invention also provides a computer readable storage medium, on which computer readable instructions are stored, which when executed by a processor of a computer, cause the computer to perform the method for generating data to be trained according to any one of the above embodiments.
According to the method, apparatus, electronic device, and storage medium for generating data to be trained provided by the embodiments, driving increment data and scene application types are obtained from the vehicle end, and the driving increment data is uploaded to the cloud; the driving increment data is classified based on the scene application types to obtain a plurality of scene-type data sets; metadata extraction is performed to obtain driving scene metadata; and the driving scene metadata undergoes data format conversion at a pre-configured data processing node, after which the format-converted driving scene metadata is used to generate the data to be trained for the vehicle-end driving model. By uploading the data to the cloud and performing the corresponding data operations there, the scheme effectively avoids intermediate data wasting local storage resources; by extracting metadata from the scene data and performing overall labeling and management of the data based on that metadata, it effectively improves the efficiency of coordinated data management and facilitates generating the data to be trained required by the vehicle-end model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 is a schematic diagram of an exemplary system architecture shown in an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a method of generating data to be trained, as illustrated in an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of a driving delta data acquisition process shown in an exemplary embodiment of the present application;
FIG. 4 is a flow chart illustrating a specific method of generating data to be trained according to an exemplary embodiment of the present application;
FIG. 5 is a flow chart illustrating one particular driving delta data acquisition according to an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a data to be trained generating apparatus shown in an exemplary embodiment of the present application;
Fig. 7 is a schematic diagram of a computer system of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Further advantages and effects of the present invention will become readily apparent to those skilled in the art from the disclosure herein, with reference to the following drawings and specific embodiments. The invention may also be practiced or applied through other, different specific embodiments, and the details in this description may be modified or varied in various ways without departing from the spirit and scope of the present invention. It should be understood that the preferred embodiments are presented by way of illustration only and not by way of limitation.
It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the illustrations, not according to the number, shape and size of the components in actual implementation, and the form, number and proportion of each component in actual implementation may be arbitrarily changed, and the layout of the components may be more complex.
In the following description, numerous details are set forth in order to provide a more thorough explanation of embodiments of the present invention. It will be apparent, however, to one skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the embodiments of the present invention.
In this application, "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
It should be noted that, first, metadata (Metadata) refers to data describing data. It is information about the data, such as the content, format, source, creation date, author, owner, modification history, relationship, etc., that can help the user better understand and use the data. Metadata can help data managers sort, organize, index, retrieve, and share data, as well as help data users to quickly locate and use desired data. Metadata may contain various forms of data such as text, images, audio, video, and the like.
Structured data refers to data that may be placed into a table or similar structure. Such data is typically in a well-defined format and fields that can be easily organized, managed, queried, and analyzed. Common structured data includes numbers, dates, times, currencies, prices, names, addresses, etc., which are typically stored in relational databases.
Unstructured data refers to data that cannot be easily placed into a table or similar structure. Such data is typically not in an explicit format, and may be text, images, audio, video, etc., or may contain mixed data in various formats and fields. Unstructured data is generally not manageable, organized, queried, and analyzed, requiring specialized techniques and tools to process. Common unstructured data includes emails, social media posts, news articles, books, blogs, pictures and videos, etc., which are typically stored in document management systems, content management systems, and big data platforms.
This scheme has the following beneficial effects. By monitoring the initial driving data volume at the vehicle end, determining the data increment, and extracting the increment according to its data structure, data content in a unified, standardized storage format is obtained, which facilitates further processing, allows intermediate data to be retrieved and inspected, and enables unified storage. By generating a timestamp from the data acquisition time, marking the data with it, and storing the aligned data on a uniform time sequence, data storage can be ordered on a uniform basis, and the data of a specific time node can be conveniently extracted and processed. By determining data dependencies and index relationships from the topological relationships of the vehicle-end driving modules, the data lineage can be preserved once the data is marked with index and dependency labels, which helps express the topological relationships of the data structure and determine degrees of association when the data is retrieved. After the metadata undergoes de-duplication, data fitting, and desensitization, the validity of the data is assured, the training data can be conveniently processed further, and the accuracy of the data to be trained is guaranteed. By retrieving data according to the key features of the data scene, the range of data required by the training model can be narrowed and the amount of data to be processed reduced, thereby improving data processing efficiency.
FIG. 1 is a schematic diagram of an exemplary system architecture shown in an exemplary embodiment of the present application.
Referring to fig. 1, a system architecture may include a vehicle end 101 and a computer device 102. The vehicle end 101 is configured to obtain driving increment data and a scene application type of the vehicle end, upload the driving increment data to the cloud end, and provide the driving increment data to the computer device 102 for processing. The computer device 102 may be at least one of a microcomputer, an embedded computer, a network computer, etc. The related technicians can classify the driving increment data based on the scene application types in the computer equipment 102 to obtain a plurality of scene type data sets, extract metadata from the plurality of scene type data sets to obtain driving scene metadata, perform data format conversion on the driving scene metadata according to the pre-configured data processing nodes, and generate data to be trained for the vehicle-end driving model based on the driving scene metadata after the format conversion.
Illustratively, after driving increment data and scene application types are obtained from the vehicle end 101 and the driving increment data is uploaded to the cloud, the computer device 102 classifies the driving increment data based on the scene application types to obtain a plurality of scene-type data sets, performs metadata extraction to obtain driving scene metadata, performs data format conversion on the driving scene metadata at a pre-configured data processing node, and then generates the data to be trained for the vehicle-end driving model based on the format-converted driving scene metadata. By uploading the data to the cloud and performing the corresponding data operations there, the scheme effectively avoids intermediate data wasting local storage resources; by extracting metadata from the scene data and performing overall labeling and management of the data based on that metadata, it effectively improves the efficiency of coordinated data management and facilitates generating the data to be trained required by the vehicle-end model.
Fig. 2 is a flowchart illustrating a method of generating data to be trained, which may be performed with a computing processing device, which may be the computer device 102 shown in fig. 1, according to an exemplary embodiment of the present application. Referring to fig. 2, the flowchart of the method for generating data to be trained at least includes steps S210 to S240, which are described in detail as follows:
in step S210, driving increment data and a scene application type of the vehicle end are obtained, and the driving increment data is uploaded to the cloud.
In one embodiment of the present application, fig. 3 is a schematic diagram of a driving increment data acquisition process shown in an exemplary embodiment of the present application. The acquisition process shown in fig. 3 includes at least steps S310 to S340, described in detail below:
in step S310, an initial driving data amount of the vehicle end is acquired, and a data total amount of the initial driving data amount of the vehicle end is monitored.
In an embodiment of the present application, the monitoring of the total amount of the initial driving data at the vehicle end may use a monitoring period, so that data changes are checked periodically and the point at which the data is updated is discovered in time. The monitoring period can be adjusted according to the accuracy required of the data monitoring time points when the scheme is actually implemented, and is not specifically limited here.
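As an illustrative sketch only (the patent does not specify an implementation), the periodic monitoring step might be realized as a small poller; `get_total_bytes` is a hypothetical callback returning the current total size of the initial driving data:

```python
class VolumeMonitor:
    """Detects growth in the vehicle-end data store between polls.

    `get_total_bytes` is an assumed callback; the patent only requires
    that the total data amount be monitored at some configurable period.
    """

    def __init__(self, get_total_bytes):
        self.get_total = get_total_bytes
        self.last_total = get_total_bytes()  # baseline at start of monitoring

    def poll(self):
        """Return the size of any new increment since the last poll (0 if none)."""
        total = self.get_total()
        delta = max(0, total - self.last_total)
        self.last_total = total
        return delta
```

In a real deployment `poll()` would be driven by a timer set to the chosen monitoring period; a non-zero return value marks the data update time point at which incremental extraction begins.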
In step S320, if the total amount of data increases, initial incremental data is extracted, and the structure type of the initial incremental data is determined.
In one embodiment of the present application, the above-described structure types include unstructured data and structured parameters.
In step S330, if the incremental data is unstructured data, driving parameters of the unstructured data are extracted according to a preset data storage format to obtain vehicle-end incremental parameters.
In one embodiment of the present application, if the incremental data is structured data, the data storage format of the structured data is checked, and missing values are filled by data fitting, so as to meet the consistency requirement of the data storage format and ensure the validity of the data.
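The two branches described above (parameter extraction for unstructured data, and format checking with missing-value filling for structured data) could be sketched as follows. The field names and the `key=value` text format are illustrative assumptions, not part of the patent:

```python
# Assumed schema for the preset data storage format (illustrative only).
REQUIRED_FIELDS = ("speed", "steering_angle", "timestamp")

def is_structured(record):
    """Treat a record as structured if it is already a flat field/value mapping."""
    return isinstance(record, dict)

def fill_missing(record, defaults):
    """Fit missing values in a structured record so it meets the storage format."""
    return {k: record.get(k, defaults[k]) for k in defaults}

def extract_parameters(raw_text, keys=REQUIRED_FIELDS):
    """Pull named driving parameters out of unstructured 'key=value' text."""
    pairs = dict(tok.split("=", 1) for tok in raw_text.split() if "=" in tok)
    return {k: pairs[k] for k in keys if k in pairs}
```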
In step S340, the vehicle-end increment parameter and the structured data are determined as driving increment data, and stored in a preset increment data storage area.
In an embodiment of the present application, the preset incremental data storage area is set in a local storage area, and the cloud processing is performed after the driving incremental data storage is completed.
Through the above driving increment data acquisition process, the initial driving data volume at the vehicle end is monitored, the data increment is determined, and the increment is extracted according to its data structure, yielding data content in a unified, standardized storage format. This facilitates further processing of the data, allows intermediate data to be retrieved and inspected, and enables unified storage.
In one embodiment of the present application, a time stamp (Timestamp) refers to a number, typically an integer or floating point number, representing a point in time. In computer science, a time stamp is typically calculated as the elapsed time in seconds, milliseconds, or microseconds, starting from some fixed point in time. The timestamp may be used to mark the time at which the event occurred, such as the time of sending the email, the time of creation of the file, the time of access to the web page, and so on. The time stamp is widely applied to computer systems and is commonly used for journaling, data backup, data synchronization, data alignment and the like. In an embodiment of the present application, after determining the vehicle-end increment parameter and the structured data as the driving increment data, the method further includes acquiring a data acquisition time of the driving increment data, and performing a timestamp marking on the driving increment data according to the data acquisition time to obtain driving increment data with a timestamp marking.
In one embodiment of the application, a storage time sequence is generated according to a preset time interval, and the timestamped driving increment data is aligned within this time sequence according to its timestamp marks, so that the driving increment data is stored in time-sequence order.
Generating a timestamp from the data acquisition time, marking the data with it, and storing the aligned data on a uniform time sequence allows data storage to be ordered on a uniform basis, making it convenient to extract and process the data of a specific time node.
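A minimal sketch of the alignment step, under the assumption that records arrive as (timestamp, payload) pairs and the storage time sequence is a fixed-interval grid:

```python
def align_to_sequence(records, interval_s):
    """Snap timestamped driving records onto a uniform storage time sequence.

    Each record is a (timestamp_seconds, payload) pair; timestamps are
    rounded down to the nearest interval so that data can be stored and
    later looked up by time slot. The pair shape is an assumption made
    for illustration.
    """
    aligned = {}
    for ts, payload in sorted(records):
        slot = int(ts // interval_s) * interval_s  # start of the slot
        aligned.setdefault(slot, []).append(payload)
    return aligned
```

Extracting the data of a specific time node then reduces to a dictionary lookup by slot.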
In step S220, driving delta data is classified based on scene application types, resulting in a plurality of scene type data sets.
In one embodiment of the present application, the scene type data set includes a plurality of driving delta data.
In one embodiment of the application, after classifying the driving increment data based on the scene application type, a classification label is generated according to the classification type, and the data in each scene type data set is marked according to the data classification label.
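The classification and labeling step above might be sketched as follows; the `scene` key carrying the scene application type is an assumed field name:

```python
from collections import defaultdict

def classify_by_scene(increments):
    """Group driving-increment records into scene-type data sets and label them.

    Each record is a dict whose 'scene' key holds the scene application
    type (an illustrative assumption); a classification label matching
    the scene type is attached to every record in each data set.
    """
    datasets = defaultdict(list)
    for rec in increments:
        label = rec.get("scene", "unknown")
        datasets[label].append({**rec, "class_label": label})
    return dict(datasets)
```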
In step S230, metadata extraction is performed on the plurality of scene type data sets, resulting in driving scene metadata.
In one embodiment of the present application, after metadata extraction is performed on the plurality of scene type data sets, data scene key features of a vehicle-end driving model are obtained, and data retrieval is performed in the extracted initial metadata based on the data scene key features, so as to determine driving scene metadata.
In one embodiment of the present application, the key features of the data scene include, but are not limited to, classification features of the data, data attribute labeling features, and scene features.
It should be noted that the above retrieval of the initial metadata according to the key features of the data scene may be performed by technicians, or a deep learning model may be trained after the determined key features of the data scene have been labeled, with data retrieval and extraction then performed by the trained model; the specific manner of data retrieval is not limited here.
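As one possible, non-limiting realization of the rule-based retrieval path, the key features could be matched as exact field filters over the extracted initial metadata; the dict-based metadata shape is an assumption for illustration:

```python
def retrieve_metadata(initial_metadata, key_features):
    """Filter extracted metadata down to entries matching all key features.

    `key_features` maps feature names (e.g. classification feature,
    attribute label, scene feature) to required values; an entry is kept
    only if every feature matches.
    """
    return [m for m in initial_metadata
            if all(m.get(k) == v for k, v in key_features.items())]
```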
In one embodiment of the present application, a non-relational database is used for managing driving scenario metadata.
According to the key characteristics of the data scene, the data is retrieved, the data demand range based on the training model can be reduced, the data quantity to be processed is reduced, and therefore the data processing efficiency is improved.
In step S240, the driving scenario metadata is subjected to data format conversion according to the preconfigured data processing node, and the data to be trained for the vehicle-end driving model is generated based on the driving scenario metadata after format conversion.
In one embodiment of the application, before the driving scene metadata is subjected to data format conversion, data cleaning processing is performed on the driving scene metadata, wherein the data cleaning processing comprises data deduplication, data fitting and data desensitization so as to perform invalid data rejection on the driving scene metadata. After the metadata is subjected to data deduplication, data fitting and data desensitization, the effectiveness degree of the data can be guaranteed, the training data can be further processed conveniently, and the accuracy of the data to be trained can be guaranteed.
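A sketch of the cleaning step covering de-duplication and desensitization (data fitting, i.e. missing-value filling, is omitted here for brevity); the set of sensitive field names is an assumption:

```python
import hashlib

SENSITIVE_KEYS = {"vin", "plate"}  # assumed identifying fields to desensitize

def clean_metadata(entries):
    """De-duplicate metadata entries and desensitize identifying fields.

    Duplicates are detected by comparing the full field/value content of
    each entry; sensitive values are replaced by a short one-way hash so
    the cleaned metadata no longer exposes them.
    """
    seen, cleaned = set(), []
    for e in entries:
        key = tuple(sorted((k, str(v)) for k, v in e.items()))
        if key in seen:
            continue  # drop exact duplicate
        seen.add(key)
        cleaned.append({
            k: (hashlib.sha256(str(v).encode()).hexdigest()[:8]
                if k in SENSITIVE_KEYS else v)
            for k, v in e.items()
        })
    return cleaned
```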
In one embodiment of the present application, the topological relationships of the vehicle-end driving modules refer to the connection and communication modes between the modules. In general, the vehicle-end driving modules include a perception module, a decision module, and an execution module. In the embodiment of the application, the data dependencies and data index relationships of the driving scene metadata are determined based on the preset topological association relationships of the vehicle-end driving modules; a data dependency label and a data index label are generated from these dependency and index relationships; and the driving scene metadata is marked with these labels to obtain annotated driving scene metadata.
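The dependency and index labeling described above might be sketched as follows, assuming the common perception, decision, execution chain; the topology encoding and field names are illustrative assumptions:

```python
# Assumed module topology: each module lists its downstream consumers.
MODULE_TOPOLOGY = {
    "perception": ["decision"],
    "decision": ["execution"],
    "execution": [],
}

def annotate_metadata(entry):
    """Attach dependency and index labels derived from the module topology.

    Each metadata entry names its producing module; the downstream modules
    in the topology become its dependency label, and (module, timestamp)
    forms a simple index label preserving the data lineage.
    """
    module = entry["module"]
    return {
        **entry,
        "dep_label": MODULE_TOPOLOGY.get(module, []),
        "index_label": f"{module}:{entry['timestamp']}",
    }
```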
In an embodiment of the present application, after the data dependence and data index relations of the driving scene metadata are determined based on the topological association relation of the driving modules preset at the vehicle end, the driving scene metadata may further undergo data analysis and data-association mining according to the model training requirements, yielding relevant data attribute information and label information for the data associations; in addition, label information describing changes after model training or after a change of application scene may be recorded. It should be noted that in a practical implementation the expandable label information can be adjusted and extended according to the requirements of the vehicle-end model and the application-scene training; not every exemplary expandable label type needs to be expanded, and the examples given are merely optional.
In one embodiment of the application, the annotated driving scene metadata is subjected to data format conversion according to a preset training data format of the driving model, the driving scene metadata after format conversion is obtained, and the data to be trained for the vehicle-end driving model is generated based on the driving scene metadata after format conversion.
In this embodiment, the data dependence and index relations are determined based on the topological relation of the vehicle-end driving modules, so that after the data is annotated with index and dependency labels, the data lineage is preserved, which facilitates reflecting the topological relation of the data structure and determining the degree of association during data retrieval.
In one embodiment of the present application, the driving scenario metadata after the format conversion is divided into a training data set and a verification data set according to a preset division ratio, and the training data set and the verification data set are stored in a preset training data storage area.
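The split by a preset ratio can be sketched in a few lines. The 0.8 ratio and the deterministic, unshuffled split are illustrative assumptions; an implementation might shuffle before splitting.

```python
# Minimal sketch: divide format-converted metadata into a training set
# and a verification set by a preset division ratio.
def split(records, train_ratio=0.8):
    cut = int(len(records) * train_ratio)
    return records[:cut], records[cut:]

frames = [{"frame": i} for i in range(10)]
train_set, val_set = split(frames)   # 8 training frames, 2 verification frames
```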
Fig. 4 is a flowchart illustrating a specific method for generating data to be trained according to an exemplary embodiment of the present application. The method may be applied to the implementation environment shown in fig. 1 and implemented by the vehicle end 101 and the computer device 102 in the implementation environment. It should be understood that the method may be applied to other exemplary implementation environments and be specifically executed by devices in other implementation environments, and the implementation environments to which the method is applied are not limited by the present embodiment.
As shown in fig. 4, in a specific embodiment of the present application, the method for generating data to be trained includes three stages: data receiving, data screening and data processing. Data receiving specifically includes accessing data to obtain initial driving increment data, and performing data analysis on the driving increment data to obtain driving increment data to be uploaded to the cloud.
In one embodiment of the present application, after the data screening, a meta data packet may be obtained, where the meta data packet is the metadata in the foregoing embodiments. Then, based on nodes 1 to 3, the meta data packet is processed to obtain a meta data packet that inherits the basic attributes of the pre-processing data, together with the labels and attributes added to the meta data packet during processing.
In one embodiment of the present application, node 1 performs data cleansing on the meta data, and the processing result is a subset of the meta data; node 2 performs mining analysis on the meta data, deriving new label or attribute information to supplement the description of the meta data; node 3 reads the file data in the meta data, converts the file, derives a new file, and generates the data to be trained based on the new file.
In a specific embodiment of the application, DataAPI (a data resource interface) provides database operation interfaces; a user can interact with the database through HTTPS requests or an SDK, realizing use of a Web service interface, reducing the cost for developers of managing an application server, and achieving the development goals of simplicity and efficiency. In this embodiment, the three nodes use DataAPI to maintain each type of processing result: for the cleaned data of node 1, the index relation between the node task and the cleaned data is retained, and only index information is passed downstream; for the mining-analysis scenario of node 2, the structural relation between the node task and the meta data, expanded labels or attributes is retained, and the original meta data plus the expansion information is passed downstream; for the derived-file scenario of node 3, a new meta data record is created whose file is the derived file, while the rest of the information inherits the original meta data.
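The three per-node result-handling policies can be sketched as pure functions. This is a hypothetical illustration of the data-flow contract only: the record fields (`id`, `valid`, `speed`, `file`) and the `high_speed` tag are assumptions, and the real system would persist results through DataAPI rather than return dicts.

```python
# Hypothetical sketch of the three node policies: node 1 forwards only
# index information, node 2 appends derived tags to the original meta
# data, node 3 creates a new record that inherits every other attribute.
def node1_clean(meta_packets):
    kept = [p for p in meta_packets if p.get("valid", True)]
    # Only index information flows downstream, not the full payload.
    return {"indices": [p["id"] for p in kept]}

def node2_mine(meta_packet):
    tags = list(meta_packet.get("tags", []))
    if meta_packet.get("speed", 0) > 30:
        tags.append("high_speed")          # derived tag supplements the meta data
    return {**meta_packet, "tags": tags}

def node3_derive(meta_packet, derived_file):
    # New meta data record: the file is replaced, the rest is inherited.
    return {**meta_packet, "file": derived_file}

packets = [{"id": 1, "valid": True,  "speed": 45, "file": "a.raw"},
           {"id": 2, "valid": False, "speed": 10, "file": "b.raw"}]
index_msg = node1_clean(packets)
mined = node2_mine(packets[0])
derived = node3_derive(mined, "a.jpg")
```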
The nodes are connected in series and in parallel, and a workflow engine is responsible for resource scheduling as well as the task state and flow management of the node data-processing tasks.
FIG. 5 is a flow chart illustrating one particular driving delta data acquisition according to an exemplary embodiment of the present application. The method may be applied to the implementation environment shown in fig. 1 and implemented by the vehicle end 101 and the computer device 102 in the implementation environment. It should be understood that the method may be applied to other exemplary implementation environments and be specifically executed by devices in other implementation environments, and the implementation environments to which the method is applied are not limited by the present embodiment.
In a specific embodiment of the present application, as shown in fig. 5, step S1 collects, stores and scans autonomous-driving data, where the storage operation mounts a data disk to the data center through a transfer server, and a timed scanning service detects the update status of the files on the data disk.
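The timed scanning service's update check can be sketched with the standard library: compare each file's modification time against the time of the previous scan. The directory layout is a simulated stand-in for the mounted data disk.

```python
# Hypothetical sketch of the timed scanning service: report files on
# the mounted data disk modified after the previous scan.
import os
import tempfile

def scan_for_updates(root, last_scan_mtime):
    """Return paths of files modified after the previous scan."""
    updated = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_scan_mtime:
                updated.append(path)
    return updated

# Simulate a data-disk mount containing one freshly written file.
disk = tempfile.mkdtemp()
with open(os.path.join(disk, "drive_log.bin"), "wb") as f:
    f.write(b"\x00" * 16)

updates = scan_for_updates(disk, last_scan_mtime=0.0)
```

A production service would run this scan on a timer and persist the last-scan timestamp between runs.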
Step S2 performs storage operations on the database: if an updated file exists, the updated file is copied to the data-center distributed file system, and the file index information is simultaneously stored in a non-relational database so that the file can be retrieved. The file name of the updated file can be set to carry information such as the vehicle number and recording time so as to enrich the index information of the file. If structured data exists, it is stored in a relational database.
Step S3 parses the updated file. After a file update is detected, a distributed task queue manages the file-parsing tasks, message middleware (RabbitMQ) distributes a parsing task for each updated file, and the file is parsed in each task execution unit (Celery worker). In this specific embodiment, the Celery distributed asynchronous task framework is selected to parse files in parallel and thus accelerate data parsing.
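The fan-out of one parsing task per updated file can be illustrated without a message broker. The patent uses RabbitMQ and Celery workers; as a broker-free stand-in, this sketch distributes the same per-file tasks over a standard-library thread pool. The `parse_file` body and its return shape are assumptions.

```python
# Broker-free stand-in for the Celery/RabbitMQ fan-out: each updated
# file becomes one parsing task executed in parallel.
from concurrent.futures import ThreadPoolExecutor

def parse_file(name):
    # Placeholder for the parsing done inside a task execution unit.
    return {"file": name, "frames": len(name)}

files = ["run_001.bag", "run_002.bag", "run_003.bag"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map preserves input order even though tasks run concurrently.
    results = list(pool.map(parse_file, files))
```

With Celery, `parse_file` would instead be a `@app.task` submitted via the broker to remote workers; the parallelism idea is the same.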
In step S4, the updated file is parsed into standard storage formats: the vehicle information, sensor signals and the like recorded in the update file are acquired, and in a specific implementation, if the update file records a camera image stream or lidar data, the image stream is converted to the JPG image format for storage and the lidar data is converted to the PCD standard point-cloud file format for storage. The data is aligned according to the timestamp of each frame, missing values are fitted and filled in, and each timestamp is aggregated into one frame of metadata that is stored in a non-relational database; the metadata records the vehicle-end driving data at a specific point in time, including vehicle information, sensor signals, camera pictures and point-cloud files.
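The alignment and per-frame aggregation can be sketched as follows. This is a minimal illustration under stated assumptions: only two signal sources (`speed` and camera `image`), and a last-value carry-forward as the stand-in for missing-value fitting.

```python
# Hypothetical sketch: align signals from two sources by timestamp,
# fill a missing value from the previous frame, and merge everything
# into one metadata record per timestamp.
def aggregate_frames(vehicle_signals, camera_frames):
    by_ts = {}
    for s in vehicle_signals:
        by_ts.setdefault(s["ts"], {})["speed"] = s["speed"]
    for c in camera_frames:
        by_ts.setdefault(c["ts"], {})["image"] = c["image"]

    frames, last_speed = [], None
    for ts in sorted(by_ts):
        rec = {"ts": ts, **by_ts[ts]}
        if "speed" not in rec:           # fit the missing value
            rec["speed"] = last_speed
        last_speed = rec["speed"]
        frames.append(rec)
    return frames

frames = aggregate_frames(
    [{"ts": 0, "speed": 10.0}, {"ts": 2, "speed": 12.0}],
    [{"ts": 0, "image": "f0.jpg"}, {"ts": 1, "image": "f1.jpg"}],
)
```

Each resulting record is one "frame of metadata" ready to be written to the non-relational store.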
According to the method, apparatus, electronic device and storage medium for generating data to be trained provided by the present application, the driving increment data and scene application type of a vehicle end are obtained, and the driving increment data is uploaded to the cloud; the driving increment data is classified based on the scene application type to obtain a plurality of scene-type data sets; metadata extraction is performed to obtain driving scene metadata; and the driving scene metadata is format-converted according to preconfigured data processing nodes, so that the data to be trained for the vehicle-end driving model is generated from the format-converted driving scene metadata. By uploading the data to the cloud for the corresponding data operations, the scheme effectively prevents intermediate data from wasting local storage resources; because the overall annotation and control of the data is performed on metadata extracted from the scene data, the efficiency of overall data control is improved and the generation of the data to be trained required by the vehicle-end model is facilitated. Monitoring the initial driving data volume at the vehicle end makes it convenient to determine the data increment, and extracting the increment according to the data structure yields content in a unified, standardized storage format, which facilitates further processing of the data, adjustment and inspection of intermediate data, and unified storage. Generating a timestamp from the data acquisition time, marking the data with it, and storing the aligned data in a uniform time sequence allows storage to be ordered uniformly and data of a specific time node to be extracted and processed conveniently. Determining the data dependence and index relations from the topological relation of the vehicle-end driving modules ensures that the data lineage is preserved after the data is annotated with index and dependency labels, which facilitates reflecting the topological relation of the data structure and determining the degree of association when the data is retrieved. After the metadata has undergone deduplication, fitting and desensitization, the validity of the data is guaranteed, further processing of the training data is facilitated, and the accuracy of the data to be trained is ensured. Retrieving data according to the key characteristics of the data scene narrows the range of data required by the training model and reduces the amount of data to be processed, thereby improving data processing efficiency.
The following describes an embodiment of the apparatus of the present application, which may be used to execute the method for generating data to be trained in the foregoing embodiments. For details not disclosed in the apparatus embodiments of the present application, please refer to the embodiments of the method for generating data to be trained described above.
Fig. 6 is a schematic diagram of a device for generating data to be trained according to an exemplary embodiment of the present application. The apparatus may be applied in the implementation environment shown in fig. 2 and is specifically configured in the computer device 102. The apparatus may also be suitable for other exemplary implementation environments, and may be specifically configured in other devices, and the embodiment is not limited to the implementation environment in which the apparatus is suitable.
As shown in fig. 6, the exemplary data to be trained generating apparatus includes: the device comprises a vehicle end data acquisition module 601, an incremental data classification module 602, a metadata extraction module 603 and a metadata processing module 604.
The vehicle end data acquisition module 601 is configured to acquire driving incremental data and a scene application type of a vehicle end; the incremental data classification module 602 is configured to classify the driving incremental data based on the scene application type, to obtain a plurality of scene type data sets, where the scene type data sets include a plurality of driving incremental data; the metadata extraction module 603 is configured to perform metadata extraction on the plurality of scene type data sets to obtain driving scene metadata; the metadata processing module 604 is configured to perform data format conversion on the driving scene metadata according to a preconfigured data processing node, and generate data to be trained for a vehicle-end driving model based on the driving scene metadata after format conversion.
The embodiment of the application also provides electronic equipment, which comprises: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the electronic equipment realizes the data to be trained generating method provided in each embodiment.
Fig. 7 is a schematic diagram of a computer system of an electronic device according to an exemplary embodiment of the present application. It should be noted that, the computer system 700 of the electronic device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes, for example the methods described in the above embodiments, according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section into a random access memory (RAM) 703. In the RAM 703, various programs and data required for system operation are also stored. The CPU 701, ROM 702 and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN (local area network) card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory is mounted on the drive 710 as needed, so that a computer program read out therefrom is installed into the storage section 708 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. When executed by the central processing unit (CPU) 701, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In the figures corresponding to the above embodiments, connecting lines may represent connection relationships between components, indicating constituent signal paths, and some lines may have arrows at one or more ends to indicate the main information flow. The connecting lines are indicative rather than limiting of the scheme itself; used in connection with one or more example embodiments, they may facilitate easier connection to circuits or logic elements. Any represented signal (as determined by design requirements or preferences) may actually comprise one or more signals that can be transmitted in either direction and may be implemented with any suitable type of signal scheme.
The units involved in the embodiments of the present application may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
Another aspect of the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described above. The computer-readable storage medium may be included in the electronic device described in the above embodiment or may exist alone without being incorporated in the electronic device.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, in accordance with embodiments of the present application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a usb disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
It should be appreciated that the subject application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It should be understood that the foregoing is only a preferred exemplary embodiment of the present application and is not intended to limit the embodiments of the present application, but rather that corresponding variations or modifications will be apparent to those skilled in the art from the main concepts and spirit of the present application, and that the scope of this application is defined in the appended claims.

Claims (10)

1. The method for generating the data to be trained is characterized by comprising the following steps of:
acquiring driving increment data and scene application types of a vehicle end, and uploading the driving increment data to a cloud;
classifying the driving increment data based on the scene application type to obtain a plurality of scene type data sets, wherein the scene type data sets comprise a plurality of driving increment data;
extracting metadata from the scene type data sets to obtain driving scene metadata;
and carrying out data format conversion on the driving scene metadata according to a pre-configured data processing node, and generating data to be trained for a vehicle-end driving model based on the driving scene metadata after format conversion.
2. The method for generating data to be trained according to claim 1, wherein obtaining driving increment data of a vehicle end comprises:
acquiring initial driving data quantity of a vehicle end, and monitoring the data total quantity of the initial driving data quantity of the vehicle end;
if the total data amount is increased, extracting initial incremental data, and determining the structure type of the initial incremental data, wherein the structure type comprises unstructured data and structured parameters;
if the incremental data are unstructured data, driving parameter extraction is carried out on the unstructured data according to a preset data storage format, and vehicle-end incremental parameters are obtained;
and determining the vehicle-end increment parameters and the structured data as the driving increment data, and storing the driving increment data into a preset increment data storage area.
3. The data to be trained generation method according to claim 2, characterized in that after determining the vehicle-end increment parameter and structured data as the driving increment data, the data to be trained generation method further comprises:
acquiring data acquisition time of the driving incremental data, and performing time stamp marking on the driving incremental data according to the data acquisition time to obtain driving incremental data with time stamp marking;
generating a storage time sequence according to a preset time interval, and carrying out data alignment on driving increment data with a time stamp according to the time stamp mark in the time sequence so as to store the driving increment data according to the time sequence.
4. The method for generating data to be trained according to claim 1, characterized in that performing data format conversion on the driving scenario metadata according to a pre-configured data processing node comprises:
determining the data dependence and the data index relation of the driving scene metadata based on the topological association relation of a vehicle-end preset driving module;
generating a data dependency label and a data index label according to the data dependency and data index relation, and marking the driving scene metadata according to the data dependency label and the data index label to obtain marked driving scene metadata;
and carrying out data format conversion on the marked driving scene metadata according to a preset training data format of the driving model to obtain driving scene metadata after format conversion, and generating data to be trained for the driving model of the vehicle end based on the driving scene metadata after format conversion.
5. The method for generating data to be trained according to claim 4, characterized in that before determining the data dependence and the data index relationship of the driving scenario metadata based on the topological association relationship of the vehicle-end preset driving module, the method for generating data to be trained further comprises:
and performing data cleaning processing on the driving scene metadata, wherein the data cleaning processing comprises data deduplication, data fitting and data desensitization so as to perform invalid data elimination on the driving scene metadata.
6. The method for generating data to be trained according to claim 4, characterized in that generating data to be trained required for a vehicle-end driving model based on the format-converted driving scene metadata comprises:
and dividing the driving scene metadata after format conversion into a training data set and a verification data set according to a preset dividing proportion, and storing the training data set and the verification data set into a preset training data storage area.
7. The method for generating data to be trained according to any one of claims 1 to 6, characterized in that after metadata extraction is performed on the plurality of scene type data sets, the method for generating data to be trained further comprises:
acquiring key characteristics of a data scene of a vehicle-end driving model;
and carrying out data retrieval in the extracted initial metadata based on the data scene key features so as to determine driving scene metadata.
8. A data to be trained generating apparatus, characterized in that the data to be trained generating apparatus comprises:
The vehicle end data acquisition module is used for acquiring driving increment data and scene application types of the vehicle end;
the incremental data classification module is used for classifying the driving incremental data based on the scene application type to obtain a plurality of scene type data sets, wherein the scene type data sets comprise a plurality of driving incremental data;
the metadata extraction module is used for extracting metadata from the scene type data sets to obtain driving scene metadata;
and the metadata processing module is used for carrying out data format conversion on the driving scene metadata according to a preconfigured data processing node and generating data to be trained for a vehicle-end driving model based on the driving scene metadata after format conversion.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to implement the data to be trained generation method of any of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of generating data to be trained as claimed in any one of claims 1 to 7.
CN202310336950.0A 2023-03-31 2023-03-31 Method and device for generating data to be trained, electronic equipment and storage medium Pending CN116304834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310336950.0A CN116304834A (en) 2023-03-31 2023-03-31 Method and device for generating data to be trained, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310336950.0A CN116304834A (en) 2023-03-31 2023-03-31 Method and device for generating data to be trained, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116304834A true CN116304834A (en) 2023-06-23

Family

ID=86835989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310336950.0A Pending CN116304834A (en) 2023-03-31 2023-03-31 Method and device for generating data to be trained, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116304834A (en)

Similar Documents

Publication Publication Date Title
US11068323B2 (en) Automatic registration of empty pointers
CN102713965B (en) The scalable theme of data source is assembled
CN108052618B (en) Data management method and device
KR20210040891A (en) Method and Apparatus of Recommending Information, Electronic Device, Computer-Readable Recording Medium, and Computer Program
US10108403B2 (en) System for generating a timeline of registry events
Weigel et al. Making data and workflows findable for machines
CN102457817B (en) Method and system for extracting news contents from mobile phone newspaper
CN111352903A (en) Log management platform, log management method, medium, and electronic device
Khan et al. A systematic approach towards web preservation
US8719690B2 (en) Method and system for automatic data aggregation
CN103475532A (en) Hardware detection method and system thereof
CN111382281B (en) Recommendation method, device, equipment and storage medium for content based on media object
WO2021055868A1 (en) Associating user-provided content items to interest nodes
CN111652658A (en) Portrait fusion method, apparatus, electronic device and computer readable storage medium
CN109635193B (en) Book reading sharing platform
CN116304834A (en) Method and device for generating data to be trained, electronic equipment and storage medium
US20220284501A1 (en) Probabilistic determination of compatible content
Aichroth et al. Mico-media in context
Rakushev et al. The Technique of Operational Processing of Heterogeneous Surveillance Data in Assessing Situation in Geographic Information Systems
CN114936269A (en) Document searching platform, searching method, device, electronic equipment and storage medium
CN110955709B (en) Data processing method and device and electronic equipment
Rogushina et al. Semantic Processing of Metadata for Big Data: Standards, Ontologies and Typical Information Objects.
CN110740046B (en) Method and device for analyzing service contract
CN113298106A (en) Sample generation method and device, server and storage medium
CN112417259A (en) Media resource processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination