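WO2023207228A1 - Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium - Google Patents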

Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium

Info

Publication number
WO2023207228A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
feature extraction
low
model
level feature
Prior art date
Application number
PCT/CN2023/072690
Other languages
English (en)
French (fr)
Inventor
郝金隆
唐照翔
Original Assignee
重庆长安汽车股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-04-28
Filing date
2023-01-17
Publication date
2023-11-02
Application filed by 重庆长安汽车股份有限公司
Priority to EP23794664.5A (published as EP4332815A1)
Publication of WO2023207228A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00 Information sensed or collected by the things
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00 Information sensed or collected by the things
    • G16Y20/20 Information sensed or collected by the things relating to the thing itself
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y20/00 Information sensed or collected by the things
    • G16Y20/40 Information sensed or collected by the things relating to personal data, e.g. biometric data, records or preferences
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/10 Detection; Monitoring
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/20 Analytics; Diagnosis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/50 Safety; Security of things, users, data or systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Definitions

  • The present invention relates to improvements in data processing technology for intelligent connected vehicles, and specifically to an intelligent connected vehicle data training method based on privacy data protection; it belongs to the technical field of data processing and training.
  • (1) CN202210057268 (data collection method, apparatus, device, and storage medium) of Zhejiang Geely Holding Group Co., Ltd. proposes the following technique: while the vehicle is driving, vehicle-side data collected by vehicle-side sensors and roadside data collected by roadside sensors are obtained; the vehicle-side and roadside data are synchronized in space and time, and the synchronized data are fused according to a high-precision map to obtain target data.
  • Scene classification is performed on the target data to obtain scene data for multiple scenes, and an autonomous driving scene library is constructed from the scene data.
  • (2) CN201910454082 (road driving data collection, analysis, and upload method for an L3 autonomous driving system) of Zhejiang Leapmotor Technology Co., Ltd. proposes the following technique: vehicle-side driving data is collected, including collection and synchronization of the driving data and its encoding and caching; online analysis is performed on the collected data, including definition of output interfaces for intermediate results of the autonomous driving system, target-matching consistency detection, semantic output of localization road signs, extreme vehicle operation detection, and human-machine decision consistency detection; data communication then prepares the vehicle-side driving data for upload; finally, the server receives and stores the data.
  • The existing technology mainly anonymizes data on the vehicle side and uploads it to the cloud, where the anonymized data is used for model training.
  • A serious shortcoming of this type of technology is that anonymizing the raw data loses some important information, so the trained model is biased when predicting on non-anonymized data and produces large errors, reducing algorithm accuracy. Once mass-produced vehicles are on the road, algorithm performance drops when autonomous-driving functions run on raw data.
  • The purpose of the present invention is to provide an intelligent connected vehicle data training method based on privacy data protection.
  • The present invention ensures privacy-preserving data transmission while solving the problem of degraded algorithm performance caused by anonymized data, and on this basis constructs an algorithm closed loop to solve the problem of iterative model updates.
  • An intelligent connected vehicle data training method based on privacy data protection includes the following steps:
  • Step 2) Vehicle-side raw data feature extraction: on the vehicle side, features are extracted from raw data collected in real time or historically, using the low-level feature extraction layers deployed in step 1), to obtain a low-level feature data set of the raw data, which is uploaded to the cloud;
  • Step 5) Model optimization: in the cloud, the model update data set obtained in step 4) is used to train and update the feature extraction layers of the first-version model other than the low-level feature extraction layers; the low-level feature extraction layers and the updated layers together form the optimized model, which is pushed to the vehicle side for synchronous updating.
  • In step 4), in the cloud, features are extracted from the road test data through the same low-level feature extraction layers of the first-version model as those deployed to the vehicle side, yielding a low-level feature data set of the road test data.
  • The union of the low-level feature data set of the road test data and the low-level feature data of the raw data uploaded to the cloud is used as the low-level feature data set of step 4).
  • The model used for model training in step 1) is a deep neural network.
  • The deep neural network includes, but is not limited to, convolutional neural networks, recurrent neural networks, and their variants.
  • Supported algorithms include, but are not limited to, object detection, lane line recognition, and semantic segmentation.
  • The key information in the raw data includes, but is not limited to, faces and license plates; anonymization includes mosaicking, solid-color filling, and blurring.
  • The methods used for feature extraction in step 2) of the present invention include, but are not limited to, convolution, pooling, and slicing.
  • The low-level feature extraction layers deployed to the vehicle in step 1) come in multiple depths, and in each deployment, low-level feature extraction layers of different depths are deployed on the vehicle simultaneously; in step 5), the cloud trains and updates the other feature extraction layers corresponding to each vehicle-side low-level feature extraction layer, obtaining multiple optimized models, and the best-performing model is synchronized to the vehicle side.
  • The model update data sets used for iterative algorithm updates in step 4) include, but are not limited to, road test data sets and raw data sets collected on the vehicle side. Data sets obtained through data augmentation are also included, including but not limited to data generated by flipping, rotating, and scaling the low-level feature sets.
  • The present invention also provides an electronic device for intelligent connected vehicle data training based on privacy data protection, including a memory configured to store executable instructions;
  • and a processor configured to execute the executable instructions stored in the memory to implement the aforementioned intelligent connected vehicle data training method based on privacy data protection.
  • The present invention also provides a computer-readable storage medium storing computer program instructions.
  • The computer program instructions execute the aforementioned intelligent connected vehicle data training method based on privacy data protection.
  • Compared with the prior art, the present invention has the following beneficial effects:
  • By processing key user information, the present invention protects user privacy and security: key information is neither leaked nor uploaded, which meets regulatory requirements.
  • The present invention retains the information of the raw data to a greater extent than model training that relies purely on anonymized data.
  • By eliminating the shortcoming of losing a large amount of useful information, the present invention effectively improves the training effect of the algorithm.
  • The present invention realizes a closed loop of data collection, annotation, and training for intelligent connected vehicles, and can continuously improve autonomous driving algorithm performance after mass production without incurring large costs for vehicle-side road data collection and algorithm updates.
  • Figure 1 is a logical architecture diagram of the intelligent connected vehicle data training method based on privacy data protection of the present invention.
  • The present invention proposes a data collection, training, and iteration method for connected vehicles based on privacy data protection. As shown in Figure 1, it is divided into three parts: (1) cloud algorithm development before mass production; (2) mass-production data collection and preprocessing; (3) iterative algorithm update.
  • These parts protect the security of user privacy data and make full use of the information in both raw and anonymized data to improve algorithm performance.
  • They also form an algorithm closed loop and improve the ability to update the model iteratively.
  • The three parts are described in detail below.
  • The historically collected road test data is first annotated with information such as pedestrians, vehicles, road signs, and traffic lights, and then the model is trained.
  • Deep neural networks with feature extraction operations such as convolution, pooling, and slicing can effectively extract image features and are often used in image classification and object recognition. Different models and depths are selected for different target tasks.
  • After repeated parameter tuning, the first version of the mass-production large model is obtained.
  • The low-level feature extraction layers of the model are deployed to the vehicle side; the specific number of deployed layers depends on the processing capability of the vehicle-side MCU and the data upload bandwidth.
  • The model training method of the present invention supports both incremental transfer learning and full algorithm training.
  • The mass-production data collection and preprocessing stage includes the collection and preprocessing of three parts of data.
  • Features are extracted through the low-level convolutional layers deployed before mass production, and the low-level features of the raw data are obtained and uploaded to the cloud.
  • Low-level features are local features of the raw data that retain the relationship between parts and the whole, typically straight-line and curve features. Because feature extraction loses some information, target objects are difficult to identify intuitively from these feature data, so the data meets regulatory requirements and can be uploaded to the cloud.
  • The same feature extraction operation is performed on massive road test data to obtain a low-level feature data set of the road test data.
  • The union of the low-level feature data set of the road test data and the low-level features of the raw data uploaded to the cloud is used for the next model iteration.
  • The anonymized data is obtained and uploaded to the cloud for data annotation.
  • Anonymized pictures and videos do not affect judgment of object category and location, so they can be annotated accurately without infringing user privacy.
  • Data can be uploaded while the vehicle is in standby: this does not affect vehicle-side algorithm performance and keeps transmission stable.
  • The cloud uses the low-level feature data set obtained in the previous stage and the corresponding annotation result data set for the next stage of model training, and can also merge them with historical training data.
  • Because the training data is obtained from the low-level network, there is no need to train the entire network model at this stage; only the high-level network part of the mass-production large model needs to be trained and updated.
  • The low-level feature extraction layers and the updated high-level feature extraction layers together form the optimized model. A push strategy, such as periodic pushes or version updates, is used to push the optimized model to the vehicle side for synchronous updating, closing the loop from autonomous driving data collection and training to deployment and continuously improving the algorithm performance and driving experience of mass-produced vehicles in actual use.
  • The present invention adds a feature extraction operation to obtain low-level feature data of the raw data and uses these two parts of data to solve the problem of degraded algorithm performance caused by anonymized data; on this basis, an algorithm closed loop is constructed to solve the problem of iterative model updates.
  • The data after feature extraction is no longer raw data, so uploading it to the cloud does not leak private information; the raw data is anonymized before upload, so private information is not leaked; and anonymization does not affect annotation accuracy. The present invention therefore proposes a closed-loop autonomous driving algorithm method for data collection, annotation, and training of connected vehicles, as well as an in-vehicle data processing method that is suitable for privacy protection and retains raw-data information to a high degree.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Bioethics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Traffic Control Systems (AREA)

Abstract

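The present invention discloses an intelligent connected vehicle data training method based on privacy data protection. Historically collected road test data is first annotated and a model is then trained to obtain a first-version model, whose low-level feature extraction layers are deployed to the vehicle side. On the vehicle side, the raw data collected by the vehicle is, on the one hand, passed through the deployed low-level feature extraction layers to obtain a low-level feature data set that is uploaded to the cloud; on the other hand, the raw data is anonymized and uploaded to the cloud for data annotation. In the cloud, the uploaded data sets and historical data sets are used to train and update the feature extraction layers other than the low-level feature extraction layers, and the updated model is pushed to the vehicle side for synchronous updating. While ensuring privacy-preserving data transmission, the invention solves the problem of degraded algorithm performance caused by anonymized data, and on this basis constructs an algorithm closed loop that solves the problem of iterative model updates.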

Description

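Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium
Technical Field
The present invention relates to improvements in data processing technology for intelligent connected vehicles, and specifically to an intelligent connected vehicle data training method based on privacy data protection, belonging to the technical field of data processing and training.
Background
With the rapid development of autonomous driving technology in China, major automakers and technology suppliers are striving to improve their competitiveness and have proposed various methods of data collection and algorithm training. For example: (1) CN202210057268 of Zhejiang Geely Holding Group Co., Ltd., a data collection method, apparatus, device, and storage medium, proposes the following technique: while the vehicle is driving, vehicle-side data collected by vehicle-side sensors and roadside data collected by roadside sensors are obtained; the vehicle-side and roadside data are synchronized in space and time; the synchronized data are fused according to a high-precision map to obtain target data; scene classification is performed on the target data to obtain scene data for multiple scenes; and an autonomous driving scene library is constructed from the scene data. (2) CN201910454082 of Zhejiang Leapmotor Technology Co., Ltd., a road driving data collection, analysis, and upload method for an L3 autonomous driving system, proposes the following technique: vehicle-side driving data is collected, including collection and synchronization of the driving data and its encoding and caching; online analysis is performed on the collected vehicle-side driving data, including definition of output interfaces for intermediate results of the autonomous driving system, target-matching consistency detection, semantic output of localization road signs, extreme vehicle operation detection, and human-machine decision consistency detection; data communication then prepares the vehicle-side driving data for upload; finally, the server receives and stores the vehicle-side driving data.
As a result, more and more data is collected and used for algorithm training to improve autonomous driving performance, but key user information risks being leaked. To prevent key user information from being misused, the relevant authorities stipulate that when a vehicle collects video and image data from outside the vehicle, it must not provide unprocessed data outside the vehicle; if such data must be provided outside the vehicle, faces, license plates, and similar information in the data must first be anonymized inside the vehicle. How intelligent connected vehicles can collect and use data reasonably and efficiently while protecting user privacy has therefore become a key challenge.
Existing techniques mainly anonymize data on the vehicle side, upload it to the cloud, and train models on the anonymized data in the cloud. A serious shortcoming of this approach is that anonymization loses some important information from the raw data, so the trained model is biased when predicting on non-anonymized data and produces large errors, which affects algorithm accuracy. Once mass-produced vehicles are on the road, algorithm performance drops when autonomous-driving functions run on raw data.
In addition, for mass-produced connected vehicles, the collection, training, and deployment of autonomous driving data do not form an algorithm closed loop, which slows algorithm training and model iteration.
Summary of the Invention
In view of the above shortcomings of the prior art, the object of the present invention is to provide an intelligent connected vehicle data training method based on privacy data protection that ensures privacy-preserving data transmission while solving the problem of degraded algorithm performance caused by anonymized data, and on this basis constructs an algorithm closed loop to solve the problem of iterative model updates.
The technical solution of the present invention is implemented as follows:
An intelligent connected vehicle data training method based on privacy data protection, comprising the following steps:
1) First-version model acquisition: in the cloud, historically collected road test data is first annotated and a model is then trained to obtain a first-version model; the low-level feature extraction layers of the first-version model (for example, the first two convolutional layers) are deployed to the vehicle side;
2) Vehicle-side raw data feature extraction: on the vehicle side, features are extracted from raw data collected by the vehicle in real time or historically, using the low-level feature extraction layers deployed in step 1), to obtain a low-level feature data set of the raw data, which is uploaded to the cloud;
3) Vehicle-side data desensitization: key information in the raw data is anonymized on the vehicle side, and the resulting anonymized data is uploaded to the cloud and annotated to obtain an annotation result data set;
4) Cloud-side model update data preparation: the data in the annotation result data set of step 3) is placed in one-to-one correspondence with the data in the low-level feature data set of step 2) to form a model update data set;
5) Model optimization: in the cloud, the model update data set obtained in step 4) is used to train and update the feature extraction layers of the first-version model other than the low-level feature extraction layers; the low-level feature extraction layers and the updated other feature extraction layers together form the optimized model, which is pushed to the vehicle side for synchronous updating.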
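Step 4) requires each annotation from step 3) to correspond one-to-one with a low-level feature record from step 2). The patent does not say how the two uploads are matched, so the sketch below assumes, hypothetically, that the vehicle tags both the feature upload and the anonymized upload of the same raw frame with a shared `sample_id`. `FeatureRecord`, `LabelRecord`, and `build_update_dataset` are illustrative names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class FeatureRecord:
    sample_id: str               # hypothetical key shared by both uploads of one raw frame
    features: Tuple[float, ...]  # flattened low-level feature map from step 2)


@dataclass(frozen=True)
class LabelRecord:
    sample_id: str
    labels: Tuple[str, ...]      # annotation of the anonymized copy from step 3)


def build_update_dataset(
    features: Sequence[FeatureRecord], labels: Sequence[LabelRecord]
) -> List[Tuple[Tuple[float, ...], Tuple[str, ...]]]:
    """Pair every feature record with the annotation of the same raw frame (step 4)."""
    label_by_id: Dict[str, Tuple[str, ...]] = {}
    for rec in labels:
        if rec.sample_id in label_by_id:
            raise ValueError(f"duplicate annotation for {rec.sample_id}")
        label_by_id[rec.sample_id] = rec.labels

    dataset = []
    for rec in features:
        # Frames whose anonymized copy has not been annotated yet are skipped for now.
        if rec.sample_id in label_by_id:
            dataset.append((rec.features, label_by_id[rec.sample_id]))
    return dataset
```

For example, with feature records `f1` and `f2` but an annotation only for `f2` (labelled `("car",)`), the function returns a single pair built from `f2`. Rejecting duplicate annotations keeps the correspondence strictly one-to-one.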
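Further, in step 4), in the cloud, features are extracted from the road test data through the low-level feature extraction layers of the first-version model that are identical to those deployed to the vehicle side, yielding a low-level feature data set of the road test data; the union of this data set and the low-level feature data of the raw data uploaded to the cloud is used as the low-level feature data set of step 4).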
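The union described in this step can be sketched as follows, assuming each feature set is a dictionary keyed by a per-source sample ID and tagged with a hypothetical identifier (`layers_id`) of the low-level layers that produced it. Checking that identifier enforces the stated condition that road test features come from the same low-level layers as those deployed to the vehicle. Prefixing keys with their source is an added assumption that keeps the union from silently overwriting records that happen to share an ID.

```python
from typing import Dict, Hashable, Tuple

Features = Tuple[float, ...]


def union_feature_sets(
    road_test: Dict[Hashable, Features],
    vehicle: Dict[Hashable, Features],
    road_test_layers_id: str,
    vehicle_layers_id: str,
) -> Dict[Tuple[str, Hashable], Features]:
    """Union of road-test and vehicle-uploaded low-level feature sets."""
    # Features from different low-level layers are not comparable, so refuse to mix them.
    if road_test_layers_id != vehicle_layers_id:
        raise ValueError(
            f"layer mismatch: {road_test_layers_id!r} vs {vehicle_layers_id!r}"
        )
    # Namespacing each key by its source makes the union disjoint.
    merged: Dict[Tuple[str, Hashable], Features] = {
        ("road_test", k): v for k, v in road_test.items()
    }
    merged.update({("vehicle", k): v for k, v in vehicle.items()})
    return merged
```

A road test set with two samples and a vehicle set with one sample produce a merged set of three entries, even if one vehicle ID also appears in the road test set.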
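Preferably, the model used for model training in step 1) is a deep neural network.
Further, the deep neural network includes, but is not limited to, convolutional neural networks, recurrent neural networks, and their variants, and the supported algorithms include, but are not limited to, object detection, lane line recognition, and semantic segmentation.
In step 3), the key information in the raw data includes, but is not limited to, faces and license plates, and anonymization includes mosaicking, solid-color filling, and blurring.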
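Mosaicking, solid-color filling, and blurring can each be sketched on a grayscale image stored as a list of rows. The sketch assumes the face or license plate region has already been located as a bounding box by some on-vehicle detector; the patent does not specify how regions are found, and the function names and box convention are illustrative.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale, row-major, values 0-255
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def solid_fill(img: Image, box: Box, value: int = 0) -> Image:
    """Overwrite the region with a single value."""
    out = [row[:] for row in img]
    t, l, b, r = box
    for y in range(t, b):
        for x in range(l, r):
            out[y][x] = value
    return out


def mosaic(img: Image, box: Box, block: int = 2) -> Image:
    """Replace each block x block tile inside the region with its integer mean."""
    out = [row[:] for row in img]
    t, l, b, r = box
    for by in range(t, b, block):
        for bx in range(l, r, block):
            ys = range(by, min(by + block, b))
            xs = range(bx, min(bx + block, r))
            cells = [img[y][x] for y in ys for x in xs]
            mean = sum(cells) // len(cells)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def box_blur(img: Image, box: Box, radius: int = 1) -> Image:
    """Replace each pixel in the region with the mean of its (clipped) neighbourhood."""
    out = [row[:] for row in img]
    h, w = len(img), len(img[0])
    t, l, b, r = box
    for y in range(t, b):
        for x in range(l, r):
            vals = [
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(vals) // len(vals)
    return out
```

All three functions return a new image and leave pixels outside the box untouched, so only the anonymized copy leaves the vehicle while the original stays available for on-vehicle feature extraction.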
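The methods used for feature extraction in step 2) of the present invention include, but are not limited to, convolution, pooling, and slicing.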
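The patent names convolution, pooling, and slicing without defining their configuration. The pure-Python sketch below composes them into a tiny low-level extractor. "Slicing" is interpreted here, as an assumption, as space-to-depth slicing in the style of YOLOv5's Focus layer (taking every second row and column to form four sub-maps). The convolution is the cross-correlation used by deep learning frameworks, with no padding and stride 1, followed by ReLU and 2x2 max pooling. In practice the kernels would come from the trained first-version model; the one used for checking is hand-written.

```python
from typing import List

Map = List[List[float]]


def space_to_depth(img: Map) -> List[Map]:
    """Slice one map into four half-resolution maps (every second row and column)."""
    return [[row[dx::2] for row in img[dy::2]] for dy in (0, 1) for dx in (0, 1)]


def conv2d_valid(x: Map, k: Map) -> Map:
    """Cross-correlation with no padding and stride 1, as in most DL frameworks."""
    kh, kw = len(k), len(k[0])
    h, w = len(x), len(x[0])
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(w - kw + 1)]
        for i in range(h - kh + 1)
    ]


def relu(x: Map) -> Map:
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2(x: Map) -> Map:
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


def low_level_features(img: Map, kernels: List[Map]) -> List[Map]:
    """Slicing -> convolution -> ReLU -> pooling, one output map per (slice, kernel)."""
    return [max_pool2(relu(conv2d_valid(s, k))) for s in space_to_depth(img) for k in kernels]
```

With an 8x8 image whose right half is bright and the edge kernel `[[-1, 1], [-1, 1]]`, each of the four slices yields a single pooled activation of 2 at the vertical boundary: a "straight-line" feature of the kind the description mentions, at a quarter of the input's spatial resolution.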
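The low-level feature extraction layers deployed to the vehicle side in step 1) come in multiple depths; in each deployment, low-level feature extraction layers of different depths are deployed on the vehicle side simultaneously. In step 5), multiple sets of other feature extraction layers, each corresponding to one of the vehicle-side low-level feature extraction layers, are trained and updated in the cloud, yielding multiple optimized models, and the best-performing model is synchronized to the vehicle side.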
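The selection among split depths can be sketched as follows, assuming a caller-supplied `train_and_evaluate(depth)` (hypothetical) that trains the cloud-side layers above a given split depth and returns a validation score where higher is better. The patent only says the best-performing model is synchronized; breaking ties toward the shallower split is an added assumption, chosen because it lightens the vehicle-side MCU load.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_best_split(
    depths: Iterable[int], train_and_evaluate: Callable[[int], float]
) -> Tuple[int, Dict[int, float]]:
    """Train one cloud-side model per deployed split depth and pick the best one."""
    scores = {d: train_and_evaluate(d) for d in depths}
    if not scores:
        raise ValueError("no candidate split depths")
    # max() returns the first maximal item, so iterating in ascending order
    # breaks ties in favour of the shallower split.
    best = max(sorted(scores), key=lambda d: scores[d])
    return best, scores
```

With placeholder scores `{1: 0.71, 2: 0.78, 3: 0.78}` (illustrative numbers, not results), `select_best_split([3, 1, 2], scores.__getitem__)` picks depth 2.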
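The model update data set used for iterative algorithm updates in step 4) includes, but is not limited to, road test data sets and raw data sets collected on the vehicle side; data sets obtained through data augmentation are also included, including but not limited to data generated by flipping, rotating, and scaling the low-level feature sets.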
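Flipping, rotating, and scaling a low-level feature set can be sketched per channel as below. This is an illustration under stated assumptions: feature maps are plain lists of rows, scaling is limited to integer nearest-neighbour upscaling, and the function names are hypothetical. The text leaves one point implicit: a geometric transform of the features must be matched by the same transform of the paired annotation (for example, a flipped bounding box). Otherwise the pair formed in step 4) no longer corresponds.

```python
from typing import Callable, List

Map = List[List[float]]


def hflip(m: Map) -> Map:
    """Mirror a feature map left-to-right."""
    return [row[::-1] for row in m]


def rot90_cw(m: Map) -> Map:
    """Rotate a feature map 90 degrees clockwise."""
    return [list(col) for col in zip(*m[::-1])]


def scale_nearest(m: Map, factor: int) -> Map:
    """Upscale by an integer factor with nearest-neighbour repetition."""
    return [
        [v for v in row for _col in range(factor)]
        for row in m
        for _row in range(factor)
    ]


def augment_channels(channels: List[Map], op: Callable[[Map], Map]) -> List[Map]:
    """Apply the same geometric op to every channel of one multi-channel feature set."""
    return [op(c) for c in channels]
```

Applying one op to every channel keeps the channels spatially aligned with each other, which a multi-channel feature set requires.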
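The present invention also provides an electronic device for intelligent connected vehicle data training based on privacy data protection, comprising a memory configured to store executable instructions;
and a processor configured to execute the executable instructions stored in the memory to implement the aforementioned intelligent connected vehicle data training method based on privacy data protection.
The present invention further provides a computer-readable storage medium storing computer program instructions, which execute the aforementioned intelligent connected vehicle data training method based on privacy data protection.
Compared with the prior art, the present invention has the following beneficial effects:
1. By adding feature extraction and anonymization on the vehicle side, the present invention processes key user information and protects user privacy and security: key information is neither leaked nor uploaded, which meets regulatory requirements.
2. The present invention retains the information of the raw data to a large extent. Whereas training models purely on anonymized data loses a large amount of useful information, the present invention effectively improves the training effect of the algorithm.
3. The present invention realizes a closed loop of data collection, annotation, and training for intelligent connected vehicles, continuously improving autonomous driving algorithm performance after mass production without incurring large costs for vehicle-side road data collection and algorithm updates.
Description of the Drawings
Figure 1 is a logical architecture diagram of the intelligent connected vehicle data training method based on privacy data protection of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The present invention proposes a data collection, training, and iteration method for connected vehicles based on privacy data protection. As shown in Figure 1, it is divided into three parts: (1) pre-mass-production cloud algorithm development; (2) mass-production data collection and preprocessing; (3) iterative algorithm update. Through these three parts, on the one hand, user privacy data is protected and the information in both raw and anonymized data is fully used to improve algorithm performance; on the other hand, an algorithm closed loop is formed, improving the ability to update the model iteratively. The three parts are described in detail below.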
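In the pre-mass-production cloud algorithm development stage (i.e., first-version model acquisition), historically collected road test data is first annotated, for example with pedestrians, vehicles, road signs, and traffic lights, and a model is then trained. A deep neural network with feature extraction operations such as convolution, pooling, and slicing can be used here; such networks extract image features effectively and are commonly used in image classification and object recognition. Different models and depths are chosen for different target tasks, and after repeated parameter tuning the first-version mass-production large model is obtained. The model's low-level feature extraction layers are deployed to the vehicle side, with the number of deployed layers determined by the processing capability of the vehicle-side MCU and the data upload bandwidth. Fewer layers retain more information from the input data; more layers extract higher-level features of the input and reduce the amount of data transmitted, but lose more information and increase the processing burden on the vehicle-side MCU. The model training method of the present invention supports both incremental transfer learning and full algorithm training.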
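The depth trade-off described above can be made concrete with a little arithmetic. The sketch assumes, hypothetically, that the MCU budget is expressed as multiply-accumulates (MACs) per frame and the upload budget as bytes per frame, and it picks the shallowest split that fits both, since the text says fewer layers retain more information. The layer shapes used to check it are invented for illustration (3x3 stride-2 convolutions on a 640x480 RGB frame, int8 activations) and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LayerSpec:
    name: str
    out_channels: int
    out_height: int
    out_width: int
    macs: int  # multiply-accumulates this layer needs for one frame


def conv_layer(name: str, in_c: int, out_c: int, out_h: int, out_w: int, k: int = 3) -> LayerSpec:
    """MACs of a k x k convolution: one k*k*in_c dot product per output value."""
    return LayerSpec(name, out_c, out_h, out_w, macs=out_h * out_w * out_c * in_c * k * k)


def output_bytes(layer: LayerSpec, bytes_per_value: int = 1) -> int:
    return layer.out_channels * layer.out_height * layer.out_width * bytes_per_value


def choose_split_depth(
    layers: List[LayerSpec], mcu_macs_per_frame: int,
    upload_bytes_per_frame: int, bytes_per_value: int = 1,
) -> Optional[int]:
    """Smallest number of low-level layers that fits both budgets, or None."""
    cumulative_macs = 0
    for depth, layer in enumerate(layers, start=1):
        cumulative_macs += layer.macs
        if cumulative_macs > mcu_macs_per_frame:
            return None  # compute only grows with depth, so no deeper split can fit
        if output_bytes(layer, bytes_per_value) <= upload_bytes_per_frame:
            return depth
    return None
```

For the invented layers `conv_layer("conv1", 3, 16, 240, 320)`, `conv_layer("conv2", 16, 32, 120, 160)`, and `conv_layer("conv3", 32, 64, 60, 80)`, the outputs are 1,228,800, 614,400, and 307,200 bytes. A budget of 200,000,000 MACs and 700,000 bytes per frame therefore selects depth 2. In this configuration each layer halves the activation volume, so going deeper does shrink the upload, as the text states. Where channel growth outpaces downsampling it would not, which is why the function checks every depth instead of assuming monotonic size.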
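The mass-production data collection and preprocessing stage, i.e., steps 2) to 4), includes the collection and preprocessing of three parts of data. Inside the mass-produced vehicle, for raw images and video collected in real time or historically, on the one hand, features are extracted by the low-level convolutional layers deployed before mass production, and the resulting low-level features of the raw data are uploaded to the cloud. Low-level features are local features of the raw data that retain the relationship between parts and the whole, typically straight-line and curve features. Because feature extraction loses some information, target objects are difficult to recognize intuitively from these feature data, so they meet regulatory requirements and can be uploaded to the cloud. Similarly, in the cloud, the same feature extraction is applied to massive road test data to obtain a low-level feature data set of the road test data, and the union of this set and the low-level feature data of the raw data uploaded to the cloud is used for the next model iteration. On the other hand, inside the mass-produced vehicle, key information in the raw data is anonymized, for example by mosaicking faces and license plates, and the resulting anonymized data is uploaded to the cloud for annotation. Anonymized images and videos do not affect judgment of object category and position, so they can be annotated accurately without infringing user privacy. Data can be uploaded while the vehicle is in standby, which does not affect vehicle-side algorithm performance and keeps transmission stable.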
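Deferring uploads to standby can be sketched as a queue that fills while driving and drains only in standby. `VehicleState`, `StandbyUploader`, and the `send` callback (assumed to return `True` on a confirmed upload) are hypothetical stand-ins for whatever vehicle-state signal and telematics transport a real vehicle exposes. Keeping a failed packet at the head of the queue is an added assumption aimed at the "stable transmission" goal.

```python
from collections import deque
from enum import Enum
from typing import Callable, Deque


class VehicleState(Enum):
    DRIVING = "driving"
    STANDBY = "standby"


class StandbyUploader:
    """Buffers feature and anonymized-data packets; sends them only in standby."""

    def __init__(self, send: Callable[[bytes], bool]) -> None:
        self._send = send
        self._queue: Deque[bytes] = deque()

    def enqueue(self, packet: bytes) -> None:
        self._queue.append(packet)

    def pending(self) -> int:
        return len(self._queue)

    def tick(self, state: VehicleState) -> int:
        """Called periodically; returns how many packets were sent on this tick."""
        if state is not VehicleState.STANDBY:
            return 0  # never compete with the driving-time algorithms
        sent = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break  # keep the packet for the next standby window
            self._queue.popleft()
            sent += 1
        return sent
```

While driving, `tick` sends nothing and packets accumulate; on the first standby tick the queue drains in arrival order.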
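In the iterative algorithm update stage (i.e., model optimization), the cloud uses the low-level feature data set obtained in the previous stage and the corresponding annotation result data set for the next round of model training, and can merge them with historical training data, including but not limited to road test data sets, raw data sets historically collected on the vehicle side, and other data sets generated by data augmentation methods. Because the training data is obtained through the low-level network, the whole network does not need to be trained at this stage; only the high-level part of the mass-production large model needs to be trained and updated. The low-level feature extraction layers and the updated high-level feature extraction layers together form the optimized model, which is pushed to the vehicle side for synchronous updating using a push strategy such as periodic pushes or version updates. This closes the loop from autonomous driving data collection and training to deployment and continuously improves the algorithm performance and driving experience of mass-produced vehicles in actual use.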
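Training only the high-level part can be illustrated with the smallest possible head. Because the uploaded data are already outputs of the frozen low-level layers, the cloud can fit the upper layers on them directly and never touches the low-level weights. The sketch substitutes a logistic-regression head trained with plain stochastic gradient descent for the patent's "high-level network part". In a framework such as PyTorch, the same effect is usually obtained by setting `requires_grad` to `False` on the low-level parameters. Names and hyperparameters are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(
    data: Sequence[Tuple[Sequence[float], int]], epochs: int = 200, lr: float = 0.5
) -> Tuple[List[float], float]:
    """Fit a logistic-regression head on frozen low-level features with SGD."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # derivative of the log-loss with respect to the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)
```

Only `w` and `b` change during training, which mirrors the patent's point that the deployed low-level layers stay fixed while the cloud updates everything above them.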
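While anonymizing raw vehicle-side data, the present invention adds a feature extraction operation to obtain low-level feature data of the raw data and uses these two parts of data to solve the problem of degraded algorithm performance caused by anonymized data; on this basis, it constructs an algorithm closed loop that solves the problem of iterative model updates. The data after feature extraction is no longer raw data, so uploading it to the cloud does not leak private information; the raw data is anonymized before upload, so it does not leak private information either; and annotating anonymized data does not affect annotation accuracy. The present invention therefore proposes a closed-loop autonomous driving algorithm method covering data collection, annotation, and training for connected vehicles. It also proposes an in-vehicle data processing method that is suitable for privacy protection and retains raw-data information to a high degree.
Finally, it should be noted that the above examples of the present invention are merely illustrative and do not limit the embodiments of the present invention. Although the applicant has described the present invention in detail with reference to preferred embodiments, persons of ordinary skill in the art may make other variations and modifications of different forms on the basis of the above description. Not all embodiments can be exhaustively listed here. Any obvious variation or modification derived from the technical solution of the present invention remains within the protection scope of the present invention.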

Claims (10)

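  1. An intelligent connected vehicle data training method based on privacy data protection, characterized by comprising the following steps:
    1) First-version model acquisition: in the cloud, historically collected road test data is first annotated and a model is then trained to obtain a first-version model, and the low-level feature extraction layers of the first-version model are deployed to the vehicle side;
    2) Vehicle-side raw data feature extraction: on the vehicle side, features are extracted from raw data collected by the vehicle in real time or historically, using the low-level feature extraction layers deployed in step 1), to obtain a low-level feature data set of the raw data, which is uploaded to the cloud;
    3) Vehicle-side data desensitization: key information in the raw data is anonymized on the vehicle side, and the resulting anonymized data is uploaded to the cloud and annotated to obtain an annotation result data set;
    4) Cloud-side model update data preparation: the data in the annotation result data set of step 3) is placed in one-to-one correspondence with the data in the low-level feature data set of step 2) to form a model update data set;
    5) Model optimization: in the cloud, the model update data set obtained in step 4) is used to train and update the feature extraction layers of the first-version model other than the low-level feature extraction layers; the low-level feature extraction layers and the updated other feature extraction layers together form the optimized model, which is pushed to the vehicle side for synchronous updating.
  2. The intelligent connected vehicle data training method based on privacy data protection according to claim 1, characterized in that: in step 4), in the cloud, features are extracted from the road test data through the low-level feature extraction layers of the first-version model that are identical to those deployed to the vehicle side, yielding a low-level feature data set of the road test data, and the union of this data set and the low-level feature data of the raw data uploaded to the cloud is used as the low-level feature data set of step 4).
  3. The intelligent connected vehicle data training method based on privacy data protection according to claim 1, characterized in that: the model used for model training in step 1) is a deep neural network;
    the deep neural network includes, but is not limited to, convolutional neural networks, recurrent neural networks, and their variants, and the supported algorithms include, but are not limited to, object detection algorithms, lane line recognition algorithms, and semantic segmentation algorithms.
  4. The intelligent connected vehicle data training method based on privacy data protection according to claim 1, characterized in that: in step 3), the key information in the raw data includes, but is not limited to, faces and license plates, and anonymization includes, but is not limited to, mosaicking, solid-color filling, and blurring.
  5. The intelligent connected vehicle data training method based on privacy data protection according to claim 1, characterized in that: the methods used for feature extraction in step 2) include, but are not limited to, convolution, pooling, and slicing.
  6. The intelligent connected vehicle data training method based on privacy data protection according to claim 1, characterized in that: in steps 2) and 3), data is uploaded to the cloud while the vehicle is in standby.
  7. The intelligent connected vehicle data training method based on privacy data protection according to claim 1, characterized in that: the low-level feature extraction layers deployed to the vehicle side in step 1) come in multiple depths, and in each deployment, low-level feature extraction layers of different depths are deployed on the vehicle side simultaneously; in step 5), multiple sets of other feature extraction layers corresponding to the vehicle-side low-level feature extraction layers are trained and updated in the cloud, yielding multiple optimized models, and the best-performing model is synchronized to the vehicle side.
  8. The intelligent connected vehicle data training method based on privacy data protection according to claim 1, characterized in that: the model update data set used for iterative algorithm updates in step 4) includes, but is not limited to, road test data sets and raw data sets collected on the vehicle side; data sets obtained through data augmentation are also included, including but not limited to data generated by flipping, rotating, and scaling the low-level feature sets.
  9. An electronic device for intelligent connected vehicle data training based on privacy data protection, characterized by comprising: a memory configured to store executable instructions;
    and a processor configured to execute the executable instructions stored in the memory to implement the intelligent connected vehicle data training method based on privacy data protection according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing computer program instructions, characterized in that the computer program instructions execute the intelligent connected vehicle data training method based on privacy data protection according to any one of claims 1 to 8.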
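PCT/CN2023/072690 2022-04-28 2023-01-17 Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium WO2023207228A1 (zh)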

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP23794664.5A EP4332815A1 (en) 2022-04-28 2023-01-17 Intelligent connected vehicle data training method and electronic device based on privacy data protection, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210459457.3 2022-04-28
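CN202210459457.3A CN114741732A (zh) 2022-04-28 2022-04-28 Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium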

Publications (1)

Publication Number Publication Date
WO2023207228A1 true WO2023207228A1 (zh) 2023-11-02

Family

ID=82283889

Family Applications (1)

Application Number Title Priority Date Filing Date
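PCT/CN2023/072690 WO2023207228A1 (zh) 2022-04-28 2023-01-17 Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium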

Country Status (3)

Country Link
EP (1) EP4332815A1 (zh)
CN (1) CN114741732A (zh)
WO (1) WO2023207228A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
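CN114741732A (zh) * 2022-04-28 2022-07-12 重庆长安汽车股份有限公司 Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium
CN114926154B (zh) * 2022-07-20 2022-11-18 江苏华存电子科技有限公司 Protection switching method and system for multi-scenario data recognition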

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
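CN108875595A (zh) * 2018-05-29 2018-11-23 重庆大学 Driving scene object detection method based on deep learning and multi-layer feature fusion
CN109902798A (zh) * 2018-05-31 2019-06-18 华为技术有限公司 Training method and apparatus for deep neural networks
CN112115975A (zh) * 2020-08-18 2020-12-22 山东信通电子股份有限公司 Rapid iterative training method and device for deep learning network models suitable for surveillance camera devices
CN113329000A (zh) * 2021-05-17 2021-08-31 山东大学 Integrated privacy protection and security monitoring system for smart home environments
CN114741732A (zh) * 2022-04-28 2022-07-12 重庆长安汽车股份有限公司 Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium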

Also Published As

Publication number Publication date
CN114741732A (zh) 2022-07-12
EP4332815A1 (en) 2024-03-06

Similar Documents

Publication Publication Date Title
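WO2023207228A1 (zh) Intelligent connected vehicle data training method based on privacy data protection, electronic device, and computer-readable storage medium
US11093481B2 (en) Systems and methods for electronic data distribution
WO2022116424A1 (zh) Traffic flow prediction model training method and apparatus, electronic device, and storage medium
CN112163446B (zh) Obstacle detection method and apparatus, electronic device, and storage medium
CN110414526A (zh) Training method, training apparatus, server, and storage medium for a semantic segmentation network
US20230419823A1 (en) Methods and systems for managing exhaust emission in a smart city based on industrial internet of things
Zhang et al. Urban traffic flow forecast based on FastGCRNN
CN114741731A (zh) Intelligent connected vehicle data training method based on key information anonymization, electronic device, and computer-readable storage medium
Lv et al. Digital twins based VR simulation for accident prevention of intelligent vehicle
CN114418021B (zh) Model optimization method and apparatus, and computer program product
CN116597270A (zh) Road damage object detection method based on an attention-mechanism ensemble learning network
An et al. Hintnet: Hierarchical knowledge transfer networks for traffic accident forecasting on heterogeneous spatio-temporal data
CN112785610B (zh) Lane line semantic segmentation method fusing low-level features
Wang et al. Abnormal trajectory detection based on geospatial consistent modeling
CN117011692A (zh) Road recognition method and related apparatus
CN115880580A (zh) Intelligent method for extracting road information from optical remote sensing images affected by cloud cover
JP2023095812A (ja) In-vehicle data processing method, apparatus, electronic device, storage medium, and program
Turk Artificial Intelligence and Urban Block—Building the Common Language
CN113961734A (zh) Method for constructing user and vehicle profiles based on parking data and app operation logs
CN113378157A (zh) Internet-of-Vehicles data collection system based on secondary development of embedded software
Tan et al. BSIRNet: A road extraction network with bidirectional spatial information reasoning
Qiu et al. Ontology-based map data quality assurance
CN117079142B (zh) Anti-attention generative adversarial road centerline extraction method for UAV automatic inspection
Zhang et al. Super-Resolution Based and Topological Structure for Narrow Road Extraction from Remote Sensing Image
Liu et al. A Real-Time Detection Drone Algorithm Based on Instance Semantic Segmentation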

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2023794664

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023794664

Country of ref document: EP

Effective date: 20231130

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23794664

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: MX/A/2024/001537

Country of ref document: MX