CN114549951A - Method for obtaining training data, related device, system and storage medium

Method for obtaining training data, related device, system and storage medium

Info

Publication number
CN114549951A
Authority
CN
China
Prior art keywords
information
image
training data
sub
target training
Prior art date
Legal status
Granted
Application number
CN202011352731.4A
Other languages
Chinese (zh)
Other versions
CN114549951B (en)
Inventor
陈鹏
Current Assignee
Weilan Continental Beijing Technology Co ltd
Original Assignee
Weilan Continental Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Weilan Continental Beijing Technology Co ltd
Priority to CN202011352731.4A
Publication of CN114549951A
Application granted
Publication of CN114549951B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a method for obtaining training data, a related device, and a storage medium. The method, applied to a terminal device, includes: obtaining target training data, where the target training data includes first information and second information, the first information representing private information and the second information representing the information in the target training data other than the private information; and extracting and sending the features of the first information and the features of the second information.

Description

Method for obtaining training data, related device, system and storage medium
Technical Field
The present application relates to the field of training data technology, and in particular, to a method for obtaining training data, a related device, a data processing system, and a storage medium.
Background
A neural network model is trained with training data; at inference time, collected data such as images and audio are input directly into the trained model. Training an accurate neural network model requires a large amount of training data, typically tens of thousands of samples or more. Taking images as an example, training data are usually obtained by photographing real scenes. When photographing a yard, for instance, private information of a subject may appear in the captured image, such as a person's face or a landmark building that reveals the location of the yard. When the device that trains the neural network model and the device that collects the training data are not the same device, the collection device sends its collected training data, such as images, to the training device, which then trains the model. Directly transmitting training data that contains a subject's private information may leak that information, which is detrimental to privacy protection.
Disclosure of Invention
In order to solve the existing technical problem, embodiments of the present application provide a method, related device, data processing system, and storage medium for obtaining training data.
The technical solutions of the embodiments of the present application are implemented as follows:
the embodiment of the application provides a method for obtaining training data, which is applied to a server and comprises the following steps:
obtaining the characteristics of first information and the characteristics of second information, wherein the first information represents private information in target training data, and the second information represents other information except the private information in the target training data;
constructing target training data according to the characteristics of the first information and the characteristics of the second information;
and determining the constructed data as the desired training data.
In the above scheme, the target training data is an image that includes a first sub-image and a second sub-image, where the first sub-image is the part characterized as private information and the second sub-image is the information in the target training data other than that characterized as private information;
correspondingly, obtaining the features of the first information and the features of the second information in the target training data, constructing the target training data according to those features, and determining the constructed data as the desired training data comprises:
receiving features of the first sub-image and features of the second sub-image; or extracting the features of the first sub-image and the features of the second sub-image;
constructing the image according to the characteristics of the first sub-image and the characteristics of the second sub-image;
and determining the constructed image as a desired image.
In the above solution, the constructing the image according to the feature of the first sub-image and the feature of the second sub-image includes:
inputting the features of the first sub-image and the features of the second sub-image into a trained generative model, which constructs the image;
the generative model is trained until its loss function falls below a threshold; it comprises at least two deconvolution layers, and the constructed image is obtained by performing deconvolution operations on the features of the first sub-image and the features of the second sub-image through the at least two deconvolution layers (an illustrative sketch follows).
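The following is a hypothetical PyTorch sketch of such a generative model: at least two deconvolution (transposed-convolution) layers upsample the fused features of the two sub-images back into an image. The channel counts, layer depth, and class name are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical sketch (assumed PyTorch) of the generative model described
# above: at least two deconvolution (transposed-convolution) layers that
# upsample the fused features of the two sub-images back into an image.
# Channel counts and layer depth are illustrative assumptions.
import torch
import torch.nn as nn

class FeatureToImageGenerator(nn.Module):
    def __init__(self, feat_channels: int = 128):
        super().__init__()
        self.deconv = nn.Sequential(
            # Deconvolution 1: upsample the fused feature map by 2x.
            nn.ConvTranspose2d(2 * feat_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            # Deconvolution 2: upsample again and map to 3 image channels.
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, first_feat: torch.Tensor, second_feat: torch.Tensor) -> torch.Tensor:
        # Fuse the features of the private part (first) and the rest (second).
        fused = torch.cat([first_feat, second_feat], dim=1)
        return self.deconv(fused)
```

With feature maps of shape (N, 128, H/4, W/4), the two stride-2 deconvolutions restore an image of shape (N, 3, H, W).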
In the above scheme, the features of the first sub-image and the second sub-image are both shallow features of the image.
The embodiment of the application provides a method for obtaining training data, which is applied to terminal equipment and comprises the following steps:
obtaining target training data, wherein the target training data comprises first information and second information, the first information represents private information, and the second information represents other information except the private information in the target training data;
and extracting and sending the characteristics of the first information and the characteristics of the second information.
In the above scheme, the target training data is an image; correspondingly, the extracting the features of the first information and the features of the second information includes:
inputting the image into a feature extraction model, or inputting a first sub-image and a second sub-image of the image into the feature extraction model, where the first sub-image is the part characterized as private information and the second sub-image is the remaining information in the target training data; the feature extraction model comprises at least two convolution layers, and shallow features of the image are extracted by at least one front convolution layer of the at least two convolution layers to obtain the features of the first sub-image and the features of the second sub-image (an illustrative sketch follows below).
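As a concrete illustration, here is a minimal sketch, assuming PyTorch, of a feature extraction model in which only front convolution layers are kept so that only shallow features leave the device; the architecture details are assumptions, not the patent's implementation.

```python
# Hypothetical sketch (assumed PyTorch) of the feature extraction model
# described above: a small CNN whose front convolution layers extract
# shallow features (edges, texture, color); the deeper layers that would
# extract abstract semantic features are omitted.
import torch
import torch.nn as nn

class ShallowFeatureExtractor(nn.Module):
    def __init__(self, out_channels: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            # Front convolution layers (the scheme requires at least two).
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, sub_image: torch.Tensor) -> torch.Tensor:
        return self.features(sub_image)

# The terminal device would run this on each sub-image and transmit only
# the resulting feature maps, never the raw pixels:
# first_feat = extractor(first_sub_image)
# second_feat = extractor(second_sub_image)
```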
An embodiment of the present application provides a server, including an obtaining unit, a constructing unit, and a determining unit, wherein:
the obtaining unit is configured to obtain the features of first information and the features of second information, where the first information represents private information in target training data and the second information represents the other information in the target training data;
the constructing unit is configured to construct the target training data according to the features of the first information and the features of the second information;
and the determining unit is configured to determine the constructed data as the desired training data.
An embodiment of the present application provides a terminal device, including an obtaining unit, an extracting unit, and a sending unit, wherein:
the obtaining unit is configured to obtain target training data, where the target training data includes first information and second information, the first information representing private information and the second information representing the other information in the target training data;
the extracting unit is configured to extract the features of the first information and the features of the second information;
and the sending unit is configured to send the features of the first information and the features of the second information.
The embodiment of the application provides a data processing system, which comprises a terminal device and a server, wherein,
the terminal device is configured to collect target training data and preprocess it, where the preprocessing at least includes: determining first information and second information in the target training data, extracting the features of the first information and the features of the second information, and sending these features to the server; the first information represents private information in the target training data, and the second information represents the other information in the target training data;
and the server is used for constructing the target training data according to the acquired characteristics of the first information and the acquired characteristics of the second information.
Embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the aforementioned method.
Embodiments of the present application provide a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement the steps of the foregoing method.
Embodiments of the present application provide a method for obtaining training data, a related device, a data processing system, and a storage medium. The method applied to the terminal device includes: obtaining target training data, where the target training data includes first information and second information, the first information representing private information and the second information representing the other information in the target training data; and extracting and sending the features of the first information and the features of the second information. In the embodiments of the present application, the terminal device sends the server the features of the private information in the target training data rather than the private information itself, which effectively avoids the leakage that would result from directly transmitting the part of the target training data characterized as private information.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below illustrate only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an implementation of a method for obtaining training data, applied to a terminal device in an embodiment of the present application;
fig. 2 is a first flowchart illustrating an implementation of a method for obtaining training data applied to a server in an embodiment of the present application;
fig. 3 is a second schematic flow chart of an implementation of the method for obtaining training data applied to the server in an embodiment of the present application;
fig. 4 is a third schematic flow chart of an implementation of the method for obtaining training data applied to the server in an embodiment of the present application;
FIG. 5 is a first schematic diagram of target training data (an original image) in an embodiment of the present application;
FIG. 6 is a second schematic diagram of target training data (an original image) in an embodiment of the present application;
FIG. 7 is a schematic diagram of a feature extraction model in an embodiment of the present application;
FIG. 8 is a diagram illustrating an implementation of a method for obtaining training data in an embodiment of the present application;
fig. 9 is a schematic diagram of a terminal device in the embodiment of the present application;
FIG. 10 is a schematic diagram of a server according to an embodiment of the present application;
fig. 11 is a schematic hardware configuration diagram of a terminal device and/or a server in an embodiment of the present application;
fig. 12 is a schematic structural diagram of a data processing system according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application; all other embodiments derived by those skilled in the art from the given embodiments without creative effort fall within the protection scope of the present application. The embodiments and the features of the embodiments may be combined with one another arbitrarily provided there is no conflict. The steps illustrated in the flowcharts may be performed in a computer system as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In the embodiments of the present application, the device that collects the (target) training data may be any reasonable device, such as a lawn mower, a mobile phone, or another terminal device. The training device may be any device that can train a neural network model with the training data, such as a server or a platform. The following description takes the case where the collecting device is a terminal device and the training device is a server. The embodiments of the present application provide a method for obtaining training data, applied to the terminal device and the server, that at least avoids leakage of private information. The target training data in the embodiments of the present application refers to training data, such as images and/or audio, that contains private information. Those skilled in the art will appreciate that, because actual collection environments differ, a collected image may or may not contain private information such as a human face; the target training data in the embodiments of the present application refers to images or audio that do contain private information.
The embodiment of the present application provides a method for obtaining training data, which is applied to a terminal device, and as shown in fig. 1, the method includes:
S101: obtaining target training data;
in this step, the original training data is collected, and the target training data is screened from the original training data. And under the condition that the original training data is an image, shooting the image by using an image acquisition device of the terminal equipment, such as a camera. And under the condition that the original training data is audio, acquiring the audio by using an audio acquisition device of the terminal equipment, such as a microphone. The original training data in the embodiment of the present application refers to the acquired image or audio including the private information, and may also refer to the acquired image or audio not including the private information. For convenience of description, the target training data in the embodiment of the present application is a collected image or audio including private information. After the image or the audio is collected, for example, after a certain image or a certain piece of audio is collected, whether private information is included is judged, for example, whether a human face, a fingerprint and the like appear in the image, whether sensitive information such as a bank card, an identity card and the like appear in the audio, and if the judgment is yes, the image or the audio is taken as a target image or target audio (target training data).
It will be appreciated that the private information of an object may be any information that reasonably characterizes the object's uniqueness. When the object is a person, the private information may be the user's face, fingerprint, identity card, bank card number, password, and the like. When the object is a thing, it may be a private landmark building (a private landmark). The private information may also be information such as a home address, identity card number, bank password, or card number mixed into audio data. Leakage of such private information may adversely affect the user; for example, leaked face or fingerprint information may be misused by criminals and cause property loss, and if a criminal locates a landmark building from leaked data, the criminal could vandalize the building and/or target the person who owns it.
Here, the target training data is regarded as a set of two pieces of information (first information and second information): the first information represents the private information, and the second information refers to the information in the target training data other than the private information.
S102: extracting the characteristics of the first information and the characteristics of the second information;
S103: transmitting the characteristics of the first information and the characteristics of the second information;
in the embodiments of the present application, the characteristics of the information may be any reasonable characteristics. In the case where the target training data is an image, details, textures, contours, edges, colors, and the like of the image may be used as features of information (image features). In the case where the target training data is audio, the semantic meaning, frequency, tone, timbre, loudness, and the like of the audio data can be used as the features (audio features) of the information.
When the device that trains the neural network model is a server, the terminal device may send the extracted features of the first information and the features of the second information to the server, and the server trains the neural network model with the received features of the two pieces of information.
In the embodiments of the present application, the terminal device sends the server the features of the information representing private information in the target training data rather than the private information itself, which effectively avoids the leakage that would result from directly transmitting the part of the target training data characterized as private information.
When the target training data is an image, the terminal device may extract the features through a trained feature extraction model. Specifically, the (target) image is input into the trained feature extraction model, or the first information and the second information are input into it separately; the feature extraction model comprises at least two convolution layers, and shallow features of the image are extracted by at least one front convolution layer of the at least two convolution layers to obtain the features of the first information and the features of the second information. When the convolution layers of the feature extraction model are connected in sequence, the front convolution layers extract shallow features such as details, textures, contours, edges, and colors, while the later convolution layers extract more abstract high-level information such as semantics and context. Using shallow image features in the embodiments of the present application reduces the processing load of extracting high-level information, is easy to implement in engineering, and is highly feasible. Because the terminal device sends the server only the shallow features of the two pieces of information, the leakage of private information caused by directly sending target training data containing private information is avoided.
Because the target training data in the embodiments of the present application includes two parts, the first information and the second information, feature extraction can proceed in two ways: the target training data may be input into the feature extraction model as a whole, or its two parts may be input separately. In the separate-input scheme, the target training data is first divided into the two pieces of information, which are then input into the feature extraction model, and the model extracts the features of each part with its front convolution layers (a sketch of such a split is shown below). In the whole-input scheme, the overall features of the target training data are extracted first; the region containing the information characterized as private is then identified, the features extracted from that region are taken as the features of the private information, and the features extracted from the remaining regions are taken as the features of the other information.
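A minimal sketch of the separate-input split, assuming the private region has already been located as a bounding box; blanking the region with zeros is an assumption, since the patent prescribes no particular strategy.

```python
# Hypothetical sketch of the separate-input split: the region identified
# as private (e.g., a detected face box) becomes the first sub-image, and
# the image with that region blanked out becomes the second sub-image.
import numpy as np

def split_private_region(image: np.ndarray, box: tuple) -> tuple:
    """box = (x, y, w, h) of the region characterized as private information."""
    x, y, w, h = box
    first_sub_image = image[y:y + h, x:x + w].copy()  # private region
    second_sub_image = image.copy()
    second_sub_image[y:y + h, x:x + w] = 0            # remaining information
    return first_sub_image, second_sub_image
```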
In the embodiments of the present application, what the terminal device sends to the server is the features of the information characterized as private in the target training data, not the private information itself, which effectively avoids the leakage caused by directly transmitting training data that includes private information. Sending the features of the private information together with the features of the other information is equivalent to sending the features of the whole target training data. Transmission security is therefore guaranteed, and the server can still train with the target data collected by the terminal device.
The embodiment of the application provides a method for obtaining training data, which is applied to a server. As shown in fig. 2, the method includes:
S201: obtaining the characteristics of first information and the characteristics of second information, wherein the first information represents private information in target training data, and the second information represents other information except the private information in the target training data;
in this step, the target training data may be image and/or audio data. The server can receive information characteristics which are characterized as private information and characteristics of other information in the target training data sent by the terminal equipment. Alternatively, the server may capture an image including private information using its own image capture device, such as a camera. And under the condition that the target training data is audio, acquiring the audio containing the private information by using an audio acquisition device of the terminal equipment, such as a microphone. If an image or audio not including private information is also acquired in the process of acquiring the image or audio, the server is further required to judge whether private information is included after acquiring a certain image or a certain section of audio, such as whether a face, a fingerprint and the like appear in the image, whether sensitive information such as a bank card, an identity card and the like appears in the audio, and if so, the image or audio is taken as a target image or target audio (target training data). And then extracting the information characteristics which are characterized as private information in the target training data and the characteristics of the other information by the characteristic extraction model. For a specific scheme of feature extraction by the feature extraction model, reference is made to the foregoing related description, which is not repeated herein. In order to reduce the processing load for extracting features, the server may extract only shallow features as feature information.
S202: constructing target training data according to the characteristics of the first information and the characteristics of the second information;
in this step, data that is the same as or similar to the target training data may be constructed from the features of the first and second information; for ease of understanding, this data is called the constructed training data. When the target training data is an image, the constructed training data is also an image; when it is audio, the constructed training data is also audio. When the target training data is an image, the image comprises a first sub-image and a second sub-image, where the first sub-image is the part characterized as private information and the second sub-image is the remaining information. Further, the server may input the features of the first sub-image and the features of the second sub-image into a trained generative model, which constructs the desired training data (such as a desired image); the generative model is trained until its loss function falls below a threshold, it comprises at least two deconvolution layers, and the constructed image is obtained by performing deconvolution operations on the features of the first information and the features of the second information through the at least two deconvolution layers. The constructed image can be regarded as an image generated by the generative model. For the specific implementation, refer to the following detailed description.
S203: determining the constructed data as the desired training data.
In this step, the constructed data is regarded as the training data available to the server for training the neural network model (the desired training data).
The main body executing S201 to S203 is a server.
In the embodiments of the present application, the target training data includes the first information, characterized as the private information of an object, and other information, and the other information can be the information that was intended to be collected (free of private information). An accurate neural network model can thus be trained.
In some alternative embodiments, both the target training data and the desired training data may be images. For convenience of description, the image serving as the target training data is referred to as the original image. In this case, the method for obtaining training data applied to the server in the embodiments of the present application includes:
S301: obtaining the characteristics of a first sub-image and the characteristics of a second sub-image in the original image;
in this step, the server may receive information features represented as private information in the original image extracted and sent by the terminal device and features of the target image. The server receives shallow features of the image, such as details, textures, contours, edges, colors, etc. of the image. The server can also obtain the information characteristics represented as private information in the original image and the characteristics of the target image through a subsequent autonomous implementation scheme. The server receives the information characteristic which is represented as private in the original image sent by the terminal device, and the problem of information leakage caused by direct transmission of private information in the original image can be avoided.
S302: constructing an original image according to the characteristics of the first sub-image and the characteristics of the second sub-image;
in this step, the server may input the features of the image characterized as private information (the first sub-image) and the features of the remaining image (the second sub-image) into a trained generative model, which constructs the desired image; the generative model is trained until its loss function falls below a threshold. The constructed (desired) image may have the same image content and display effect, such as display resolution, as the original image (i.e., the target training data); it may also have the same image content with slightly inferior resolution, which does not affect the server's normal training.
S303: and determining the constructed image as a desired image.
In this step, the image constructed by the generative model may be regarded as a desired image. The desired image is the desired training data obtained in accordance with the aforementioned S301 and S302 in the case where the target training data is an image.
In the embodiments of the present application, the original image is constructed from the features of the part of the original image characterized as private information and the features of the remaining image, which guarantees the accuracy of the construction. An accurate neural network model can thus be trained.
The feature information from which the server constructs the original image may be sent by the terminal device, or obtained by the server itself through the scheme shown in fig. 4:
S401: obtaining an original image;
in this step, an image is photographed by an image acquisition device of the server, such as a camera. And judging whether the original image has information representing object privacy. That is, it is determined whether or not there is private information such as a face, a private landmark, or the like, which is not a person desired to be photographed, in the image, and if it is determined to be present, the image is regarded as the target training data — the original image.
S402: extracting the characteristics of the first sub-image and the characteristics of the second sub-image in the original image;
in this step, the feature extraction model is used to extract the features of the part of the original image representing private information (the first sub-image) and the features of the remaining part of the original image (the second sub-image).
S403: constructing an original image according to the characteristics of the first sub-image and the characteristics of the second sub-image;
s404: and determining the constructed image as a desired image.
The main body executing S401 to S404 is the server. The related description of S403 and S404 can also refer to the foregoing description of S302 and S303.
In this scheme, the server autonomously collects and screens the target training data and autonomously extracts the feature information; no transmission of target training data containing private information is involved, so information security is ensured. In addition, constructing the desired training data from autonomously extracted features both lets the server obtain training data on its own and guarantees construction accuracy.
In terms of technical implementation, when the target training data is an image (the original image), the server's feature extraction may be realized by the feature extraction model. Specifically, the server inputs the original image, or its first and second sub-images, into the feature extraction model; the feature extraction model comprises at least two sequentially connected convolution layers, and shallow features of the image are extracted by at least one front convolution layer of the at least two convolution layers. Among the convolution layers of the feature extraction model, the front layers extract shallow features such as details, textures, contours, edges, and colors, while the later layers extract more abstract high-level information such as semantics and context. Using shallow image features in the embodiments of the present application allows the target training data to be constructed without high-level information, which is easy to implement in engineering and highly feasible.
The server in the embodiments of the present application can thus autonomously collect target training data, autonomously extract feature information, and construct the desired training data from that feature information; call this the autonomous implementation process. The server may construct the desired training data through the autonomous implementation process, or through interaction with the terminal device, in which the terminal device provides the server with the features of the information characterized as private and the features of the other information in the target training data. In the embodiments of the present application, constructing the desired training data through interaction between the terminal device and the server is preferred: the feature information transmitted during the interaction is not private information, so leakage of private information is avoided while the accuracy of the desired training data is ensured.
An image captured in practice is a photograph taken of some intended subject. The original image of the present application may be regarded as an image captured of an intended subject, but an image captured against a real background may include not only the intended subject but also unintended ones. In the embodiments of the present application, when an intended subject is photographed, unintended objects that uniquely characterize a person or thing may also appear. As shown in fig. 5, the original image (target training data) was intended to capture an owl, but while photographing the owl, the face of another user watching it (private information of an object such as a user) also appeared; that face image is the first sub-image, the part of the original image characterized as object privacy information. The parts other than the face, such as the owl and the shooting background, form the second sub-image. In the image shown in fig. 6, a human face unexpectedly appears in a photograph of grass. In practical applications, directly transmitting an image that includes a face leaks the user's face information; from the user's perspective, it is undesirable for an image of his face to appear on a device other than his own phone or to be used by others as training data.
The present application will be described in further detail with reference to the accompanying fig. 6-8 and the specific embodiments.
In this application scenario, the desired training data is obtained through interaction between the terminal device and the server, and the neural network model the server trains with the desired training data is a recognition model. The image shown in fig. 6 serves as the original image; it includes grass (with growing grass and flowers), the shooting background, and an unexpectedly captured face. The face is the part of the original image characterized as private information, while the grass and the shooting background can be regarded as the target image within the original image.
In this application scenario, the terminal device is a lawn mower with the following functions: it captures images of the yard environment and, using the recognition model trained by the server, automatically recognizes which parts of a captured image are lawn and which are not, then identifies the position of the lawn area in the yard and travels there to weed or mow. The recognition model is trained with training data, usually images of the actual yard environment shot from multiple angles and directions. For a trained recognition model to have good properties, such as robustness, a large and rich amount of training data is required. Rich training data means that the collected images represent richer lawn scenes (richer shot content): for example, images showing the lawn together with surrounding open ground, large trees, and flower beds, and, besides the lawn, faces of users who may appear in the yard, pits, standing water, and so on. Training the recognition model with such rich training data necessarily yields a more accurate model, which in turn guarantees the recognition accuracy of lawn and non-lawn areas in the yard and facilitates the mower's operation. This application scenario describes how to obtain training data that allows the recognition model to be trained more accurately.
In practice, when a user uses the mower's image acquisition device, such as a camera, to shoot an intended subject such as the lawn in a yard, another user may stray onto the lawn during shooting, so that that user's face (private information) appears in the captured lawn image, as in the image shown in fig. 6. Such images as training data reflect, to some extent, the authenticity of the training data (the captured images match actual conditions) and greatly enrich its diversity, and both authenticity and diversity help guarantee the training accuracy of the recognition model. If the mower transmitted lawn images including users' faces directly to the server for training, the face information could leak; if intercepted and misused by criminals in transit, it could cause the user property loss. With the scheme in the embodiments of the present application, leakage of face information is avoided and the security of private information is improved, while the server's normal training of the recognition model is unaffected.
In technical implementation, the mower captures the lawn environment of the current yard to obtain a collected image (original training data) and judges whether a face is present. If no face is present, the image is non-target training data, and it is either sent to the server as one training image among a large amount of training data, or its shallow image features are extracted with a pre-trained feature extraction model. Preferably, the shallow image features of the collected image are extracted with the pre-trained feature extraction model, to keep the processing flow the same as for collected images containing a face.
If a face is present, the collected image containing the face is taken as target training data. Because the target training data in this application scenario is an image, the collected image containing the face serves as the original image, and its shallow image features are extracted with a pre-trained feature extraction model. The feature extraction model may be any neural network model capable of extracting shallow image features, such as a Convolutional Neural Network (CNN) model. The CNN model is trained before being used for shallow feature extraction; the specific training process is not elaborated. In this application scenario, the feature extraction model is preferably trained with collected images that include only grass and no faces. Training the feature extraction model with such images ensures its robustness and thus the extraction accuracy of the shallow features of the target image.
As shown in fig. 7, the CNN model includes an input layer, convolution layers, pooling layers, and an output layer. Convolution layers and pooling layers appear in pairs; a CNN model usually has two or more convolution layers connected in sequence, with the specific number determined by the application. A convolution layer performs feature extraction on the image input to it. A pooling layer reduces the dimensionality of the convolution layer's output, since the image has a high dimensionality after feature extraction. The output layer produces the output. In this application scenario, the original image collected by the mower is input into the input layer of the CNN model; the 1st convolution layer extracts features of the original image, and the 1st pooling layer reduces the dimensionality of the 1st convolution layer's output. The 2nd convolution layer extracts features of the image output by the 1st pooling layer, the 2nd pooling layer reduces the dimensionality of the 2nd convolution layer's output, and so on.
While studying the application of CNN models, the inventor found that the front few convolution layers of a CNN extract shallow features, such as details, textures, edges, and colors, from the image input at the input layer, and these shallow features capture the details and structure of the target image, from which the image's details and structure can be constructed. The later convolution layers of the CNN extract deep (abstract) features such as semantics and context, which can be used to recognize the content of the image, for example that it is a photograph of grass or flowers. In this application scenario, to enable the server to construct the original image from the features of its two sub-images, the two sub-images are first distinguished: the region containing private information, such as a face region, is identified in the original image, the image within that region is taken as the first sub-image, and the rest of the original image as the second sub-image; the shallow features of the two sub-images are then extracted. Taking a CNN model with 30 convolution layers as an example, the first 3-5 convolution layers extract the shallow features of the two sub-images, and the extracted shallow features are sent to the server. Regarding the set of the two sub-images' shallow features as the shallow features of the original image, the extracted shallow features can be sent to the server each time a lawn image including a face is collected, or the shallow features of a certain number of original images can be accumulated and sent together. Preferably, in the batched scheme, the extracted shallow features are stored until a certain number of original images has accumulated, and the stored shallow feature information of all the original images is then sent to the server together (a sketch of such batching follows below). Whether sent together or singly, what is sent is the shallow feature information of the original images, which effectively avoids the private-information leakage that directly transmitting face information would cause.
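A hypothetical sketch of the batched upload: the mower buffers the shallow features of each original image and sends them only once a certain number has accumulated. The batch size, serialization, and endpoint are illustrative assumptions, not details from the patent.

```python
# Hypothetical sketch of the batched upload: shallow features of each
# original image are buffered and sent to the server only once a certain
# number has accumulated.
import io
import torch
import requests

BATCH_SIZE = 32                                 # assumed "certain number"
SERVER_URL = "http://server.example/features"   # hypothetical endpoint
_buffer = []

def queue_features(first_feat: torch.Tensor, second_feat: torch.Tensor) -> None:
    _buffer.append((first_feat.cpu(), second_feat.cpu()))
    if len(_buffer) >= BATCH_SIZE:
        payload = io.BytesIO()
        torch.save(_buffer, payload)            # only features leave the mower,
        requests.post(SERVER_URL, data=payload.getvalue())  # never raw pixels
        _buffer.clear()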
The foregoing implementation process of the lawn mower can be seen in fig. 8, and the subsequent server process can also be understood with reference to the process shown in fig. 8. In fig. 8, the target training data refers to an image collected by the lawn mower and including private information in actual application, and the non-target training data refers to an image collected by the lawn mower and not including private information.
As shown in fig. 8, the server receives the shallow features of the first and second sub-images of the original images sent together by the mower, constructs the original images with the pre-trained generative model, and trains the recognition model with the constructed images as the desired training data.
The generative model in this application scenario may be a model pre-trained by the server, for example trained with images actually shot of the lawn environment that include no faces. The generative model may be any neural network model that can regenerate an image from its feature information, such as a Generative Adversarial Network (GAN) model. The GAN model includes an input layer, a generative network, and a discriminative network. This application scenario involves both training the GAN and applying the trained GAN to construct the original image, set out separately below:
in the GAN training scheme, a loss function, such as a 2-norm loss, maximum likelihood loss, or cross-entropy loss, is preset. An image including face information is collected in advance; it may be an image from the target training data or an additionally collected one. Features are extracted from the image region where the face is located and from the region outside the face, and the two sets of features are input into the generative network. The generative network comprises at least two sequentially connected deconvolution layers, each performing a deconvolution operation, which amounts to convolving its input, such as the feature information of the two image parts, and upsampling it; this yields the image generated by the generative network during training. The pixel values of the generated image and of the pre-collected image are substituted into the preset loss function, such as the maximum likelihood loss, to obtain a loss value; when the loss value is smaller than a preset threshold, such as 0.1 or 0.09, the GAN model is considered trained. The expressions of the maximum likelihood and cross-entropy loss functions can be found in the related literature. A minimal sketch of this training criterion follows.
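The sketch assumes PyTorch and a 2-norm (MSE) loss on pixel values, one of the options the text mentions; the adversarial update of the discriminative network is omitted for brevity, and the optimizer settings are assumptions.

```python
# Minimal sketch of the stated training criterion: update the generator
# until the preset loss on pixel values falls below a threshold such as 0.1.
import torch
import torch.nn as nn

def train_generator(generator, feature_pairs, real_images, threshold: float = 0.1):
    criterion = nn.MSELoss()  # 2-norm style loss on pixel values
    optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4)
    while True:
        for (first_feat, second_feat), real in zip(feature_pairs, real_images):
            optimizer.zero_grad()
            generated = generator(first_feat, second_feat)
            loss = criterion(generated, real)   # generated vs. pre-collected image
            loss.backward()
            optimizer.step()
            if loss.item() < threshold:         # loss below threshold:
                return generator                # model considered trained
```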
The scheme for constructing the original image by applying the trained GAN is as follows:
in this application scenario, the input layer of the trained GAN receives the feature information of the first and second sub-images and the actually captured original image (the real image). The generative network and the discriminative network work as follows. The generative network generates an image from the feature information of the first and second sub-images. Recall that the feature extraction model uses convolution layers whose convolution kernels convolve the input image, which amounts to downsampling it. Correspondingly, the generative network comprises at least two sequentially connected deconvolution layers, each with a deconvolution kernel that deconvolves the input, which amounts to upsampling it. Applying at least two deconvolution operations, i.e., at least two upsamplings, to the two sub-images of the image to be constructed yields an upsampled image, which can be regarded as the image generated or constructed by the generative network. The discriminative network compares the similarity between the real image and the generated image to judge whether the generated image was produced by the generative network or actually photographed. If the discriminative network cannot tell, for example if the similarity between the generated image and the original image meets a threshold such as 97% or more, the generated image is qualified and can serve as desired training data for the subsequent recognition model.
The qualified images produced by the generative network, that is, the images the server constructs from the features of the two sub-images of the original image, may be identical to the original or slightly different. From the user's perspective, the display resolution of the image constructed by the server is slightly inferior to the original image captured by the terminal device, but this affects neither the user's viewing and recognition of the main content of the constructed image nor its use by the recognition model.
Because the server generates the image from the image features, the generation accuracy is guaranteed. The server takes the generated image as the desired image (desired training data) and trains the recognition model with it until the model is well trained. The server generates the original image from the shallow features of the part characterized as private information and of the target image, i.e., it restores the image actually shot by the mower. Images shot by the mower usually match actual conditions, and training the recognition model with images that match actual conditions greatly guarantees its training accuracy.
Once the server has trained the recognition model, the lawn mower reads the trained model. In subsequent use, that is, when the mower is mowing, it captures images of the lawn environment in the yard and uses the read recognition model to automatically identify which areas in the captured images are lawn and which are not. Having identified the location of a lawn area in the yard, the mower travels to that location to weed or mow.
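A hypothetical mower-side inference step is sketched below; the TorchScript file name and the convention that class 0 denotes lawn are assumptions made for illustration only.

# Hedged sketch: the mower reads the trained model and locates lawn areas.
import torch

model = torch.jit.load("recognition_model.pt")  # hypothetical file delivered by the server
model.eval()

def lawn_mask(image: torch.Tensor) -> torch.Tensor:
    """Returns a boolean per-pixel mask marking lawn areas in a captured image."""
    with torch.no_grad():
        logits = model(image.unsqueeze(0))       # (1, 2, H, W): lawn vs. non-lawn
    return logits.argmax(dim=1).squeeze(0) == 0  # assumption: class 0 = lawn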
In the embodiment described above, the lawn mower extracts the shallow feature information of the first information and the second information in the original image, and the server constructs the original image from that shallow feature information. If an image collected by the mower contains no private information, that is, it is non-target training data, it can also serve as expected training data for the recognition model, enriching the training data. Specifically, the mower captures the lawn environment of the current yard to obtain a collected image and checks whether a face is present. If not, it either sends the collected image to the server directly as one training image among a large amount of training data, or extracts the shallow image features of the collected image with the pre-trained feature extraction model. As in the feature-extraction scheme for target training data, when the collected image is non-target training data, its shallow image features are extracted with the pre-trained feature extraction model, stored, and sent to the server once a certain amount has accumulated. The server receives the shallow feature information of the collected images sent together by the mower, generates each image from its shallow features with the pre-trained GAN model, and takes the generated images as (expected) training data for training the recognition model. In this way, the server can use the constructed target training data (e.g., images containing faces) and the constructed non-target training data (e.g., images without faces) together as training data for the recognition model. Training the recognition model on rich training data makes it more accurate, so that when the model is later applied, it can precisely distinguish the areas that need mowing from those that do not.
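The terminal-side branching just described can be summarized in the sketch below; the face detector, feature extractor, and batch size of 32 are placeholders, not details disclosed in this application.

# Hedged sketch of the terminal-side decision: images without faces may be sent
# as-is; images with faces are reduced to shallow features and sent in batches.
buffered_features = []

def handle_capture(image, has_face, extract_shallow_features, send, batch_size=32):
    if not has_face(image):                  # no private information detected
        send(image)                          # usable directly as training data
    else:
        buffered_features.append(extract_shallow_features(image))
        if len(buffered_features) >= batch_size:  # "a certain amount" accumulated
            send(list(buffered_features))
            buffered_features.clear()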
It is understood that the foregoing takes target training data that is an image as the example; when the target training data is audio, the general flow is as follows: a terminal device such as a mobile phone collects audio and checks whether it contains information such as an identity card number or a bank card number. If it does, the device uses the feature extraction model to extract the features of the audio portion characterized as private information and of the remaining audio, and sends them to the server. The server inputs the received features of the private audio portion and the features of the remaining audio into the trained GAN model, which generates audio from them. Generating the audio on the server is equivalent to restoring the audio data collected by the terminal device, so the recognition model can be trained with restored audio data that carries private information, making the training data richer and more diverse. When the training data is audio, the recognition model may in practice be used to recognize the intent expressed by the audio data, for example an intent to change the password of a bank card whose number is XXX.
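Purely as an illustration of spotting such identifiers, the sketch below checks a transcript of the collected audio against rough digit patterns; the regular expressions, and the idea of working from a transcript at all, are assumptions rather than the detection method of this application.

# Hedged sketch: flagging identity-card / bank-card numbers in an audio transcript.
import re

ID_NUMBER = re.compile(r"\b\d{17}[\dXx]\b")  # shape of a mainland-China ID number
CARD_NUMBER = re.compile(r"\b\d{16,19}\b")   # typical bank-card number length

def contains_private_info(transcript: str) -> bool:
    return bool(ID_NUMBER.search(transcript) or CARD_NUMBER.search(transcript))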
The scheme of the embodiment of the application has at least the following advantages:
1) the terminal device sends feature information to the server, which effectively avoids the privacy leakage that would result from directly sending target training data containing private information;
2) the server constructs the target training data from the features of the information characterized as private and the features of the remaining information, which ensures construction accuracy and thereby the training accuracy of the neural network model;
3) whereas the training data for the neural network model would otherwise be limited to data without private information, the constructed training data carrying private information makes the training data richer and more diverse, so the trained neural network model is better suited to practical use.
An embodiment of the present application provides a terminal device, as shown in fig. 9, including: an obtaining unit 901, an extracting unit 902, and a sending unit 903; wherein:
an obtaining unit 901, configured to obtain target training data, where the target training data includes first information and second information, where the first information represents private information, and the second information represents other information except the private information in the target training data;
an extracting unit 902, configured to extract a feature of the first information and a feature of the second information;
a sending unit 903, configured to send the characteristic of the first information and the characteristic of the second information.
In some alternatives, the target training data is an image; accordingly, the extracting unit 902 is configured to:
inputting the image to a feature extraction model; or inputting a first sub-image and a second sub-image in the image into a feature extraction model; the first sub-image is characterized as private information, and the second sub-image is other information except the information characterized as private information in the target training data; the feature extraction model comprises at least two convolution layers; and extracting shallow layer features of the image by using at least one convolution layer positioned at the front of the at least two convolution layers to obtain the features of the first sub-image and the features of the second sub-image.
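As a minimal sketch of this shallow extraction, assuming PyTorch and an illustrative front end of two convolution layers of which only the first (plus its activation) is applied:

# Hedged sketch: shallow features come from the front convolution layer(s) only.
import torch
import torch.nn as nn

full_model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # front (shallow) layer
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # deeper layer, not used below
)

def shallow_features(sub_image: torch.Tensor) -> torch.Tensor:
    """Applies only the front convolution layer, i.e. the shallow part."""
    with torch.no_grad():
        return full_model[:2](sub_image.unsqueeze(0))  # conv + activation only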
It can be understood that, in practical applications, the obtaining unit 901 and the extracting unit 902 in the terminal device may be implemented by a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Micro Control Unit (MCU), or a Field-Programmable Gate Array (FPGA) of the terminal device, and the sending unit 903 by a communication component.
An embodiment of the present application provides a server, as shown in fig. 10, including: an obtaining unit 101, a constructing unit 102, and a determining unit 103; wherein:
an obtaining unit 101, configured to obtain a feature of first information and a feature of second information, where the first information represents private information in target training data, and the second information represents other information except the private information in the target training data;
the constructing unit 102 is configured to construct the target training data according to the feature of the first information and the feature of the second information;
a determining unit 103, configured to determine the constructed data as the expected training data.
In some optional schemes, the target training data is an image, the image includes a first sub-image and a second sub-image, where the first sub-image is characterized as private information, and the second sub-image is information other than information characterized as private information in the target training data;
correspondingly, the obtaining unit 101 is configured to:
receiving features of the first sub-image and features of the second sub-image; or extracting the features of the first sub-image and the features of the second sub-image;
the constructing unit 102 is configured to construct an image according to the feature of the first sub-image and the feature of the second sub-image;
a determining unit 103 for determining the constructed image as a desired image.
In some optional schemes, the constructing unit 102 is configured to input the features of the first sub-image and the features of the second sub-image into a trained generative model, and construct an image by using the generative model; the generation model is trained under the condition that the loss function is lower than a threshold value, the generation model comprises at least two deconvolution layers, and the constructed image is obtained by carrying out deconvolution operation on the features of the first sub-image and the features of the second sub-image through the at least two deconvolution layers.
In some alternatives, the features of the first sub-image and the second sub-image are both shallow features of the image.
It can be understood that, in practical applications, the obtaining unit 101, the constructing unit 102, and the determining unit 103 in the server may be implemented by a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Micro Control Unit (MCU), or a Field-Programmable Gate Array (FPGA) in the server.
It should be noted that, because the terminal device and the server of the embodiments of the present application solve their problems on a principle similar to that of the method for obtaining training data, their implementation process and principle can be understood by reference to the implementation process and principle of that method, and repeated details are not described again.
An embodiment of the present application further provides a data processing system, as shown in fig. 12, where the system includes a terminal device 01 and a server 02; wherein:
the terminal device 01 is configured to obtain target training data and preprocess the target training data, wherein the preprocessing comprises at least: determining first information and second information in the target training data; extracting the features of the first information and the features of the second information; and sending the features of the first information and the features of the second information to the server; the first information represents private information in the target training data, and the second information represents other information in the target training data except the private information;
and the server 02 is used for constructing the target training data according to the acquired characteristics of the first information and the acquired characteristics of the second information.
The data constructed by the server 02 can be used as ideal training data (expected training data) for training models such as the recognition model.
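Abstracting away the concrete models, the division of labor in the system can be sketched as follows; split, extract, generate, and send are placeholders for the components described above, not names from this application.

# Hedged sketch of fig. 12: only features leave the terminal; the server rebuilds.
def terminal_side(target_training_data, split, extract, send):
    first_info, second_info = split(target_training_data)  # private part vs. the rest
    send((extract(first_info), extract(second_info)))      # raw data never transmitted

def server_side(received_features, generate):
    first_feats, second_feats = received_features
    return generate(first_feats, second_feats)             # expected training data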
For related descriptions of the data processing system, please refer to the corresponding descriptions elsewhere herein; repeated details are not described again.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs at least the steps of the method shown in any one of fig. 1 to 8. The computer-readable storage medium may specifically be a memory, such as the memory 62 shown in fig. 11.
Fig. 11 is a schematic diagram of a hardware structure of a terminal device and/or a server according to an embodiment of the present application. As shown in fig. 11, the hardware structure includes: a communication component 63 for data transmission, at least one processor 61, and a memory 62 for storing computer programs capable of running on the processor 61. The components are coupled together by a bus system 64. It will be appreciated that the bus system 64 enables communication among these components and includes, in addition to a data bus, a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 64 in fig. 11.
The processor 61, when executing the computer program, performs at least the steps of the method of any one of fig. 1 to 8.
It will be appreciated that the memory 62 may be a volatile memory, a nonvolatile memory, or both. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 62 described in the embodiments herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present application may be applied to the processor 61, or implemented by the processor 61. The processor 61 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 61. The processor 61 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 61 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 62, and the processor 61 reads the information in the memory 62 and performs the steps of the aforementioned method in conjunction with its hardware.
In an exemplary embodiment, the terminal device and/or the server may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic components, for performing the aforementioned method of obtaining training data.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal device and method may be implemented in other manners. The terminal device embodiments described above are only illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in actual implementation, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between terminal devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be performed by hardware related to program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; the aforementioned storage medium includes: a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, or various other media capable of storing program code.
Alternatively, if the integrated units described above in the present application are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may essentially be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic disk, an optical disc, or various other media capable of storing program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in several of the method or terminal device embodiments provided by the present application may be combined arbitrarily without conflict to obtain a new method embodiment or terminal device embodiment.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method for obtaining training data, applied to a server, is characterized by comprising the following steps:
obtaining the characteristics of first information and the characteristics of second information, wherein the first information represents private information in target training data, and the second information represents other information except the private information in the target training data;
constructing target training data according to the characteristics of the first information and the characteristics of the second information;
and determining the constructed data as expected training data.
2. The method of claim 1, wherein the target training data is an image comprising a first sub-image and a second sub-image, wherein the first sub-image is characterized as private information and the second sub-image is information other than the information characterized as private information in the target training data;
correspondingly, the characteristics of first information and the characteristics of second information in the target training data are obtained, wherein the first information represents private information, and the second information represents other information except the private information in the target training data; constructing target training data according to the characteristics of the first information and the characteristics of the second information; determining the constructed data as expected training data, comprising:
receiving features of the first sub-image and features of the second sub-image; or extracting the features of the first sub-image and the features of the second sub-image;
constructing the image according to the characteristics of the first sub-image and the characteristics of the second sub-image;
and determining the constructed image as a desired image.
3. The method of claim 2, wherein constructing the image based on the features of the first sub-image and the features of the second sub-image comprises:
inputting the characteristics of the first sub-image and the characteristics of the second sub-image into a trained generation model, and constructing an image by the generation model;
the generation model is trained under the condition that the loss function is lower than a threshold value, the generation model comprises at least two deconvolution layers, and the constructed image is obtained by carrying out deconvolution operation on the features of the first sub-image and the features of the second sub-image through the at least two deconvolution layers.
4. A method according to claim 2 or 3, wherein the features of the first and second sub-images are both shallow features of the image.
5. A method for obtaining training data, which is applied to a terminal device, is characterized by comprising the following steps:
obtaining target training data, wherein the target training data comprises first information and second information, the first information represents private information, and the second information represents other information except the private information in the target training data;
and extracting and sending the characteristics of the first information and the characteristics of the second information.
6. The method of claim 5, wherein the target training data is an image; correspondingly, the extracting the features of the first information and the features of the second information includes:
inputting the image to a feature extraction model; or inputting a first sub-image and a second sub-image in the image into a feature extraction model; the first sub-image is characterized as private information, and the second sub-image is other information except the information characterized as private information in the target training data; the feature extraction model comprises at least two convolution layers; and extracting shallow layer features of the image by using at least one convolution layer positioned at the front of the at least two convolution layers to obtain the features of the first sub-image and the features of the second sub-image.
7. A server, comprising: an obtaining unit, a constructing unit, and a determining unit; wherein:
the obtaining unit is configured to obtain features of first information and features of second information, wherein the first information represents private information in target training data, and the second information represents other information in the target training data except the private information;
the constructing unit is configured to construct the target training data according to the features of the first information and the features of the second information;
and the determining unit is configured to determine the constructed data as expected training data.
8. A terminal device, comprising: an obtaining unit, an extracting unit, and a sending unit; wherein:
the obtaining unit is configured to obtain target training data, wherein the target training data comprises first information and second information, the first information represents private information, and the second information represents other information in the target training data except the private information;
the extracting unit is configured to extract features of the first information and features of the second information;
and the sending unit is configured to send the features of the first information and the features of the second information.
9. A data processing system comprising a terminal device and a server, wherein:
the terminal device is configured to obtain target training data and preprocess the target training data, wherein the preprocessing comprises at least: determining first information and second information in the target training data; extracting the features of the first information and the features of the second information; and sending the features of the first information and the features of the second information to the server; the first information represents private information in the target training data, and the second information represents other information in the target training data except the private information;
and the server is used for constructing the target training data according to the acquired characteristics of the first information and the acquired characteristics of the second information.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4 and/or the steps of the method of claim 5 or 6.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 4 and/or the steps of the method of claim 5 or 6 when executing the program.
CN202011352731.4A 2020-11-26 2020-11-26 Method for obtaining training data, related device, system and storage medium Active CN114549951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011352731.4A CN114549951B (en) 2020-11-26 2020-11-26 Method for obtaining training data, related device, system and storage medium

Publications (2)

Publication Number Publication Date
CN114549951A true CN114549951A (en) 2022-05-27
CN114549951B CN114549951B (en) 2024-04-23

Family

ID=81668176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011352731.4A Active CN114549951B (en) 2020-11-26 2020-11-26 Method for obtaining training data, related device, system and storage medium

Country Status (1)

Country Link
CN (1) CN114549951B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073820A (en) * 2017-11-27 2018-05-25 北京传嘉科技有限公司 Security processing, device and the mobile terminal of data
CN109284684A (en) * 2018-08-21 2019-01-29 Oppo广东移动通信有限公司 A kind of information processing method, device and computer storage medium
US20200143079A1 (en) * 2018-11-07 2020-05-07 Nec Laboratories America, Inc. Privacy-preserving visual recognition via adversarial learning
CN111310775A (en) * 2018-12-11 2020-06-19 Tcl集团股份有限公司 Data training method and device, terminal equipment and computer readable storage medium
CN109919242A (en) * 2019-03-18 2019-06-21 长沙理工大学 A kind of images steganalysis method based on depth characteristic and joint sparse
CN111797851A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Feature extraction method and device, storage medium and electronic equipment
CN110874440A (en) * 2020-01-16 2020-03-10 支付宝(杭州)信息技术有限公司 Information pushing method and device, model training method and device, and electronic equipment
CN111736712A (en) * 2020-06-24 2020-10-02 北京百度网讯科技有限公司 Input information prediction method, system, server and electronic equipment
CN111680672A (en) * 2020-08-14 2020-09-18 腾讯科技(深圳)有限公司 Face living body detection method, system, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BRILAND HITAJ ET AL.: "Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning", 《ARXIV》 *

Also Published As

Publication number Publication date
CN114549951B (en) 2024-04-23

Similar Documents

Publication Publication Date Title
JP6656376B2 (en) Face authentication method, apparatus and computer storage medium
CN108512670B (en) Group creation method and terminal thereof
CN110096996B (en) Biological information identification method, device, terminal, system and storage medium
CN109117778B (en) Information processing method, information processing apparatus, server, and storage medium
JP7266828B2 (en) Image processing method, apparatus, device and computer program
Lin et al. Real photographs denoising with noise domain adaptation and attentive generative adversarial network
CN109829370A (en) Face identification method and Related product
CN105956022B (en) Electronic mirror image processing method and device, and image processing method and device
CN111754396A (en) Face image processing method and device, computer equipment and storage medium
CN110166759B (en) Image processing method and device, storage medium and electronic device
CN112492383A (en) Video frame generation method and device, storage medium and electronic equipment
CN113297624B (en) Image preprocessing method and device
CN107656959A (en) A kind of message leaving method, device and message equipment
CN114549951A (en) Method for obtaining training data, related device, system and storage medium
CN112600886A (en) Privacy protection method, device and equipment with combination of end cloud and device
CN112528978A (en) Face key point detection method and device, electronic equipment and storage medium
CN106529307B (en) Photograph encryption method and device
CN111932365A (en) Financial credit investigation system and method based on block chain
US8994834B2 (en) Capturing photos
CN115688075A (en) Identity verification method and device based on face recognition
CN114422862A (en) Service video generation method, device, equipment, storage medium and program product
CN111738087B (en) Method and device for generating face model of game character
CN112702623A (en) Video processing method, device, equipment and storage medium
CN112950641A (en) Image processing method and device, computer readable storage medium and electronic device
CN112597910A (en) Method and device for monitoring human activities by using sweeping robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant