CN112699906A - Method, device and storage medium for acquiring training data - Google Patents

Method, device and storage medium for acquiring training data Download PDF

Info

Publication number
CN112699906A
Authority
CN
China
Prior art keywords
picture
data
sample
detection
field
Prior art date
Legal status
Granted
Application number
CN201911007708.9A
Other languages
Chinese (zh)
Other versions
CN112699906B (en)
Inventor
唐苗
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201911007708.9A priority Critical patent/CN112699906B/en
Publication of CN112699906A publication Critical patent/CN112699906A/en
Application granted granted Critical
Publication of CN112699906B publication Critical patent/CN112699906B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/97Determining parameters from multiple pictures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a device for acquiring training data, and belongs to the technical field of deep learning. In the application, mixed data of each sample picture in a plurality of sample pictures is received, and the data format of the mixed data comprises a first field and a second field. The mixed data of each sample picture is split according to the data format of the mixed data to obtain the first field and the second field corresponding to each sample picture. Detection picture data of the corresponding sample picture is generated according to the first picture parameter in the first field corresponding to each sample picture, and classification picture data of the corresponding sample picture is generated according to the second picture parameter in the second field corresponding to each sample picture. Therefore, the detection picture data and the classification picture data can both be generated according to the data format of the mixed data, and the classification picture data does not need to be acquired from the server separately, which reduces the transmission of redundant information, lowers the invalid transmission rate of the data, and saves bandwidth.

Description

Method, device and storage medium for acquiring training data
Technical Field
The present application relates to the field of deep learning, and in particular, to a method, an apparatus, and a storage medium for acquiring training data.
Background
Currently, deep learning techniques are widely used in various industries. For example, picture recognition may be performed by a neural network model. Generally, before picture recognition by a neural network model, a large number of sample pictures and labels are required as training data to train the neural network model.
The training of the neural network model at present mainly comprises detection training and classification training. For convenience of description, data used for performing detection training is referred to as detection picture data, and data used for performing classification training is referred to as classification picture data. In the related art, the first device may generate the detection picture data according to the plurality of sample pictures, and upload the detection picture data to the server. Meanwhile, the first device may crop a target area included in each of the plurality of sample pictures to obtain a target area picture, and then generate classified picture data according to the target area picture and upload the classified picture data to the server. Subsequently, when the second device performs detection training on the neural network model, the detection picture data may be obtained from the server, and when the second device performs classification training on the neural network model, the classification picture data may be obtained from the server.
It can be seen that, in the related art, the picture data for performing the detection training and the picture data for performing the classification training are separately represented and transmitted, and the picture data for performing the detection training already contains part of the information of the training data for performing the classification training, which causes information redundancy, increases the invalid transmission rate of the data, and wastes bandwidth.
Disclosure of Invention
The embodiment of the application provides a method, a device and a storage medium for acquiring training data, which can be used for solving the problems of information redundancy, large invalid transmission rate of data and bandwidth waste when the training data is acquired in the related technology. The technical scheme is as follows:
in one aspect, a method for acquiring training data is provided, the method comprising:
receiving mixed data of each sample picture in a plurality of sample pictures, wherein the data format of the mixed data comprises a first field and a second field, the first field comprises a first picture parameter for generating detection picture data, the second field comprises a second picture parameter for generating classified picture data, the detection picture data refers to picture data for detection training, and the classified picture data refers to picture data for classification training;
splitting the mixed data of each sample picture according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture;
and generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.
Optionally, the first picture parameter includes network storage information of a corresponding sample picture, location information of a detection target included in the corresponding sample picture, and tag category information, where the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.
Optionally, the generating the detection picture data according to the first picture parameter in the first field corresponding to each sample picture includes:
downloading corresponding sample pictures according to the network storage information of each sample picture;
storing the downloaded sample pictures, and acquiring a local storage address of each sample picture in the sample pictures;
and generating detection picture data of the corresponding sample picture according to the position information of the detection target in each sample picture, the label category information and the local storage address of the corresponding sample picture.
Optionally, the generating the classified picture data according to a second picture parameter in a second field corresponding to each sample picture includes:
cutting the corresponding sample picture according to the position information of the detection target in each sample picture to obtain a target area picture containing the detection target;
storing a plurality of target area pictures obtained by cutting, and acquiring a local storage address of each target area picture;
and generating classified picture data of the corresponding sample picture according to the attribute information of the detection target in each sample picture and the local storage address of the target area picture containing the corresponding detection target.
Optionally, the attribute information of the detection target includes the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of an attribute value corresponding to each attribute.
Optionally, the method further comprises:
receiving a plurality of pieces of mixed label data corresponding to the plurality of sample pictures, wherein the data format of each piece of mixed label data comprises a third field and a fourth field, the third field comprises detection label data used for detection training, and the fourth field comprises classification label data used for classification training;
and splitting each mixed label data according to the data format of each mixed label data to obtain the detection label data and the classification label data in each mixed label data.
Optionally, the detection tag data includes a tag category and a category number corresponding to the tag category, and the classification tag data includes attribute information corresponding to the tag category.
In another aspect, an apparatus for acquiring training data is provided, the apparatus comprising:
the system comprises a first receiving module, a second receiving module and a processing module, wherein the first receiving module is used for receiving mixed data of each sample picture in a plurality of sample pictures, the data format of the mixed data comprises a first field and a second field, the first field comprises a first picture parameter used for generating detection picture data, the second field comprises a second picture parameter used for generating classified picture data, the detection picture data refers to picture data used for detection training, and the classified picture data refers to picture data used for classification training;
the first splitting module is used for splitting the mixed data of each sample picture according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture;
and the generating module is used for generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.
Optionally, the first picture parameter includes network storage information of a corresponding sample picture, location information of a detection target included in the corresponding sample picture, and tag category information, where the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.
Optionally, the generating module includes:
the downloading submodule is used for downloading the corresponding sample picture according to the network storage information of each sample picture;
the storage sub-module is used for storing the downloaded sample pictures and acquiring a local storage address of each sample picture in the sample pictures;
and the first generation submodule is used for generating detection image data of the corresponding sample image according to the position information of the detection target in each sample image, the label category information and the local storage address of the corresponding sample image.
Optionally, the generating module includes:
the cutting module is used for cutting the corresponding sample picture according to the position information of the detection target in each sample picture to obtain a target area picture containing the detection target;
the storage unit is used for storing the cut multiple target area pictures and acquiring a local storage address of each target area picture;
and the second generation unit is used for generating classified picture data of the corresponding sample picture according to the attribute information of the detection target in each sample picture and the local storage address of the target area picture containing the corresponding detection target.
Optionally, the attribute information of the detection target includes the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of an attribute value corresponding to each attribute.
Optionally, the apparatus further comprises:
a second receiving module, configured to receive multiple pieces of mixed label data corresponding to the multiple sample pictures, where a data format of each piece of mixed label data includes a third field and a fourth field, the third field includes detection label data used for performing detection training, and the fourth field includes classification label data used for performing classification training;
and the second splitting module is used for splitting each mixed label data according to the data format of each mixed label data to obtain the detection label data and the classification label data in each mixed label data.
Optionally, the detection tag data includes a tag category and a category number corresponding to the tag category, and the classification tag data includes attribute information corresponding to the tag category.
In another aspect, an apparatus for acquiring training data is provided, the apparatus comprising a processor, a communication interface, a memory, and a communication bus;
the processor, the communication interface and the memory complete mutual communication through the communication bus;
the memory is used for storing computer programs;
the processor is used for executing the program stored on the memory so as to implement the method for acquiring training data provided above.
In another aspect, a computer-readable storage medium is provided, in which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method of acquiring training data as provided above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
in the embodiment of the application, the mixed data of each sample picture in the multiple sample pictures is received, and the data format of the mixed data includes the first field and the second field, so that the mixed data of each sample picture can be split according to the data format of the mixed data, and the first field and the second field corresponding to each sample picture are obtained. And then generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture. Therefore, in the embodiment of the application, the terminal can generate the detection picture data for performing detection training and the classification picture data for performing classification training according to the mixed data, and the classification picture data does not need to be acquired from the server independently, so that the transmission of redundant information is reduced, the invalid transmission rate of the data is reduced, and the bandwidth is saved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a system architecture diagram according to a method for acquiring training data provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for obtaining training data according to an embodiment of the present disclosure;
fig. 3 is a representation manner of mixed data of a sample picture provided by an embodiment of the present application;
FIG. 4 is a flowchart of generating inspection picture data and classification picture data according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating detected picture data of a sample picture provided by an embodiment of the present application;
FIG. 6 is classified picture data of a detection target in a sample picture provided by an embodiment of the present application;
FIG. 7 is classified picture data of another detection target in a sample picture provided by an embodiment of the present application;
FIG. 8 is a representation of hybrid tag data provided by embodiments of the present application;
FIG. 9 is a representation of the detection tag data provided by embodiments of the present application;
FIG. 10 is a schematic structural diagram of an apparatus for acquiring training data according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a terminal for acquiring training data according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before explaining the embodiments of the present application in detail, an application scenario related to the embodiments of the present application will be described.
Currently, in the fields of intelligent transportation, security and the like, object detection is often required for a captured high-resolution image or video picture through a deep learning algorithm model such as a neural network model. For example, in the field of intelligent transportation, monitoring devices are usually arranged in many scenes such as a gate, a parking lot, and a road for image acquisition. After the image is acquired, the target such as a vehicle or a pedestrian included in the image can be detected and identified through the deep learning algorithm model, so that further processing can be performed according to the detection result. For another example, in the security field, a monitoring device may be provided in a residential area for image acquisition. After the image is acquired, a target such as a person or an object in the image can be detected and identified through a deep learning algorithm model so as to track. Before target detection and identification are carried out through the deep learning algorithm model, the deep learning algorithm model needs to be trained through picture data and label data of a sample picture. The method for acquiring the training data provided by the embodiment of the application can be applied to the process of training the deep learning algorithm model to acquire the training data required by training.
Next, a system architecture related to the method for acquiring training data provided by the embodiment of the present application is described.
Fig. 1 is a system architecture diagram according to a method for acquiring training data provided in an embodiment of the present application. As shown in fig. 1, the system may include a first device 101, a server 102, and a second device 103. Both the first device 101 and the second device 103 can communicate with the server 102.
It should be noted that the first device 101 may upload the hybrid data and the hybrid tag data in the embodiment of the present application to the server 102. The server 102, after receiving the mixed data and the mixed tag data, may store the mixed data and the mixed tag data, and transmit a storage address of the mixed data and a storage address of the mixed tag data to the second device 103.
The second device 103 runs a deep learning algorithm model. The second device 103 may obtain the mixed data from the server 102 according to the storage address of the mixed data and, by the method provided in the embodiment of the present application, generate the detection picture data for performing detection training and the classification picture data for performing classification training from the mixed data. The second device 103 may also obtain the mixed label data from the server 102 according to the storage address of the mixed label data and generate the detection label data and the classification label data from the mixed label data.
It should be noted that, in the embodiment of the present application, the first device 101 and the second device 103 may be both terminal devices such as a desktop computer, a notebook computer, and the like, or the second device 103 may also be an algorithm training server. The server 102 may be a server for storing training data, a server cluster composed of a plurality of servers for storing training data, or a cloud computing service center.
Next, a method for acquiring training data provided in the embodiment of the present application is described.
Fig. 2 is a flowchart of a method for acquiring training data according to an embodiment of the present disclosure. In the embodiment of the present application, an implementation process of the method is described with an execution subject of the method as a terminal, but this does not constitute a limitation to the embodiment of the present application, and as shown in fig. 2, the method includes the following steps:
step 201: mixed data of each sample picture in the plurality of sample pictures is received.
The data format of the mixed data comprises a first field and a second field, the first field comprises a first picture parameter used for generating detection picture data, the second field comprises a second picture parameter used for generating classified picture data, the detection picture data refers to picture data used for detection training, and the classified picture data refers to picture data used for classification training.
In the embodiment of the application, when the terminal needs to train deep learning algorithm models such as a neural network model, a data acquisition request can be sent to the server. When receiving the data acquisition request, the server may acquire the mixed data of each of the plurality of sample pictures according to the data acquisition request, and return all the acquired mixed data to the terminal, and the terminal may receive all the mixed data.
It should be noted that the server may store mixed data of each of the plurality of sample pictures, and the mixed data may be uploaded to the server in advance by another device. The devices may have address information for the server stored therein. Based on this, the devices can establish a connection with the server according to the address information of the server and transmit the mixed data to the server.
The data format of the mixed data may include a first field and a second field, the first field may include a first picture parameter for generating detection picture data, the second field may include a second picture parameter for generating classified picture data, the detection picture data refers to picture data used for performing detection training, and the classified picture data refers to picture data used for performing classification training.
It should be noted that the data format of the mixed data of each sample picture may include a first field and a second field, that is, the data format of the mixed data of each sample picture is the same. In addition, the second field may be located after the first field.
Step 202: and splitting the mixed data of each sample picture according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture.
In this embodiment of the application, after the terminal receives the mixed data of each sample picture in the multiple sample pictures, because the data formats of the mixed data are the same, the terminal may split the mixed data of each sample picture according to the data format of the mixed data, so as to obtain the first field and the second field corresponding to each sample picture.
For example, a first field in the mixed data of each sample picture may have a first identifier indicating that the corresponding field is the first field in the mixed data, and a second field may have a second identifier indicating that the corresponding field is the second field in the mixed data.
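Exemplarily, this splitting step can be sketched as follows, assuming that the mixed data of each sample picture is received as a JSON array of fields whose identifiers mark the field type. The serialization, the identifier values and the key names are assumptions for illustration only and are not prescribed by the embodiment of the application.

```python
import json

# Hypothetical field identifiers; the embodiment only states that each field
# carries an identifier, not what the identifiers look like.
FIRST_FIELD_ID = 1    # marks a first field (first picture parameter)
SECOND_FIELD_ID = 2   # marks a second field (second picture parameter)

def split_mixed_data(raw_record: str):
    """Split one mixed-data record into its first fields and second fields.

    Both results are lists, because a sample picture containing several
    detection targets carries several first/second field pairs.
    """
    first_fields, second_fields = [], []
    for field in json.loads(raw_record):
        if field["id"] == FIRST_FIELD_ID:
            first_fields.append(field)
        elif field["id"] == SECOND_FIELD_ID:
            second_fields.append(field)
    return first_fields, second_fields
```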
Step 203: and generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.
As can be seen from the foregoing, the first field includes a first picture parameter for generating detected picture data, the second field includes a second picture parameter for generating classified picture data, and the terminal can generate detected picture data of a corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generate classified picture data of a corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.
Optionally, the first picture parameter may include network storage information of the corresponding sample picture, location information of the detection target included in the corresponding sample picture, and tag category information, where the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.
It should be noted that the network storage information of the sample picture may be network storage address information of the sample picture, for example, the network storage information may be a URL (Uniform Resource Locator) corresponding to the sample picture. The position information of the detection target may refer to position information of a target area including the detection target in the corresponding sample picture. In this case, the position information of the detection target may include coordinates of four corner points of the target region in the sample picture. Alternatively, the position information of the detection target may include the center point coordinates of the target region and the length and width of the target region. Alternatively, when the target region including the detection target is an irregular polygon, the position information of the detection target may include coordinates of a plurality of corner points of the target region in the sample picture. The tag class information of the detection target may be a tag class to which the detection target belongs or a class number of the tag class to which the detection target belongs. For example, if the detection target is "car", the tag type information of the detection target may be the tag type of "car" or may be a type number corresponding to "car". The attribute information of the detection target may include the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of a corresponding attribute value. For example, if the detection target is a "vehicle," the number of attributes included in the attribute information of the detection target is 2, that is, the detection target corresponds to two types of attributes. The attribute number of the first attribute of the two attributes is 1, and the corresponding value number is 0. The attribute number of the second attribute is 2, and the value number of the corresponding attribute value is 1.
Optionally, one or more detection targets may be included in each sample picture. When a plurality of detection targets are included, the mixed data of each sample picture may further include the number of the detection targets.
When a plurality of detection targets are included in the sample picture, the mixed data of the sample picture may be extended according to the data format of the mixed data. In this case, the mixed data of the sample picture may include a plurality of first fields and a second field corresponding to each first field, the second field corresponding to each first field being adjacent to that first field. Each first field includes a first picture parameter, the second field corresponding to each first field includes a second picture parameter, and each detection target corresponds to one first picture parameter and one second picture parameter. The first picture parameter included in the first of these first fields may include the network storage information, the number of detection targets, the position information of the first detection target, and the tag category information. The first picture parameters included in the subsequent first fields may not include the network storage information and the number of detection targets, but include the position information and tag category information of the corresponding detection targets. That is, the embodiment of the application provides an organization form of training data that can be freely combined according to the number of detection targets and the number of attributes, thereby improving the flexibility of the data.
Exemplarily, a representation of the mixed data of a sample picture is shown in fig. 3. As shown in fig. 3, the first first field includes the network storage information (the first URL), the number of detection targets, the position information of the first detection target, and the tag category information, and the first second field is located after this first field and includes the attribute information of the first detection target. The number of detection targets is 2, so it can be determined that the sample picture contains two detection targets. The position information of the first detection target consists of the coordinates of the N corner points of the target area containing the first detection target, where N is an integer greater than or equal to 3. The tag category information of the first detection target indicates that its tag category is vehicle. The number of attributes of the first detection target included in the first second field is 2, that is, the detection target corresponds to two attributes. The attribute number of the first attribute of the two attributes is 1, and the value number of the corresponding attribute value is 0. The attribute number of the second attribute is 2, and the value number of the corresponding attribute value is 1. Since there are two detection targets, a second first field follows the first second field and includes the position information and the tag category information of the second detection target. The second first field is followed by a second second field containing the attribute information of the second detection target.
It should be noted that the above is a possible representation manner of the mixed data given in the embodiments of the present application. The front and back orders of the first picture parameter and the second picture parameter corresponding to the same detection target can be exchanged, that is, the first field and the first second field can be exchanged. Optionally, the order between the pieces of information included in the first field may also be adjusted accordingly, which is not limited in this embodiment of the application.
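Exemplarily, the mixed data of fig. 3 could be laid out as the following Python structure, using the same hypothetical field identifiers as the splitting sketch above. The key names, the URL and the coordinate values are placeholders; only the field ordering and contents follow fig. 3.

```python
# Illustrative layout of the mixed data shown in fig. 3; key names,
# identifiers, URL and coordinates are assumptions for illustration.
mixed_data_fig3 = [
    {"id": 1,                                      # first first field
     "url": "http://example.com/sample_0001.jpg",  # network storage information (placeholder)
     "num_targets": 2,
     "corners": [(100, 80), (400, 80), (400, 300), (100, 300)],  # N corner points
     "tag_category": "vehicle"},
    {"id": 2,                                      # first second field
     "num_attributes": 2,
     "attributes": [{"attribute_no": 1, "value_no": 0},
                    {"attribute_no": 2, "value_no": 1}]},
    {"id": 1,                                      # second first field: no URL / target count
     "corners": [(450, 120), (620, 120), (620, 260), (450, 260)],
     "tag_category": "vehicle"},
    {"id": 2,                                      # second second field (placeholder values)
     "num_attributes": 2,
     "attributes": [{"attribute_no": 1, "value_no": 1},
                    {"attribute_no": 2, "value_no": 0}]},
]
```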
After the mixed data is acquired and the first field and the second field corresponding to each sample picture are obtained, referring to fig. 4, the terminal may generate the detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generate the classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture, through the following steps 2031 to 2034.
2031: and downloading the corresponding sample picture according to the network storage information of each sample picture.
As can be seen from the foregoing description, the first picture parameter in the corresponding first field of each sample picture includes the network storage information of the sample picture. The network storage information may be a URL corresponding to the sample picture. Based on this, the terminal can download the sample picture from the corresponding picture server according to the URL corresponding to the sample picture.
2032: and storing the downloaded multiple sample pictures, and acquiring a local storage address of each sample picture in the multiple sample pictures.
After downloading each sample picture, the terminal may store the downloaded sample picture locally and record the address at which each sample picture is stored, which serves as the local storage address of the corresponding sample picture.
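Exemplarily, steps 2031 and 2032 can be sketched as follows, reusing the illustrative record layout above. The download library, the local directory layout and the key names are assumptions for illustration.

```python
import os
import urllib.request

def download_samples(first_fields, save_dir="./samples"):
    """Steps 2031 and 2032: download each sample picture by its network
    storage information (a URL) and record its local storage address.

    `first_fields` follows the illustrative layout above; only the first
    first field of a sample picture carries the URL.
    """
    os.makedirs(save_dir, exist_ok=True)
    local_paths = {}
    for field in first_fields:
        url = field.get("url")
        if not url:
            continue
        local_path = os.path.join(save_dir, os.path.basename(url))
        urllib.request.urlretrieve(url, local_path)  # download from the picture server
        local_paths[url] = local_path                # URL -> local storage address
    return local_paths
```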
2033: And generating detection picture data of the corresponding sample picture according to the position information of the detection target and the label category information included in the mixed data of each sample picture, and the local storage address of the corresponding sample picture.
After each sample picture is downloaded to the local storage, for any sample picture, the terminal may acquire the position information and the tag category information of the detection target from the mixed data of the sample picture, and then combine the acquired position information and tag category information of the detection target with the local storage address of the sample picture to obtain the detection picture data of the sample picture. For all the sample pictures, the detection picture data of each sample picture can be generated with reference to the above method, so as to obtain the detection picture data used for detection training.
As can be seen from the above description, the position information and the tag category information of the detection target in the mixed data of the sample picture are both located in the first field. Based on this, in the embodiment of the present application, if there is only one first field for the sample picture, the network storage information included in the first picture parameter in that first field may be replaced with the local storage address of the sample picture, and the data obtained after replacing the network storage information is used as the detection picture data of the sample picture. If the sample picture corresponds to a plurality of first fields, the terminal may splice the first picture parameters in the plurality of first fields in order and replace the network storage information contained in the first first picture parameter with the local storage address of the sample picture, so as to obtain the detection picture data of the sample picture.
Still taking the mixed data of the sample picture shown in fig. 3 as an example, fig. 5 is a schematic diagram of the detection picture data of the sample picture generated by processing the mixed data shown in fig. 3 through this step. Referring to fig. 5, the first first field and the second first field in fig. 3 are merged, and the network storage information in the first picture parameter, that is, the first URL, is replaced with the local storage address A of the sample picture, thereby obtaining the detection picture data of the sample picture.
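Exemplarily, the generation of the detection picture data described above can be sketched as follows. This is a minimal sketch under the same illustrative layout; it assumes the local storage addresses returned by the download sketch above, and the dict layout is not an on-disk format prescribed by the embodiment.

```python
def build_detection_data(first_fields, local_paths):
    """Step 2033: splice the first picture parameters in order and replace
    the network storage information with the local storage address of the
    sample picture, as illustrated in fig. 5.
    """
    url = first_fields[0]["url"]                 # only the first first field carries the URL
    return {
        "local_address": local_paths[url],       # replaces the first URL (address A in fig. 5)
        "num_targets": first_fields[0]["num_targets"],
        "targets": [{"corners": f["corners"], "tag_category": f["tag_category"]}
                    for f in first_fields],      # all first fields, spliced in order
    }
```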
2034: and generating classified picture data according to the second picture parameters in the second field corresponding to each sample picture.
After the second field corresponding to each sample picture is obtained, the terminal may crop the corresponding sample picture according to the position information of the detection target included in the first picture parameter corresponding to each sample picture, so as to obtain a target area picture containing the detection target; store the plurality of target area pictures obtained by cropping and acquire the local storage address of each target area picture; and generate the classified picture data of the corresponding sample picture according to the attribute information of the detection target included in the second picture parameter of each sample picture and the local storage address of the target area picture containing the corresponding detection target.
For example, for any sample picture, one or more detection targets may be included in the sample picture. Correspondingly, the sample picture corresponds to one or more second fields, and the second picture parameter in each second field contains the attribute information of one detection target. For any detection target, the terminal may determine the position information of the target area where the detection target is located in the sample picture according to the position information of the detection target. The target area is generally a regular shape such as a rectangle or a circle. Then, the terminal may crop the target area from the sample picture according to the position information of the target area, thereby obtaining a target area picture, store the target area picture locally, and record the address at which it is stored, that is, the local storage address of the target area picture. After that, the terminal may splice the second picture parameter of the detection target with the local storage address of the cropped target area picture, thereby obtaining the classified picture data corresponding to the detection target. For each detection target in the sample picture, the terminal may obtain the classified picture data of the corresponding detection target with reference to the above method, and the classified picture data of the detection targets are then spliced to obtain the classified picture data corresponding to the sample picture.
For each sample picture in the multiple sample pictures, the terminal can obtain the classified picture data of the corresponding sample picture by referring to the above method, and the classified picture data of all the sample pictures together constitute the classified picture data used for classification training.
Still taking the mixed data of the sample picture shown in fig. 3 as an example, fig. 6 and fig. 7 respectively show the classified picture data of the two detection targets in the sample picture generated by processing the mixed data shown in fig. 3 in this step. Exemplarily, the sample picture shown in fig. 3 corresponds to two second fields, each of which contains one second picture parameter. The sample picture is cropped according to the position information of the detection target in the first first picture parameter to obtain a first target area picture. Assuming that the local storage address of the first target area picture is address B, the storage address B is spliced with the first second picture parameter, so as to obtain the classified picture data of the first detection target as shown in fig. 6. The sample picture is cropped according to the position information of the detection target in the second first picture parameter to obtain a second target area picture. Assuming that the local storage address of the second target area picture is address C, the storage address C is spliced with the second second picture parameter, so as to obtain the classified picture data of the second detection target as shown in fig. 7.
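Exemplarily, step 2034 can be sketched as follows. Pillow is assumed to be available for cropping, and axis-aligned rectangular target areas are assumed here, although the embodiment also allows circular and irregular polygonal target areas; the dict layout follows the illustrative structures above.

```python
import os
from PIL import Image  # Pillow is an assumption, used here only for cropping

def build_classification_data(local_address, first_fields, second_fields,
                              crop_dir="./crops"):
    """Step 2034: crop each target area from the sample picture, store the
    target area picture, and splice its local storage address with the
    corresponding second picture parameter (addresses B and C in figs. 6/7).
    """
    os.makedirs(crop_dir, exist_ok=True)
    image = Image.open(local_address)
    records = []
    for idx, (first, second) in enumerate(zip(first_fields, second_fields)):
        xs = [p[0] for p in first["corners"]]
        ys = [p[1] for p in first["corners"]]
        crop = image.crop((min(xs), min(ys), max(xs), max(ys)))  # target area picture
        crop_path = os.path.join(crop_dir, f"target_{idx}.jpg")
        crop.save(crop_path)                                     # store the crop locally
        records.append({
            "local_address": crop_path,           # local storage address of the target area picture
            "attributes": second["attributes"],   # second picture parameter
        })
    return records
```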
In the embodiment of the application, the mixed data of each sample picture in the multiple sample pictures is received, and the data format of the mixed data includes the first field and the second field, so that the mixed data of each sample picture can be split according to the data format of the mixed data, and the first field and the second field corresponding to each sample picture are obtained. And then generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture. Therefore, in the embodiment of the application, the terminal can generate the detection picture data for performing detection training and the classification picture data for performing classification training according to the mixed data, and the classification picture data does not need to be acquired from the server independently, so that the transmission of redundant information is reduced, the invalid transmission rate of the data is reduced, and the bandwidth is saved.
It should be noted that, when deep learning algorithm models such as a neural network model are trained, not only the mixed data of the sample pictures but also the mixed label data need to be acquired. Based on this, the terminal can also obtain a plurality of pieces of mixed label data corresponding to the plurality of sample pictures. The data format of each mixed label data comprises a third field and a fourth field, the third field comprises detection label data used for detection training, and the fourth field comprises classification label data used for classification training. In this case, the terminal may further split each of the plurality of pieces of mixed tag data to obtain the detection tag data and the classification tag data in each piece of mixed tag data.
In this embodiment of the application, the terminal may obtain the detection tag data in each piece of mixed tag data according to the third field of each piece of mixed tag data, and obtain the classification tag data in each piece of mixed tag data according to the fourth field of each piece of mixed tag data. The detection label data comprises label categories and category numbers corresponding to the label categories, and the classification label data comprises attribute information corresponding to the label categories.
It should be noted that the attribute information corresponding to the tag category may include the number of attributes, the name and number of each attribute, the number of attribute values corresponding to each attribute, the attribute value corresponding to each attribute, and the value number corresponding to the attribute value.
Illustratively, fig. 8 shows one representation of the mixed label data. As shown in fig. 8, the tag category is car, the category number corresponding to the tag category is 4, and the number of attributes in the attribute information corresponding to the tag category is 2, that is, the tag category corresponds to two attributes. The attribute number of the first attribute is 0, and the attribute name is color. The number of attribute values corresponding to this attribute is 2, that is, the color attribute has two attribute values: the first attribute value has a value number of 0 and is red, and the second attribute value has a value number of 1 and is white. The attribute number of the second attribute is 1, and the attribute name is vehicle body. This attribute also contains two attribute values: the value number of the first attribute value is 0 and the corresponding attribute value is two-box (hatchback), and the value number of the second attribute value is 1 and the corresponding attribute value is three-box (sedan).
It is noted that, in the embodiment of the present application, one piece of hybrid tag data includes all information of one type of tag. In addition, the information belonging to the same attribute is continuous in the attribute information corresponding to one tag type. As shown in fig. 8, all information about the first attribute is adjacent to each other, and all information about the second attribute may be located after or before all information about the first attribute.
It should be noted that the label data may be used to interpret the tag category information and the attribute information in the aforementioned mixed data of each sample picture. For example, in the mixed data shown in fig. 3, the tag category is car and corresponds to two attributes. The attribute whose attribute number is 1 is the color attribute, and the value number of its attribute value is 0; as can be seen from fig. 8, this indicates that the color attribute is red. Similarly, the attribute whose attribute number is 2 is the vehicle body attribute, and the value number of its attribute value is 1; as can be seen from fig. 8, this indicates that the vehicle body attribute is three-box (sedan).
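Exemplarily, such a lookup can be sketched as follows. The dictionary layout is an assumption, and the attribute and value numbering follows fig. 8 (0 and 1) rather than the numbering used in fig. 3.

```python
# Minimal sketch of interpreting the numeric attribute information in the
# mixed data with the label data of fig. 8; the layout is an assumption.
label_data = {
    "car": {
        "category_no": 4,
        "attributes": {
            0: {"name": "color",        "values": {0: "red", 1: "white"}},
            1: {"name": "vehicle body", "values": {0: "two-box", 1: "three-box"}},
        },
    },
}

def interpret(tag_category, attribute_no, value_no):
    """Translate an (attribute number, value number) pair into a readable
    attribute name and attribute value."""
    attr = label_data[tag_category]["attributes"][attribute_no]
    return attr["name"], attr["values"][value_no]

# For example: interpret("car", 0, 0) -> ("color", "red")
```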
In this embodiment of the application, after receiving the plurality of pieces of mixed label data, the terminal may split each piece of mixed label data into the third field and the fourth field according to the data format of the mixed label data. Then, the terminal may use the third field in each piece of mixed label data obtained by splitting as the detection label data.
For example, taking the mixed tag data shown in fig. 8 as an example, the mixed tag data is split to obtain a third field and a fourth field of the mixed tag data, as shown in fig. 9. The third field may be used as the detection tag data corresponding to the hybrid tag data. And splitting each mixed label data according to the splitting method, so as to obtain the detection label data corresponding to each mixed label data.
After each piece of mixed label data is split into the third field and the fourth field, the fourth field obtained from each piece of mixed label data may be used as the classification label data.
For example, for any piece of mixed tag data, the terminal may use information related to each attribute included in the attribute information of the piece of mixed tag data as one record, and store a plurality of records in association with tag categories corresponding to the piece of tag data, thereby obtaining classification tag data corresponding to the piece of mixed tag data. All mixed label data can be split and stored according to the method, so that classified label data corresponding to a plurality of mixed label data are obtained.
For example, taking the fourth field obtained by splitting in fig. 9, all the information of the first attribute and all the information of the second attribute in the fourth field are split into two records, and the two records are stored in correspondence with the label category of the mixed label data, as shown in Table 1, so as to obtain the classification label data corresponding to the mixed label data.
TABLE 1 Classification tag data
(Table 1 is provided as an image in the original publication; it stores the two attribute records split from the fourth field of fig. 9 in correspondence with the label category "car".)
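Exemplarily, the splitting of one piece of mixed label data can be sketched as follows, under an assumed dictionary layout (tag category, category number, and a list of attributes, each with its number, name, and value-number-to-value map).

```python
def split_mixed_label(mixed_label):
    """Split one piece of mixed label data into detection label data (the
    third field) and classification label data (the fourth field), storing
    one record per attribute against the label category, as in fig. 9 and
    Table 1.  The dict layout is an assumption for illustration.
    """
    detection_label = {                        # third field
        "tag_category": mixed_label["tag_category"],
        "category_no": mixed_label["category_no"],
    }
    classification_label = []                  # fourth field, one record per attribute
    for attr in mixed_label["attributes"]:
        classification_label.append({
            "tag_category": mixed_label["tag_category"],
            "attribute_no": attr["attribute_no"],
            "attribute_name": attr["name"],
            "values": attr["values"],          # value number -> attribute value
        })
    return detection_label, classification_label
```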
In the embodiment of the application, the terminal can transmit the detection label data for detection training and the classification label data for classification training together, so that the transmission of redundant information can be effectively reduced, the transmission rate of invalid data is reduced, meanwhile, the signaling interaction between the terminal and the server is reduced, and the system resources are saved.
Referring to fig. 10, an embodiment of the present application provides an apparatus 1000 for acquiring training data, where the apparatus 1000 includes:
a first receiving module 1001, configured to receive mixed data of each sample picture in multiple sample pictures, where a data format of the mixed data includes a first field and a second field, the first field includes a first picture parameter used for generating detected picture data, the second field includes a second picture parameter used for generating classified picture data, the detected picture data is picture data used for detection training, and the classified picture data is picture data used for classification training;
the first splitting module 1002 is configured to split the mixed data of each sample picture according to a data format of the mixed data, so as to obtain a first field and a second field corresponding to each sample picture;
the generating module 1003 is configured to generate detection picture data of a corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generate classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.
Optionally, the first picture parameter includes network storage information of the corresponding sample picture, location information of the detection target included in the corresponding sample picture, and tag category information, the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.
Optionally, the generating module 1003 includes:
the downloading submodule is used for downloading the corresponding sample picture according to the network storage information of each sample picture;
the storage submodule is used for storing the downloaded multiple sample pictures and acquiring a local storage address of each sample picture in the multiple sample pictures;
and the first generation submodule is used for generating detection image data of the corresponding sample image according to the position information of the detection target in each sample image, the label category information and the local storage address of the corresponding sample image.
Optionally, the generating module 1003 includes:
the cutting sub-module is used for cutting the corresponding sample picture according to the position information of the detection target in each sample picture to obtain a target area picture containing the detection target;
the storage submodule is used for storing the cut multiple target area pictures and acquiring a local storage address of each target area picture;
and the second generation submodule is used for generating classified picture data of the corresponding sample picture according to the attribute information of the detection target in each sample picture and the local storage address of the target area picture containing the corresponding detection target.
Optionally, the attribute information of the detection target includes the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of an attribute value corresponding to each attribute.
Optionally, the apparatus further comprises:
the second receiving module is used for receiving a plurality of pieces of mixed label data corresponding to a plurality of sample pictures, the data format of each piece of mixed label data comprises a third field and a fourth field, the third field comprises detection label data used for detection training, and the fourth field comprises classification label data used for classification training;
and the second splitting module is used for splitting each mixed label data according to the data format of each mixed label data to obtain the detection label data and the classification label data in each mixed label data.
Optionally, the detection tag data includes a tag category and a category number corresponding to the tag category, and the classification tag data includes attribute information corresponding to the tag category.
In summary, in the embodiment of the present application, the mixed data of each sample picture in the multiple sample pictures is received, and the data format of the mixed data includes the first field and the second field, so that the mixed data of each sample picture can be split according to the data format of the mixed data, and the first field and the second field corresponding to each sample picture are obtained. And then generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture. Therefore, in the embodiment of the application, the terminal can generate the detection picture data for performing detection training and the classification picture data for performing classification training according to the mixed data, and the classification picture data does not need to be acquired from the server independently, so that the transmission of redundant information is reduced, the invalid transmission rate of the data is reduced, and the bandwidth is saved.
It should be noted that: the apparatus for acquiring training data provided in the foregoing embodiment is only illustrated by dividing the functional modules when acquiring training data, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for acquiring training data and the method for acquiring training data provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 11 shows a block diagram of a terminal device 1100 for acquiring training data according to an exemplary embodiment of the present application. The terminal device 1100 may be a smartphone, a tablet computer, a laptop computer, or a desktop computer. The terminal device 1100 may also be referred to by other names, such as user equipment, a portable device adapting a neural network model, a laptop device adapting a neural network model, a desktop device adapting a neural network model, and so on.
In general, the terminal device 1100 includes: a processor 1101 and a memory 1102.
Processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1101 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1102 is used to store at least one instruction, and the at least one instruction is executed by the processor 1101 to implement the method for acquiring training data provided by the method embodiments of the present application.
In some embodiments, the terminal device 1100 may further include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102 and peripheral interface 1103 may be connected by a bus or signal lines. Various peripheral devices may be connected to the peripheral interface 1103 by buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, touch screen display 1105, camera 1106, audio circuitry 1107, positioning component 1108, and power supply 1109.
The peripheral interface 1103 may be used to connect at least one peripheral associated with I/O (Input/Output) to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1101, the memory 1102 and the peripheral device interface 1103 can be implemented on separate chips or circuit boards, which is not limited by this embodiment.
The Radio Frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electric signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other devices that adapt the neural network model through at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1104 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1101 as a control signal for processing. At this point, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105 disposed on the front panel of the terminal device 1100; in other embodiments, there may be at least two display screens 1105, respectively disposed on different surfaces of the terminal device 1100 or in a folded design; in still other embodiments, the display screen 1105 may be a flexible display disposed on a curved surface or a folded surface of the terminal device 1100. The display screen 1105 may even be arranged in a non-rectangular irregular pattern, that is, an irregularly shaped screen. The display screen 1105 may be manufactured using materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal device, and the rear camera is disposed on the rear surface of the terminal device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, a VR (Virtual Reality) shooting function, or other fusion shooting functions. In some embodiments, the camera assembly 1106 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuitry 1107 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing or inputting the electric signals to the radio frequency circuit 1104 to achieve voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different positions of the terminal device 1100. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1107 may also include a headphone jack.
The positioning component 1108 is used to locate the current geographic position of the terminal device 1100 for navigation or LBS (Location Based Service). The positioning component 1108 may be a positioning component based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 1109 is configured to provide power to various components within terminal device 1100. The power supply 1109 may be alternating current, direct current, disposable or rechargeable. When the power supply 1109 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal device 1100 also includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyro sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.
The acceleration sensor 1111 can detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal device 1100. For example, the acceleration sensor 1111 may be configured to detect components of the gravitational acceleration in three coordinate axes. The processor 1101 may control the touch display screen 1105 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1112 may detect a body direction and a rotation angle of the terminal device 1100, and the gyro sensor 1112 may cooperate with the acceleration sensor 1111 to acquire a 3D motion of the user on the terminal device 1100. From the data collected by gyroscope sensor 1112, processor 1101 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 1113 may be disposed on the side bezel of terminal device 1100 and/or on the lower layers of touch display screen 1105. When the pressure sensor 1113 is disposed on the side frame of the terminal device 1100, the holding signal of the user to the terminal device 1100 can be detected, and the processor 1101 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed at the lower layer of the touch display screen 1105, the processor 1101 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 1105. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1114 is configured to collect a fingerprint of the user, and the processor 1101 identifies the user according to the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 1114 may be disposed on the front, back, or side of the terminal device 1100. When a physical key or vendor logo is provided on the terminal device 1100, the fingerprint sensor 1114 may be integrated with the physical key or vendor logo.
Optical sensor 1115 is used to collect ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the touch display screen 1105 based on the ambient light intensity collected by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1105 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1105 is turned down. In another embodiment, processor 1101 may also dynamically adjust the shooting parameters of camera assembly 1106 based on the ambient light intensity collected by optical sensor 1115.
The proximity sensor 1116, also called a distance sensor, is usually provided on the front panel of the terminal device 1100. The proximity sensor 1116 is used to capture the distance between the user and the front face of the terminal device 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal device 1100 gradually decreases, the processor 1101 controls the touch display screen 1105 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal device 1100 gradually increases, the processor 1101 controls the touch display screen 1105 to switch from the dark-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 11 does not constitute a limitation of terminal device 1100, and may include more or fewer components than those shown, or may combine certain components, or may employ a different arrangement of components.
In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium, such as a memory including instructions executable by a processor in the terminal device to perform the method for acquiring training data in the above-described embodiments. For example, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of acquiring training data, the method comprising:
receiving mixed data of each sample picture in a plurality of sample pictures, wherein the data format of the mixed data comprises a first field and a second field, the first field comprises a first picture parameter for generating detection picture data, the second field comprises a second picture parameter for generating classified picture data, the detection picture data refers to picture data for detection training, and the classified picture data refers to picture data for classification training;
splitting the mixed data of each sample picture according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture;
and generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.
2. The method according to claim 1, wherein the first picture parameter includes network storage information of a corresponding sample picture, location information of a detection target included in the corresponding sample picture, and tag type information, the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.
3. The method according to claim 2, wherein the generating the detection picture data according to the first picture parameter in the first field corresponding to each sample picture comprises:
downloading corresponding sample pictures according to the network storage information of each sample picture;
storing the downloaded sample pictures, and acquiring a local storage address of each sample picture in the sample pictures;
and generating detection picture data of the corresponding sample picture according to the position information of the detection target in each sample picture, the label category information and the local storage address of the corresponding sample picture.
4. The method according to claim 3, wherein the generating the classified picture data according to the second picture parameter in the second field corresponding to each sample picture comprises:
cutting the corresponding sample picture according to the position information of the detection target in each sample picture to obtain a target area picture containing the detection target;
storing a plurality of target area pictures obtained by cutting, and acquiring a local storage address of each target area picture;
and generating classified picture data of the corresponding sample picture according to the attribute information of the detection target in each sample picture and the local storage address of the target area picture containing the corresponding detection target.
5. The method according to any one of claims 2 to 4, wherein the attribute information of the detection target includes the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of an attribute value corresponding to each attribute.
6. The method of claim 1, further comprising:
receiving a plurality of pieces of mixed label data corresponding to the plurality of sample pictures, wherein the data format of each piece of mixed label data comprises a third field and a fourth field, the third field comprises detection label data used for detection training, and the fourth field comprises classification label data used for classification training;
and splitting each mixed label data according to the data format of each mixed label data to obtain the detection label data and the classification label data in each mixed label data.
7. The method of claim 6, wherein the detection tag data comprises a tag class and a class number corresponding to the tag class, and the classification tag data comprises attribute information corresponding to the tag class.
8. An apparatus for acquiring training data, the apparatus comprising:
the system comprises a first receiving module, a second receiving module and a processing module, wherein the first receiving module is used for receiving mixed data of each sample picture in a plurality of sample pictures, the data format of the mixed data comprises a first field and a second field, the first field comprises a first picture parameter used for generating detection picture data, the second field comprises a second picture parameter used for generating classified picture data, the detection picture data refers to picture data used for detection training, and the classified picture data refers to picture data used for classification training;
the first splitting module is used for splitting the mixed data of each sample picture according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture;
and the generating module is used for generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.
9. The apparatus according to claim 8, wherein the first picture parameter includes network storage information of a corresponding sample picture, location information of a detection target included in the corresponding sample picture, and tag type information, the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.
10. The apparatus of claim 9, wherein the generating module comprises:
the downloading submodule is used for downloading the corresponding sample picture according to the network storage information of each sample picture;
the storage sub-module is used for storing the downloaded sample pictures and acquiring a local storage address of each sample picture in the sample pictures;
and the first generation submodule is used for generating detection image data of the corresponding sample image according to the position information of the detection target in each sample image, the label category information and the local storage address of the corresponding sample image.
11. The apparatus of claim 10, wherein the generating module comprises:
the cutting sub-module is used for cutting the corresponding sample picture according to the position information of the detection target in each sample picture to obtain a target area picture containing the detection target;
the storage submodule is used for storing the cut multiple target area pictures and acquiring a local storage address of each target area picture;
and the second generation submodule is used for generating classified picture data of the corresponding sample picture according to the attribute information of the detection target in each sample picture and the local storage address of the target area picture containing the corresponding detection target.
12. The apparatus according to any one of claims 9 to 11, wherein the attribute information of the detection target includes the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of an attribute value corresponding to each attribute.
13. The apparatus of claim 8, further comprising:
a second receiving module, configured to receive multiple pieces of mixed label data corresponding to the multiple sample pictures, where a data format of each piece of mixed label data includes a third field and a fourth field, the third field includes detection label data used for performing detection training, and the fourth field includes classification label data used for performing classification training;
and the second splitting module is used for splitting each mixed label data according to the data format of each mixed label data to obtain the detection label data and the classification label data in each mixed label data.
14. The apparatus of claim 13, wherein the detection tag data comprises a tag class and a class number corresponding to the tag class, and wherein the classification tag data comprises attribute information corresponding to the tag class.
15. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN201911007708.9A 2019-10-22 2019-10-22 Method, device and storage medium for acquiring training data Active CN112699906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911007708.9A CN112699906B (en) 2019-10-22 2019-10-22 Method, device and storage medium for acquiring training data

Publications (2)

Publication Number Publication Date
CN112699906A true CN112699906A (en) 2021-04-23
CN112699906B CN112699906B (en) 2023-09-22

Family

ID=75504795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911007708.9A Active CN112699906B (en) 2019-10-22 2019-10-22 Method, device and storage medium for acquiring training data

Country Status (1)

Country Link
CN (1) CN112699906B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156025A (en) * 2015-03-25 2016-11-23 阿里巴巴集团控股有限公司 The management method of a kind of data mark and device
CN107992900A (en) * 2017-12-18 2018-05-04 深圳市盛波光电科技有限公司 Sample acquiring method, training method, device, medium and the equipment of defects detection
JP2019125207A (en) * 2018-01-17 2019-07-25 株式会社東芝 Label data generation device, label data generation method and program
CN108985214A (en) * 2018-07-09 2018-12-11 上海斐讯数据通信技术有限公司 The mask method and device of image data
CN109271871A (en) * 2018-08-22 2019-01-25 平安科技(深圳)有限公司 Sample store path generation method, device, computer equipment and storage medium
CN109635833A (en) * 2018-10-30 2019-04-16 银河水滴科技(北京)有限公司 A kind of image-recognizing method and system based on cloud platform and model intelligent recommendation
CN109711396A (en) * 2018-11-12 2019-05-03 平安科技(深圳)有限公司 Generation method, device, equipment and the readable storage medium storing program for executing of OCR training sample
CN109635853A (en) * 2018-11-26 2019-04-16 深圳市玛尔仕文化科技有限公司 The method for automatically generating artificial intelligence training sample based on computer graphics techniques
CN109815977A (en) * 2018-12-15 2019-05-28 天津大学 High-volume making machine learning sample cuts out label integral method
CN109710788A (en) * 2018-12-28 2019-05-03 斑马网络技术有限公司 Image pattern mark and management method and equipment
CN109816118A (en) * 2019-01-25 2019-05-28 上海深杳智能科技有限公司 A kind of method and terminal of the creation structured document based on deep learning model
CN110135409A (en) * 2019-04-04 2019-08-16 平安科技(深圳)有限公司 The optimization method and device of identification model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KHASHABI D ET AL.: "Question answering via integer programming over semi-structured knowledge", Artificial Intelligence *
臧淼 (ZANG Miao): "Research on Key Technologies of Automatic Image Annotation", China Doctoral Dissertations Full-text Database (Information Science and Technology Series) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116382599A (en) * 2023-06-07 2023-07-04 之江实验室 Distributed cluster-oriented task execution method, device, medium and equipment
CN116382599B (en) * 2023-06-07 2023-08-29 之江实验室 Distributed cluster-oriented task execution method, device, medium and equipment

Also Published As

Publication number Publication date
CN112699906B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN110490179B (en) License plate recognition method and device and storage medium
CN111125442B (en) Data labeling method and device
CN110839128B (en) Photographing behavior detection method and device and storage medium
CN111104980B (en) Method, device, equipment and storage medium for determining classification result
CN112749613A (en) Video data processing method and device, computer equipment and storage medium
CN111178343A (en) Multimedia resource detection method, device, equipment and medium based on artificial intelligence
CN110647881A (en) Method, device, equipment and storage medium for determining card type corresponding to image
CN110705614A (en) Model training method and device, electronic equipment and storage medium
CN112053360A (en) Image segmentation method and device, computer equipment and storage medium
CN112699906B (en) Method, device and storage medium for acquiring training data
CN110990728A (en) Method, device and equipment for managing point of interest information and storage medium
CN111754564A (en) Video display method, device, equipment and storage medium
CN113378705B (en) Lane line detection method, device, equipment and storage medium
CN111324815B (en) Automobile information processing method and device and storage medium
CN114118408A (en) Training method of image processing model, image processing method, device and equipment
CN111294320B (en) Data conversion method and device
CN113936240A (en) Method, device and equipment for determining sample image and storage medium
CN113920222A (en) Method, device and equipment for acquiring map building data and readable storage medium
CN113268234A (en) Page generation method, device, terminal and storage medium
CN112905328A (en) Task processing method and device and computer readable storage medium
CN113076452A (en) Application classification method, device, equipment and computer readable storage medium
CN112990424A (en) Method and device for training neural network model
CN112990421A (en) Method, device and storage medium for optimizing operation process of deep learning network
CN111444945A (en) Sample information filtering method and device, computer equipment and storage medium
CN112135256A (en) Method, device and equipment for determining movement track and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant