CN112699906B - Method, device and storage medium for acquiring training data

Info

Publication number: CN112699906B
Application number: CN201911007708.9A
Authority: CN
Inventors: 唐苗
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2019-10-22
Filing date: 2019-10-22
Publication date: 2023-09-22
Anticipated expiration: 2039-10-22
Also published as: CN112699906A

Abstract

The application discloses a method and a device for acquiring training data, and belongs to the technical field of deep learning. In the application, mixed data of each sample picture in a plurality of sample pictures is received, where the data format of the mixed data comprises a first field and a second field. The mixed data of each sample picture is split according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture. Detection picture data of the corresponding sample picture is generated according to the first picture parameter in the first field corresponding to each sample picture, and classification picture data of the corresponding sample picture is generated according to the second picture parameter in the second field corresponding to each sample picture. The application can therefore generate the detection picture data and the classification picture data from the mixed data without separately acquiring the classification picture data from the server, which reduces the transmission of redundant information, lowers the invalid transmission rate of the data, and saves bandwidth.

Description

Method, device and storage medium for acquiring training data

Technical Field

The present application relates to the field of deep learning, and in particular, to a method, an apparatus, and a storage medium for acquiring training data.

Background

Currently, deep learning techniques are widely used in various industries. For example, picture recognition may be performed by a neural network model. Typically, before a neural network model is used for picture recognition, it must be trained with a large number of sample pictures and labels as training data.

At present, training a neural network model mainly comprises detection training and classification training. For convenience of the following description, data for performing detection training is referred to as detection picture data, and data for performing classification training is referred to as classification picture data. In the related art, the first device may generate detection picture data according to a plurality of sample pictures and upload the detection picture data to the server. Meanwhile, the first device may crop the target area included in each sample picture in the plurality of sample pictures to obtain a target area picture, generate classification picture data according to the target area picture, and upload the classification picture data to the server. Subsequently, when the second device performs detection training on the neural network model, it can obtain the detection picture data from the server, and when the second device performs classification training on the neural network model, it can obtain the classification picture data from the server.

As can be seen, the picture data for performing the detection training and the picture data for performing the classification training in the related art are separately represented and transmitted, and the picture data for performing the detection training already contains part of the information of the training data for performing the classification training, which causes redundancy of information, increases the invalid transmission rate of the data, and wastes bandwidth.

Disclosure of Invention

The embodiment of the application provides a method, a device and a storage medium for acquiring training data, which can be used for solving the problems of information redundancy, large invalid transmission rate of data and waste of bandwidth when acquiring the training data in the related technology. The technical scheme is as follows:

In one aspect, a method of acquiring training data is provided, the method comprising:

receiving mixed data of each sample picture in a plurality of sample pictures, wherein the data format of the mixed data comprises a first field and a second field, the first field comprises a first picture parameter used for generating detection picture data, the second field comprises a second picture parameter used for generating classification picture data, the detection picture data is picture data used for carrying out detection training, and the classification picture data is picture data used for carrying out classification training;

splitting the mixed data of each sample picture according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture;

and generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.

Optionally, the first picture parameter includes network storage information of a corresponding sample picture, location information of a detection target included in the corresponding sample picture, and tag class information, where the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.

Optionally, the generating the detection picture data according to the first picture parameter in the first field corresponding to each sample picture includes:

downloading corresponding sample pictures according to network storage information of each sample picture;

storing the downloaded plurality of sample pictures, and acquiring a local storage address of each sample picture in the plurality of sample pictures;

and generating detection picture data of the corresponding sample picture according to the position information of the detection target in each sample picture, the tag class information and the local storage address of the corresponding sample picture.

Optionally, the generating the classification picture data according to the second picture parameter in the second field corresponding to each sample picture includes:

cutting out corresponding sample pictures according to the position information of the detection target in each sample picture to obtain a target area picture containing the detection target;

storing the multiple target area pictures obtained by cutting, and obtaining the local storage address of each target area picture;

and generating classification picture data of the corresponding sample picture according to the attribute information of the detection target in each sample picture and the local storage address of the target area picture containing the corresponding detection target.
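As a rough illustration of the cropping step, the following Python sketch treats a picture as a plain nested list of pixel rows and crops the axis-aligned bounding box of the corner points of the target area. The function names, the nested-list picture representation, the bounding-box rule for non-rectangular areas, the example path, and the `path` and `attributes` keys are all assumptions made for this example; the application does not prescribe them.

```python
def crop_target_area(picture, corners):
    """Crop the bounding box of `corners` from `picture` (a list of pixel rows).

    Assumption: coordinates are (x, y) pixel positions, and the right and
    bottom edges of the box are exclusive.
    """
    xs = [int(x) for x, _ in corners]
    ys = [int(y) for _, y in corners]
    height, width = len(picture), len(picture[0])
    # Clamp the box to the picture so out-of-range corners do not fail.
    left, right = max(min(xs), 0), min(max(xs), width)
    top, bottom = max(min(ys), 0), min(max(ys), height)
    return [row[left:right] for row in picture[top:bottom]]


def build_classification_record(crop_address, attributes):
    """Pair the local address of a target area picture with its attribute info."""
    return {"path": crop_address, "attributes": list(attributes)}


# A 5x6 dummy picture whose pixel value encodes its (row, column) position.
picture = [[row * 10 + col for col in range(6)] for row in range(5)]
crop = crop_target_area(picture, [(1, 1), (4, 1), (4, 3), (1, 3)])

# Attribute info given as (attribute number, value number) pairs.
record = build_classification_record("/data/crops/0001.png", [(1, 0), (2, 1)])
```

In a real pipeline the crop would be written to disk by an image library and the resulting file path would be used as the local storage address.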

Optionally, the attribute information of the detection target includes the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of an attribute value corresponding to each attribute.

Optionally, the method further comprises:

receiving a plurality of pieces of mixed label data corresponding to the plurality of sample pictures, wherein the data format of each piece of mixed label data comprises a third field and a fourth field, the third field comprises detection label data for detection training, and the fourth field comprises classification label data for classification training;

splitting each piece of mixed label data according to the data format of each piece of mixed label data to obtain detection label data and classification label data in each piece of mixed label data.

Optionally, the detection tag data includes a tag class and a class number corresponding to the tag class, and the classification tag data includes attribute information corresponding to the tag class.
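The split of mixed label data could look roughly like the Python sketch below. The dictionary layout, the key names `third_field` and `fourth_field`, and the example attribute names are hypothetical; the application only states that the third field carries detection label data and the fourth field carries classification label data.

```python
def split_mixed_label_data(mixed_labels):
    """Split each piece of mixed label data into detection and classification label data."""
    detection_labels = []
    classification_labels = []
    for piece in mixed_labels:
        detection_labels.append(piece["third_field"])
        classification_labels.append(piece["fourth_field"])
    return detection_labels, classification_labels


mixed_labels = [
    {
        # Third field: a tag class and its class number.
        "third_field": {"tag_class": "vehicle", "class_number": 1},
        # Fourth field: attribute information for that tag class
        # (attribute names here are made up for the example).
        "fourth_field": {"tag_class": "vehicle", "attributes": {1: "color", 2: "type"}},
    },
]
detection, classification = split_mixed_label_data(mixed_labels)
```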

In another aspect, there is provided an apparatus for acquiring training data, the apparatus comprising:

the first receiving module is used for receiving mixed data of each sample picture in the plurality of sample pictures, the data format of the mixed data comprises a first field and a second field, the first field comprises a first picture parameter used for generating detection picture data, the second field comprises a second picture parameter used for generating classification picture data, the detection picture data is picture data used for carrying out detection training, and the classification picture data is picture data used for carrying out classification training;

the first splitting module is used for splitting the mixed data of each sample picture according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture;

the generation module is used for generating detection picture data of the corresponding sample pictures according to the first picture parameters in the first fields corresponding to the sample pictures, and generating classification picture data of the corresponding sample pictures according to the second picture parameters in the second fields corresponding to the sample pictures.

Optionally, the generation module includes:

the downloading sub-module is used for downloading the corresponding sample pictures according to the network storage information of each sample picture;

the storage sub-module is used for storing the downloaded multiple sample pictures and acquiring a local storage address of each sample picture in the multiple sample pictures;

the first generation sub-module is used for generating detection picture data of the corresponding sample picture according to the position information of the detection target in each sample picture, the tag class information and the local storage address of the corresponding sample picture.

Optionally, the generation module includes:

the clipping module is used for clipping the corresponding sample pictures according to the position information of the detection targets in each sample picture to obtain target region pictures containing the detection targets;

the storage unit is used for storing the multiple target area pictures obtained through cutting and obtaining the local storage address of each target area picture;

and the second generation unit is used for generating the classification picture data of the corresponding sample picture according to the attribute information of the detection target in each sample picture and the local storage address of the target area picture containing the corresponding detection target.

Optionally, the apparatus further comprises:

the second receiving module is used for receiving a plurality of pieces of mixed label data corresponding to the plurality of sample pictures, the data format of each piece of mixed label data comprises a third field and a fourth field, the third field comprises detection label data used for detection training, and the fourth field comprises classification label data used for classification training;

the second splitting module is used for splitting each piece of mixed label data according to the data format of each piece of mixed label data to obtain detection label data and classification label data in each piece of mixed label data.

In another aspect, an apparatus for acquiring training data is provided, the apparatus comprising a processor, a communication interface, a memory, and a communication bus;

the processor, the communication interface and the memory complete communication with each other through the communication bus;

the memory is used for storing a computer program;

the processor is configured to execute the program stored in the memory, so as to implement the method for acquiring training data.

In another aspect, a computer readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, implements the steps of the method of acquiring training data provided above.

The technical solution provided by the embodiment of the application has at least the following beneficial effects:

In the embodiment of the application, the mixed data of each sample picture in the plurality of sample pictures is received, and the data format of the mixed data comprises the first field and the second field, so that the mixed data of each sample picture can be split according to the data format of the mixed data to obtain the first field and the second field corresponding to each sample picture. And then generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture. Therefore, the terminal in the embodiment of the application can generate the detection picture data for carrying out detection training and the classification picture data for carrying out classification training according to the mixed data, and does not need to independently acquire the classification picture data from the server, thereby reducing the transmission of redundant information, reducing the invalid transmission rate of the data and saving the bandwidth.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a system architecture diagram related to a method for acquiring training data according to an embodiment of the present application;

Fig. 2 is a flowchart of a method for acquiring training data according to an embodiment of the present application;

Fig. 3 is a representation of mixed data of a sample picture according to an embodiment of the present application;

Fig. 4 is a flowchart of generating detection picture data and classification picture data according to an embodiment of the present application;

Fig. 5 is a schematic diagram of detection picture data of a sample picture according to an embodiment of the present application;

Fig. 6 is classification picture data of a detection target in a sample picture according to an embodiment of the present application;

Fig. 7 is classification picture data of another detection target in a sample picture according to an embodiment of the present application;

Fig. 8 is a representation of mixed label data according to an embodiment of the present application;

Fig. 9 is a representation of detection label data according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of an apparatus for acquiring training data according to an embodiment of the present application;

Fig. 11 is a schematic structural diagram of a terminal for acquiring training data according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.

Before explaining the embodiment of the present application in detail, an application scenario related to the embodiment of the present application is described.

Currently, in the fields of intelligent transportation, security, and the like, it is often required to perform object detection on a captured high-resolution image or video picture by a deep learning algorithm model such as a neural network model. For example, in the field of intelligent transportation, monitoring devices are usually installed in many scenes such as a gate, a parking lot, and a road for image acquisition. After the image is acquired, targets such as vehicles or pedestrians and the like contained in the image can be detected and identified through a deep learning algorithm model, so that further processing can be performed according to the detection result. For another example, in the security field, monitoring devices may be provided in a residential area for image acquisition. After the image is acquired, targets such as people or objects in the image can be detected and identified through the deep learning algorithm model so as to be tracked. The deep learning algorithm model needs to be trained by the picture data and the tag data of the sample picture before the target detection and recognition by the deep learning algorithm model. The method for acquiring the training data provided by the embodiment of the application can be applied to the process of training the deep learning algorithm model so as to acquire the training data required by training.

Next, a system architecture related to the method for acquiring training data provided by the embodiment of the present application is described.

Fig. 1 is a system architecture diagram related to a method for acquiring training data according to an embodiment of the present application. As shown in fig. 1, a first device 101, a server 102, and a second device 103 may be included in the system. Wherein both the first device 101 and the second device 103 may communicate with the server 102.

It should be noted that the first device 101 may upload the mixed data and the mixed label data in the embodiment of the present application to the server 102. After receiving the mixed data and the mixed label data, the server 102 may store them and transmit the storage address of the mixed data and the storage address of the mixed label data to the second device 103.

A deep learning algorithm model runs on the second device 103. Using the method provided by the embodiment of the present application, the second device 103 may obtain the mixed data from the server 102 according to the storage address of the mixed data, and generate, according to the mixed data, the detection picture data for performing detection training and the classification picture data for performing classification training. Likewise, the second device 103 may obtain the mixed label data from the server 102 according to the storage address of the mixed label data, and generate the detection label data and the classification label data according to the mixed label data.

It should be noted that, in the embodiment of the present application, the first device 101 and the second device 103 may be terminal devices such as a desktop computer and a notebook computer, or the second device 103 may be an algorithm training server. The server 102 may be a server for storing training data, a server cluster comprising a plurality of servers for storing training data, or a cloud computing service center.

The method for acquiring training data provided by the embodiment of the application is described next.

Fig. 2 is a flowchart of a method for acquiring training data according to an embodiment of the present application. In the embodiment of the present application, the method is described by taking a terminal as the execution body of the method, but this does not limit the embodiment of the present application. As shown in fig. 2, the method includes the following steps:

Step 201: receiving mixed data of each sample picture in the plurality of sample pictures.

The data format of the mixed data comprises a first field and a second field, wherein the first field comprises a first picture parameter used for generating detection picture data, the second field comprises a second picture parameter used for generating classification picture data, the detection picture data is picture data used for detection training, and the classification picture data is picture data used for classification training.

In the embodiment of the application, when the terminal needs to train the deep learning algorithm model such as the neural network model, a data acquisition request can be sent to the server. When the server receives the data acquisition request, the server can acquire the mixed data of each sample picture in the plurality of sample pictures according to the data acquisition request, and return all acquired mixed data to the terminal, and the terminal can receive all mixed data.

It should be noted that, the server may store the mixed data of each sample picture in the plurality of sample pictures, and the mixed data may be uploaded to the server in advance by other devices. The devices may have stored therein address information for the server. Based on this, the devices can establish a connection with the server according to the address information of the server, and transmit the mixed data to the server.

The data format of the hybrid data may include a first field and a second field, the first field may include a first picture parameter for generating detection picture data, the second field may include a second picture parameter for generating classification picture data, the detection picture data refers to picture data for performing detection training, and the classification picture data refers to picture data for performing classification training.

It should be noted that the data format of the mixed data of each sample picture may include a first field and a second field, that is, the data format of the mixed data of each sample picture is the same. In addition, the second field may be located after the first field.

Step 202: splitting the mixed data of each sample picture according to the data format of the mixed data to obtain a first field and a second field corresponding to each sample picture.

In the embodiment of the application, the mixed data of every sample picture has the same data format. Therefore, after receiving the mixed data of each sample picture in the plurality of sample pictures, the terminal can split the mixed data of each sample picture according to that data format to obtain the first field and the second field corresponding to each sample picture.

For example, the first field in the mixed data of each sample picture may have a first identification, and the second field may have a second identification, where the first identification is used to indicate that the corresponding field is the first field in the mixed data, and the second identification is used to indicate that the corresponding field is the second field in the mixed data.
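A minimal Python sketch of identifier-based splitting is shown below. It assumes that the mixed data arrives as a sequence of (identification, payload) pairs and that the identifications are the strings `"F1"` and `"F2"`; both choices, along with the payload keys and the example URL, are hypothetical, since the application only states that each field carries an identification.

```python
# Hypothetical identifications for the two kinds of field.
FIRST_ID = "F1"
SECOND_ID = "F2"


def split_mixed_data(mixed):
    """Split the mixed data of one sample picture into first and second fields."""
    first_fields, second_fields = [], []
    for identification, payload in mixed:
        if identification == FIRST_ID:
            first_fields.append(payload)
        elif identification == SECOND_ID:
            second_fields.append(payload)
        else:
            raise ValueError(f"unknown field identification: {identification!r}")
    return first_fields, second_fields


mixed = [
    ("F1", {"url": "http://example.com/a.jpg", "label": "car"}),
    ("F2", {"attributes": [(1, 0), (2, 1)]}),
]
first, second = split_mixed_data(mixed)
```

Returning lists rather than single values lets the same function handle a sample picture whose mixed data contains several first and second fields.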

Step 203: generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.

As can be seen from the foregoing, the first field includes a first picture parameter for generating detection picture data, the second field includes a second picture parameter for generating classification picture data, the terminal may generate detection picture data of a corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generate classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.

Optionally, the first picture parameter may include network storage information of a corresponding sample picture, location information of a detection target included in the corresponding sample picture, and tag class information, where the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.

It should be noted that the network storage information of the sample picture may be network storage address information of the sample picture. For example, the network storage information may be a URL (Uniform Resource Locator) corresponding to the sample picture. The position information of the detection target may refer to position information of a target area including the detection target in a corresponding sample picture. The target area including the detection target may be a rectangular area, and in this case, the position information of the detection target may include coordinates of four corner points of the target area in the sample picture. Alternatively, the position information of the detection target may include the center point coordinates of the target area and the length and width of the target area. Alternatively, when the target area including the detection target is an irregular polygon, the position information of the detection target may include coordinates of a plurality of corner points of the target area in the sample picture. The tag class information of the detection target may be a tag class to which the detection target belongs, or a class number of the tag class to which the detection target belongs. For example, if the detection target is "car", the tag class information of the detection target may be the tag class of "car", or may be the class number corresponding to "car". The attribute information of the detection target may include the number of attributes corresponding to the detection target, the attribute number of each attribute, and the value number of the corresponding attribute value. For example, assuming that the detection target is "car", the attribute information of the detection target includes the number of attributes of 2, that is, the detection target corresponds to two kinds of attributes. The attribute number of the first of the two attributes is 1, and the value number of the corresponding attribute value is 0. The attribute number of the second attribute is 2, and the value number of the corresponding attribute value is 1.
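The two rectangular representations of position information carry the same content and can be converted into each other. The Python sketch below illustrates this under stated assumptions: the rectangle is axis-aligned, "length" is measured along the x axis and "width" along the y axis, and the y axis points downward as in image coordinates. The function names are made up for the example.

```python
def corners_from_center(cx, cy, length, width):
    """Return the four corner points, clockwise from the top-left corner."""
    half_l, half_w = length / 2, width / 2
    return [
        (cx - half_l, cy - half_w),  # top-left
        (cx + half_l, cy - half_w),  # top-right
        (cx + half_l, cy + half_w),  # bottom-right
        (cx - half_l, cy + half_w),  # bottom-left
    ]


def center_from_corners(corners):
    """Return (cx, cy, length, width) for an axis-aligned rectangle."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    length = max(xs) - min(xs)
    width = max(ys) - min(ys)
    return (min(xs) + length / 2, min(ys) + width / 2, length, width)


corners = corners_from_center(50, 40, 20, 10)
box = center_from_corners(corners)
```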

Optionally, one or more detection targets may be included in each sample picture. When a plurality of detection targets are included, the mixed data of each sample picture can also include the number of the detection targets.

It should be noted that, when the sample picture includes a plurality of detection targets, the mixed data of the sample picture may be expanded according to the data format of the mixed data. In that case, the mixed data of the sample picture may include a plurality of first fields and a second field corresponding to each first field, with each second field adjacent to its corresponding first field. Each first field comprises a first picture parameter, each second field comprises a second picture parameter, and each detection target corresponds to one first picture parameter and one second picture parameter. The first picture parameter in the first of these first fields may include the network storage information, the number of detection targets, and the location information and tag class information of the first detection target. The first picture parameter in each subsequent first field may omit the network storage information and the number of detection targets, and include only the location information and tag class information of the corresponding detection target. That is, the embodiment of the application provides an organization form of training data that can be freely combined according to different numbers of detection targets and attributes, which improves the flexibility of the data.

By way of example, a representation of the mixed data of a sample picture is shown in fig. 3. As shown in fig. 3, the first of the first fields includes the network storage information (a first URL), the number of detection targets, and the location information and tag class information of the first detection target. The first of the second fields is located after this first field and includes the attribute information of the first detection target. Since the number of detection targets is 2, it can be determined that the sample picture contains two detection targets. The position information of the first detection target is the coordinates of N corner points of a target area including the first detection target, where N is an integer greater than or equal to 3. From the tag class information of the first detection target, it can be known that the tag class to which the first detection target belongs is a vehicle. The first of the second fields indicates that the number of attributes of the first detection target is 2, that is, the detection target corresponds to two attributes. The attribute number of the first of the two attributes is 1, and the value number of the corresponding attribute value is 0. The attribute number of the second attribute is 2, and the value number of the corresponding attribute value is 1. Since there are two detection targets, a second first field follows the first of the second fields, and this second first field includes the position information and tag class information of the second detection target. It is followed by a second of the second fields, which contains the attribute information of the second detection target.
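To make the layout concrete, the Python sketch below parses a flat record organized in the spirit of fig. 3. The exact token order is an assumption: the record is taken to be `[url, number of targets]` followed, per target, by a corner count, the corner coordinates, the tag class, the number of attributes, and one (attribute number, value number) pair per attribute. The explicit corner count, the example URL, and the second target's class and attribute values are hypothetical additions for the example.

```python
def parse_mixed_record(tokens):
    """Parse a flat mixed-data record into per-target first and second fields."""
    it = iter(tokens)
    url = next(it)
    n_targets = int(next(it))
    targets = []
    for _ in range(n_targets):
        n_points = int(next(it))
        if n_points < 3:
            raise ValueError("a target area needs at least 3 corner points")
        corners = [(float(next(it)), float(next(it))) for _ in range(n_points)]
        label = next(it)
        n_attrs = int(next(it))
        attributes = [(int(next(it)), int(next(it))) for _ in range(n_attrs)]
        targets.append({
            "first": {"corners": corners, "label": label},  # first picture parameter
            "second": {"attributes": attributes},           # second picture parameter
        })
    return {"url": url, "targets": targets}


record = [
    "http://example.com/sample.jpg", 2,
    # First target: 4 corners, tag class, 2 attributes (1 -> 0, 2 -> 1).
    4, 10, 10, 50, 10, 50, 40, 10, 40, "vehicle", 2, 1, 0, 2, 1,
    # Second target: 3 corners, tag class, 1 attribute (values made up).
    3, 60, 60, 90, 60, 75, 90, "pedestrian", 1, 1, 2,
]
parsed = parse_mixed_record(record)
```

Because the network storage information and the number of targets appear only once at the head of the record, adding a target costs only its own position, class, and attribute tokens.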

It should be noted that the foregoing is only one possible representation of the mixed data provided by embodiments of the present application. The order of the first picture parameter and the second picture parameter corresponding to the same detection target may be exchanged, that is, each first field and its corresponding second field may be exchanged. Optionally, the order of the information included in the first field may also be adjusted, which is not limited by the embodiment of the present application.

After the mixed data is split and the first field and the second field corresponding to each sample picture are obtained, referring to fig. 4, the terminal may generate the detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generate the classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.

2031: downloading the corresponding sample pictures according to the network storage information of each sample picture.

As can be seen from the foregoing description, the first picture parameters in the corresponding first field of each sample picture include network storage information of the sample picture. The network storage information may be a URL corresponding to the sample picture. Based on the above, the terminal can download the sample picture from the corresponding picture server according to the URL corresponding to the sample picture.

2032: storing the downloaded multiple sample pictures, and acquiring a local storage address of each sample picture in the multiple sample pictures.

After downloading each sample picture, the terminal can store it locally and record the address at which it is stored as the local storage address of that sample picture.
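Steps 2031 and 2032 can be sketched together in Python as follows. The file-naming scheme, the `fetch` parameter, and the example URL are assumptions made for illustration; the default fetcher uses the standard-library `urllib.request.urlopen`, while the demonstration passes a stub so that it does not touch the network.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Default fetcher: read the picture bytes over the network."""
    with urlopen(url) as response:
        return response.read()


def download_and_store(urls, local_dir, fetch=fetch_bytes):
    """Download each sample picture and return {url: local storage address}."""
    os.makedirs(local_dir, exist_ok=True)
    local_addresses = {}
    for index, url in enumerate(urls):
        # Prefix with an index so pictures with the same file name do not collide.
        name = os.path.basename(urlparse(url).path) or "picture"
        local_path = os.path.join(local_dir, f"{index:06d}_{name}")
        with open(local_path, "wb") as handle:
            handle.write(fetch(url))
        local_addresses[url] = local_path
    return local_addresses


# Demonstration with a stub fetcher instead of a real picture server.
demo_dir = tempfile.mkdtemp()
addresses = download_and_store(
    ["http://example.com/pics/a.jpg"], demo_dir, fetch=lambda url: b"stub"
)
```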

2033: generating detection picture data of the corresponding sample picture according to the position information of the detection target and the tag class information included in the mixed data of each sample picture, and the local storage address of the corresponding sample picture.

After each sample picture is downloaded to the local, for any sample picture, the terminal can acquire the position information and the label type information of the detection target from the mixed data of the sample picture, and then, the acquired position information and label type information of the detection target are combined with the local storage address of the sample picture to obtain the detection picture data of the sample picture. For all sample pictures, the detection picture data of each sample picture can be generated by referring to the method, so that the detection picture data for detection training is obtained.

It should be noted that, as is clear from the foregoing description, the position information and the tag type information of the detection target in the picture data of the sample picture are both located in the first field. Based on this, in the embodiment of the present application, if the sample picture has only one first field, the network storage information contained in the first picture parameter in the first field may be replaced with the local storage address of the sample picture, and the data after the replacement is taken as the detection picture data of the sample picture. If the sample picture has a plurality of first fields, the terminal can splice the first picture parameters in the plurality of first fields in order, and replace the network storage information contained in the first picture parameters with the local storage address of the sample picture, so as to obtain the detection picture data of the sample picture.

Taking the mixed data of the sample picture shown in fig. 3 as an example, fig. 5 is a schematic diagram of the detection picture data of the sample picture generated by processing the mixed data shown in fig. 3 through this step. Referring to fig. 5, the first first field and the second first field in fig. 3 are combined, and the network storage information in the first picture parameter included in the first field, that is, the first URL, is replaced by the local storage address A of the sample picture, so as to obtain the detection picture data of the sample picture.
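The combination of first fields described above can be sketched as follows. The dictionary keys `url`, `bbox` and `label`, the function name, and the sample values are hypothetical; the actual layout of fig. 3 and fig. 5 is not reproduced here.

```python
def build_detection_record(first_fields, local_address):
    """Merge the first fields of one sample picture into one detection record.

    Each element of first_fields is a dict with hypothetical keys:
    "url" (optional), "bbox" as (x, y, width, height), and "label".
    """
    if not first_fields:
        raise ValueError("a sample picture needs at least one first field")
    targets = []
    for params in first_fields:  # splice in the original field order
        targets.append({"bbox": tuple(params["bbox"]), "label": params["label"]})
    # The network storage information is dropped; the local address replaces it.
    return {"image": local_address, "targets": targets}


# Two detection targets in one picture (values are illustrative only).
first_fields = [
    {"url": "http://example.com/1.jpg", "bbox": (10, 20, 50, 40), "label": 4},
    {"bbox": (80, 30, 60, 45), "label": 4},
]
record = build_detection_record(first_fields, "/data/samples/1.jpg")
```

Keeping one record per sample picture, with one entry per detection target, mirrors the single-field and multi-field cases described in the text.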

2034: Generate the classification picture data according to the second picture parameter in the second field corresponding to each sample picture.

After obtaining the second field corresponding to each sample picture, the terminal can cut the corresponding sample picture according to the position information of the detection target included in the second picture parameter in the second field corresponding to each sample picture to obtain a target area picture containing the detection target; storing the multiple target area pictures obtained by cutting, and obtaining the local storage address of each target area picture; and generating classified picture data of the corresponding sample picture according to the attribute information of the detection target included in the picture data of each sample picture and the local storage address of the target area picture containing the corresponding detection target.

For example, any sample picture may include one or more detection targets. Correspondingly, the sample picture corresponds to one or more second fields, and the second picture parameter in each second field contains the attribute information of one detection target. For any detection target, the terminal can determine the position information of the target area where the detection target is located in the sample picture according to the position information of the detection target. The target area is generally a regular shape such as a rectangle or a circle. The terminal can then crop the target area from the sample picture according to the position information of the target area, so as to obtain a target area picture, store the target area picture locally, and record the address at which it is stored as the local storage address of the target area picture. The terminal can then splice the second picture parameter of the detection target with the local storage address of the cropped target area picture, so as to obtain the classification picture data corresponding to the detection target. For each detection target in the sample picture, the terminal can obtain the classification picture data of the corresponding detection target in the same way. The classification picture data of all detection targets are spliced to obtain the classification picture data corresponding to the sample picture.

For each sample picture in the plurality of sample pictures, the terminal can refer to the method to obtain the classification training data of the corresponding sample picture, and the classification training data of all sample pictures are used as classification picture data.

Still taking the mixed data of the sample picture shown in fig. 3 as an example, fig. 6 and fig. 7 respectively show the classification picture data of the two detection targets in the sample picture generated by processing the mixed data shown in fig. 3 through this step. Illustratively, the sample picture shown in fig. 3 corresponds to two second fields, and each second field contains one second picture parameter. The sample picture is cropped according to the position information of the detection target in the first first picture parameter to obtain a first target area picture. Assuming that the local storage address of the first target area picture is address B, address B is spliced with the first second picture parameter, so that the classification picture data of the first detection target shown in fig. 6 is obtained. The sample picture is then cropped according to the position information of the detection target in the second first picture parameter to obtain a second target area picture. Assuming that the local storage address of the second target area picture is address C, address C is spliced with the second second picture parameter, so that the classification picture data of the second detection target shown in fig. 7 is obtained.
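The crop, store and splice sequence of step 2034 can be sketched as follows. To stay self-contained, the sketch represents a picture as a list of pixel rows and stores each target area picture as a JSON file; a real system would presumably use an imaging library and a picture format. The names `crop`, `build_classification_records`, `bbox` and `attributes` are assumptions introduced here.

```python
import json
from pathlib import Path


def crop(pixels, bbox):
    """Return the rectangular target area bbox = (x, y, width, height)."""
    x, y, w, h = bbox
    return [row[x:x + w] for row in pixels[y:y + h]]


def build_classification_records(pixels, targets, out_dir, stem):
    """Crop every detection target, store the crop, and splice its local
    storage address with the target's attribute information.

    Each target is a dict with hypothetical keys "bbox" and "attributes",
    where "attributes" maps attribute number -> value number.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    records = []
    for index, target in enumerate(targets):
        region = crop(pixels, target["bbox"])
        path = out / f"{stem}_{index}.json"  # one file per target area picture
        path.write_text(json.dumps(region))
        records.append({"image": str(path), "attributes": dict(target["attributes"])})
    return records
```

Calling `build_classification_records` once per sample picture and concatenating the results yields the classification picture data for the whole set.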

When training a deep learning algorithm model such as a neural network model, it is necessary to acquire not only the mixed data of the sample pictures but also mixed tag data. Based on this, the terminal may further obtain a plurality of pieces of mixed tag data corresponding to the plurality of sample pictures. The data format of each piece of mixed tag data includes a third field and a fourth field, where the third field includes detection tag data for performing detection training and the fourth field includes classification tag data for performing classification training. In this case, the terminal may further split each piece of mixed tag data in the plurality of pieces of mixed tag data to obtain the detection tag data and the classification tag data in each piece of mixed tag data.

In the embodiment of the application, the terminal can obtain the detection tag data in each piece of mixed tag data according to the third field of each piece of mixed tag data, and obtain the classification tag data in each piece of mixed tag data according to the fourth field of each piece of mixed tag data. The detection tag data comprises a tag class and a class number corresponding to the tag class, and the classification tag data comprises attribute information corresponding to the tag class.

It should be noted that, the attribute information corresponding to the tag class may include the number of attributes, the attribute name and number of each attribute, the number of attribute values corresponding to each attribute, the attribute value corresponding to each attribute, and the value number corresponding to the attribute value.

Illustratively, fig. 8 shows one representation of mixed tag data. As shown in fig. 8, the tag class is car, the number corresponding to the tag class is 4, and the number of attributes in the attribute information corresponding to the tag class is 2, that is, the tag class has two attributes. The attribute number of the first attribute is 0, and its attribute name is color. The number of attribute values corresponding to this attribute is 2, that is, the color attribute has two attribute values. The first attribute value has a value number of 0 and the corresponding attribute value is red; the second attribute value has a value number of 1 and the corresponding attribute value is white. The attribute number of the second attribute is 1, and its attribute name is vehicle body. This attribute also has two attribute values. The first attribute value has a value number of 0 and the corresponding attribute value is two boxes. The second attribute value has a value number of 1 and the corresponding attribute value is three.

It is noted that in the embodiment of the present application, a piece of mixed tag data includes all information of one tag. In addition, information belonging to the same attribute among attribute information corresponding to one tag category is continuous. As shown in fig. 8, all information about the first attribute is adjacent to each other, and all information about the second attribute may be located after or before all information about the first attribute.
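Because all information of one attribute is contiguous, a piece of mixed tag data can be read in a single pass. The sketch below assumes a flat token order (tag class, class number, attribute count, then for each attribute its number, name, value count and value pairs); this order follows the description of fig. 8 but is an assumption, as are the function name and the output keys.

```python
_END = object()  # sentinel used to detect leftover tokens


def parse_mixed_tag(tokens):
    """Parse one flat piece of mixed tag data into a nested structure."""
    it = iter(tokens)
    tag_class = next(it)
    class_number = int(next(it))
    attribute_count = int(next(it))
    attributes = []
    for _ in range(attribute_count):
        number = int(next(it))
        name = next(it)
        value_count = int(next(it))
        values = {}
        for _ in range(value_count):
            value_number = int(next(it))
            values[value_number] = next(it)
        attributes.append({"number": number, "name": name, "values": values})
    if next(it, _END) is not _END:
        raise ValueError("unexpected trailing tokens")
    return {"tag_class": tag_class, "class_number": class_number,
            "attributes": attributes}


# Token sequence following the description of fig. 8.
tokens = ["car", 4, 2,
          0, "color", 2, 0, "red", 1, "white",
          1, "vehicle body", 2, 0, "two boxes", 1, "three"]
mixed_tag = parse_mixed_tag(tokens)
```

The contiguity requirement is what allows the parser to consume each attribute without looking ahead or reordering tokens.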

It should be noted that the tag data may be used to interpret the tag category information and the attribute information in the mixed data of each sample picture. For example, as shown in fig. 3, the label type in the mixed data is car, and the label type corresponds to two attributes. The attribute with attribute number 1 is the color attribute, and the attribute value corresponding to this attribute is 0; as can be seen from fig. 8, attribute number 1 and value 0 indicate that the color attribute is red. Similarly, the attribute with attribute number 2 is the car body attribute, and the attribute value corresponding to this attribute is 1; as can be seen from fig. 8, attribute number 2 and value 1 indicate that the car body attribute is three.

In the embodiment of the application, after receiving a plurality of pieces of mixed tag data, the terminal can split each piece of mixed tag data into the third field and the fourth field according to the data format of the mixed tag data. Then, the terminal can take the third field in each piece of split mixed tag data as detection tag data.

For example, taking the mixed tag data shown in fig. 8 as an example, the mixed tag data is split to obtain the third field and the fourth field of the mixed tag data, as shown in fig. 9. The third field may be used as the detection tag data corresponding to the mixed tag data. Each piece of mixed tag data is split in the same way to obtain the detection tag data corresponding to each piece of mixed tag data.

After each piece of mixed tag data is split into the third field and the fourth field, the fourth field in each piece of split mixed tag data can be used as classification tag data.
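The split into a third field and a fourth field can be sketched as follows, assuming a piece of mixed tag data is already available as a nested dictionary. The key names and the function name `split_mixed_tag` are hypothetical.

```python
import copy


def split_mixed_tag(mixed_tag):
    """Split one piece of mixed tag data into (third field, fourth field).

    The third field carries the detection tag data (tag class and class
    number); the fourth field carries the attribute information used as
    classification tag data.
    """
    third = {"tag_class": mixed_tag["tag_class"],
             "class_number": mixed_tag["class_number"]}
    # Deep copy so later edits to the result do not alter the input.
    fourth = copy.deepcopy(mixed_tag["attributes"])
    return third, fourth


# Mixed tag data following the description of fig. 8.
mixed_tag = {
    "tag_class": "car",
    "class_number": 4,
    "attributes": [
        {"number": 0, "name": "color", "values": {0: "red", 1: "white"}},
        {"number": 1, "name": "vehicle body", "values": {0: "two boxes", 1: "three"}},
    ],
}
detection_tag, classification_tag = split_mixed_tag(mixed_tag)
```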

For example, for any piece of mixed tag data, the terminal may store the information related to each attribute included in the attribute information of the piece of mixed tag data as one record, and store the resulting records in correspondence with the tag category of the piece of mixed tag data, thereby obtaining the classification tag data corresponding to the piece of mixed tag data. All pieces of mixed tag data can be split and stored in the same way, so that the classification tag data corresponding to the plurality of pieces of mixed tag data are obtained.

For example, taking the fourth field obtained by splitting in fig. 9 as an example, splitting all the information of the first attribute and all the information of the second attribute in the fourth field to obtain two records, and storing the two records corresponding to the label category of the piece of mixed label data, as shown in table 1, so as to obtain the classified label data corresponding to the piece of mixed label data.

Table 1 Classification tag data
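Turning a fourth field into one record per attribute, stored against the tag category, can be sketched as follows. The record keys are assumptions and need not match the columns of Table 1; the sample values follow the description of fig. 8.

```python
def to_classification_records(tag_class, fourth_field):
    """Return one record per attribute, each tied to the tag category."""
    records = []
    for attribute in fourth_field:
        records.append({
            "tag_class": tag_class,
            "attribute_number": attribute["number"],
            "attribute_name": attribute["name"],
            "values": dict(attribute["values"]),  # value number -> attribute value
        })
    return records


fourth_field = [
    {"number": 0, "name": "color", "values": {0: "red", 1: "white"}},
    {"number": 1, "name": "vehicle body", "values": {0: "two boxes", 1: "three"}},
]
records = to_classification_records("car", fourth_field)
```

With this layout, looking up an attribute value only requires the tag category, the attribute number and the value number.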

In the embodiment of the application, the terminal can transmit the detection tag data for carrying out detection training and the classification tag data for carrying out classification training together, so that the transmission of redundant information can be effectively reduced, the invalid data transmission rate is reduced, meanwhile, the signaling interaction between the terminal and the server is reduced, and the system resource is saved.

Referring to fig. 10, an embodiment of the present application provides an apparatus 1000 for acquiring training data, where the apparatus 1000 includes:

the first receiving module 1001 is configured to receive mixed data of each sample picture in the plurality of sample pictures, where a data format of the mixed data includes a first field and a second field, the first field includes a first picture parameter for generating detection picture data, the second field includes a second picture parameter for generating classification picture data, the detection picture data is picture data for performing detection training, and the classification picture data is picture data for performing classification training;

the first splitting module 1002 is configured to split the mixed data of each sample picture according to a data format of the mixed data, so as to obtain a first field and a second field corresponding to each sample picture;

The generating module 1003 is configured to generate detected picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generate classified picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture.
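The splitting performed by the first splitting module 1002 can be sketched as follows. The sketch assumes that the mixed data of one sample picture arrives as a sequence of `(kind, parameters)` pairs, where `kind` is `"first"` or `"second"`; this representation and the function name are hypothetical. Because the fields are grouped by kind, the result does not depend on whether a first field precedes or follows its second field.

```python
def split_mixed_data(mixed):
    """Split the mixed data of one sample picture into its first and second fields."""
    first_fields, second_fields = [], []
    for kind, params in mixed:
        if kind == "first":
            first_fields.append(params)
        elif kind == "second":
            second_fields.append(params)
        else:
            raise ValueError(f"unknown field kind: {kind!r}")
    return first_fields, second_fields


# One picture with two detection targets; the second pair is given in swapped order.
mixed = [
    ("first", {"url": "http://example.com/1.jpg", "bbox": (10, 20, 50, 40), "label": 4}),
    ("second", {"attributes": {0: 0, 1: 1}}),
    ("second", {"attributes": {0: 1, 1: 0}}),
    ("first", {"bbox": (80, 30, 60, 45), "label": 4}),
]
first_fields, second_fields = split_mixed_data(mixed)
```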

Optionally, the first picture parameter includes network storage information of a corresponding sample picture, location information of a detection target included in the corresponding sample picture, and tag class information, the location information of the detection target refers to location information of a target area including the detection target in the corresponding sample picture, and the second picture parameter includes attribute information of the detection target.

Optionally, the generating module 1003 includes:

the cutting sub-module is used for cutting the corresponding sample picture according to the position information of the detection target in each sample picture to obtain a target area picture containing the detection target;

the storage submodule is used for storing the multiple target area pictures obtained through cutting and obtaining the local storage address of each target area picture;

and the second generation submodule is used for generating classified picture data of the corresponding sample pictures according to the attribute information of the detection targets in each sample picture and the local storage address of the target area picture containing the corresponding detection targets.

Optionally, the attribute information of the detection target includes the number of attributes corresponding to the detection target, the attribute number of each attribute, and the value number of the attribute value corresponding to each attribute.

Optionally, the apparatus further comprises:

In summary, in the embodiment of the present application, the mixed data of each sample picture in the plurality of sample pictures is received, and because the data format of the mixed data includes the first field and the second field, the mixed data of each sample picture may be split according to the data format of the mixed data, so as to obtain the first field and the second field corresponding to each sample picture. And then generating detection picture data of the corresponding sample picture according to the first picture parameter in the first field corresponding to each sample picture, and generating classification picture data of the corresponding sample picture according to the second picture parameter in the second field corresponding to each sample picture. Therefore, the terminal in the embodiment of the application can generate the detection picture data for carrying out detection training and the classification picture data for carrying out classification training according to the mixed data, and does not need to independently acquire the classification picture data from the server, thereby reducing the transmission of redundant information, reducing the invalid transmission rate of the data and saving the bandwidth.

It should be noted that: in the apparatus for acquiring training data provided in the foregoing embodiment, only the division of the functional modules is used for illustration, and in practical application, the allocation of the functions may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the apparatus for acquiring training data provided in the foregoing embodiments and the method embodiment for acquiring training data belong to the same concept, and specific implementation processes of the apparatus for acquiring training data are detailed in the method embodiment and are not described herein again.

Fig. 11 is a block diagram showing a configuration of a terminal device 1100 for acquiring training data according to an exemplary embodiment of the present application. The terminal device 1100 may be a smart phone, a tablet computer, a notebook computer, or a desktop computer. The terminal device 1100 may also be referred to by other names, such as user device, portable device for tuning a neural network model, laptop device for tuning a neural network model, or desktop device for tuning a neural network model.

In general, the terminal apparatus 1100 includes: a processor 1101 and a memory 1102.

The processor 1101 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 1101 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), is a processor for processing data in an awake state; the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1101 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 1101 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 1102 may include one or more computer-readable storage media, which may be non-transitory. Memory 1102 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 1102 is used to store at least one instruction for execution by processor 1101 to implement the method for acquiring training data provided by the method embodiments of the present application.

In some embodiments, the terminal device 1100 may further optionally include: a peripheral interface 1103 and at least one peripheral. The processor 1101, memory 1102, and peripheral interface 1103 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1103 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1104, touch display 1105, camera 1106, audio circuitry 1107, positioning component 1108, and power supply 1109.

A peripheral interface 1103 may be used to connect I/O (Input/Output) related at least one peripheral device to the processor 1101 and memory 1102. In some embodiments, the processor 1101, memory 1102, and peripheral interface 1103 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 1101, memory 1102, and peripheral interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1104 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1104 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1104 may communicate with other devices via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may also include NFC (Near Field Communication) related circuitry, which is not limited by the present application.

The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, the display screen 1105 also has the ability to collect touch signals at or above the surface of the display screen 1105. The touch signal may be input to the processor 1101 as a control signal for processing. In this case, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, provided on the front panel of the terminal device 1100; in other embodiments, there may be at least two display screens 1105, disposed on different surfaces of the terminal device 1100 or in a folded design; in still other embodiments, the display screen 1105 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal device 1100. The display screen 1105 may even be arranged in a non-rectangular irregular pattern, that is, a shaped screen. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal device, and the rear camera is disposed on the rear surface of the terminal device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a tele camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 1106 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm light flash and a cold light flash, and can be used for light compensation under different color temperatures.

The audio circuit 1107 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 1101 for processing, or inputting the electric signals to the radio frequency circuit 1104 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be provided at different portions of the terminal device 1100, respectively. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1107 may also include a headphone jack.

The positioning component 1108 is used to locate the current geographic location of the terminal device 1100 to enable navigation or LBS (Location Based Service). The positioning component 1108 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.

The power supply 1109 is used to supply power to the various components in the terminal device 1100. The power source 1109 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power source 1109 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, terminal device 1100 also includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to: acceleration sensor 1111, gyroscope sensor 1112, pressure sensor 1113, fingerprint sensor 1114, optical sensor 1115, and proximity sensor 1116.

The acceleration sensor 1111 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established in the terminal apparatus 1100. For example, the acceleration sensor 1111 may be configured to detect components of gravitational acceleration in three coordinate axes. The processor 1101 may control the touch display screen 1105 to display a user interface in a landscape view or a portrait view according to a gravitational acceleration signal acquired by the acceleration sensor 1111. Acceleration sensor 1111 may also be used for the acquisition of motion data of a game or a user.

The gyro sensor 1112 may detect a body direction and a rotation angle of the terminal device 1100, and the gyro sensor 1112 may collect a 3D motion of the user on the terminal device 1100 in cooperation with the acceleration sensor 1111. The processor 1101 may implement the following functions based on the data collected by the gyro sensor 1112: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.

The pressure sensor 1113 may be disposed at a side frame of the terminal device 1100 and/or at a lower layer of the touch display screen 1105. When the pressure sensor 1113 is provided at a side frame of the terminal apparatus 1100, a grip signal of the terminal apparatus 1100 by a user can be detected, and the processor 1101 performs left-right hand recognition or quick operation based on the grip signal collected by the pressure sensor 1113. When the pressure sensor 1113 is disposed at the lower layer of the touch display screen 1105, the processor 1101 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 1105. The operability controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.

The fingerprint sensor 1114 is used to collect a fingerprint of the user. The processor 1101 identifies the identity of the user based on the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 identifies the identity of the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1114 may be provided on the front, back, or side of the terminal device 1100. When a physical key or vendor logo is provided on the terminal device 1100, the fingerprint sensor 1114 may be integrated with the physical key or vendor logo.

The optical sensor 1115 is used to collect the ambient light intensity. In one embodiment, the processor 1101 may control the display brightness of the touch display screen 1105 based on the intensity of ambient light collected by the optical sensor 1115. Specifically, when the intensity of the ambient light is high, the display luminance of the touch display screen 1105 is turned up; when the ambient light intensity is low, the display luminance of the touch display screen 1105 is turned down. In another embodiment, the processor 1101 may also dynamically adjust the shooting parameters of the camera assembly 1106 based on the intensity of ambient light collected by the optical sensor 1115.

A proximity sensor 1116, also referred to as a distance sensor, is typically provided on the front panel of the terminal device 1100. The proximity sensor 1116 is used to collect a distance between the user and the front surface of the terminal device 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front face of the terminal device 1100 gradually decreases, the processor 1101 controls the touch display 1105 to switch from the bright screen state to the off screen state; when the proximity sensor 1116 detects that the distance between the user and the front surface of the terminal apparatus 1100 gradually increases, the touch display screen 1105 is controlled by the processor 1101 to switch from the off-screen state to the on-screen state.

It will be appreciated by those skilled in the art that the structure shown in fig. 11 is not limiting and that terminal device 1100 may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.

In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium, for example, a memory including instructions executable by a processor in the above-described terminal device to perform the method for acquiring training data in the above-described embodiment. For example, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing description of the preferred embodiments of the present application is not intended to limit the application, but rather, the application is to be construed as limited to the appended claims.

Claims

1. A method of acquiring training data, the method comprising:

receiving mixed data of each sample picture in a plurality of sample pictures, wherein the data format of the mixed data comprises a first field and a second field, the first field comprises first picture parameters for generating detection picture data, the first picture parameters comprise network storage information of the corresponding sample picture, position information of a detection target contained in the corresponding sample picture and label type information, the position information of the detection target refers to the position information of a target area containing the detection target in the corresponding sample picture, the second field comprises second picture parameters for generating classification picture data, the second picture parameters comprise attribute information of the detection target, the detection picture data refer to picture data for performing detection training, and the classification picture data refer to picture data for performing classification training;

generating detection picture data of the corresponding sample picture according to the position information of the detection target in each sample picture, the label type information and the local storage address of the corresponding sample picture;

2. The method according to claim 1, wherein the attribute information of the detection target includes the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of an attribute value corresponding to each attribute.
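As an illustration only, the attribute information described above (a count of attributes, followed by an attribute number and a value number for each attribute) could be decoded as in the following minimal sketch. The flat integer layout `[count, attr_no, value_no, ...]` and the function name `parse_attribute_info` are assumptions made for this example; the claims do not specify a concrete encoding.

```python
from typing import Dict, Sequence


def parse_attribute_info(flat: Sequence[int]) -> Dict[int, int]:
    """Decode attribute information into {attribute number: value number}.

    Assumed (hypothetical) layout: flat[0] is the number of attributes,
    followed by one (attribute number, value number) pair per attribute.
    """
    if not flat:
        raise ValueError("attribute information is empty")
    count, pairs = flat[0], list(flat[1:])
    if len(pairs) != 2 * count:
        raise ValueError("attribute count does not match the number of pairs")
    # Walk the pairs two integers at a time.
    return {pairs[i]: pairs[i + 1] for i in range(0, len(pairs), 2)}


# Two attributes: attribute 0 takes value 3, attribute 1 takes value 1.
attributes = parse_attribute_info([2, 0, 3, 1, 1])
```

Carrying the count explicitly lets a receiver reject a record whose second field was truncated in transit.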

3. The method according to claim 1, wherein the method further comprises:

4. The method according to claim 3, wherein the detection tag data includes a tag class and a class number corresponding to the tag class, and the classification tag data includes attribute information corresponding to the tag class.

5. An apparatus for acquiring training data, the apparatus comprising:

the first receiving module is used for receiving mixed data of each sample picture in the plurality of sample pictures, the data format of the mixed data comprises a first field and a second field, the first field comprises first picture parameters used for generating detection picture data, the first picture parameters comprise network storage information of the corresponding sample picture, position information of a detection target contained in the corresponding sample picture and label type information, the position information of the detection target refers to the position information of a target area containing the detection target in the corresponding sample picture, the second field comprises second picture parameters used for generating classification picture data, the second picture parameters comprise attribute information of the detection target, the detection picture data refer to picture data used for carrying out detection training, and the classification picture data refer to picture data used for carrying out classification training;

the generation module is used for downloading corresponding sample pictures according to network storage information of each sample picture, storing the downloaded sample pictures, acquiring local storage addresses of each sample picture in the sample pictures, generating detection picture data of the corresponding sample picture according to position information of a detection target in each sample picture, label type information and the local storage addresses of the corresponding sample picture, cutting the corresponding sample picture according to the position information of the detection target in each sample picture to obtain target area pictures containing the detection target, storing the cut target area pictures, acquiring the local storage addresses of each target area picture, and generating classification picture data of the corresponding sample picture according to attribute information of the detection target in each sample picture and the local storage addresses of the target area pictures containing the corresponding detection target.
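For illustration, the flow performed by the generation module (download, store, build detection picture data, crop, store, build classification picture data) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the `MixedData` class, its field names, the `(x, y, width, height)` box layout, and the injected `download` and `crop` callables are all hypothetical, and the picture I/O is deliberately left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


@dataclass
class MixedData:
    # First field: parameters for generating detection picture data.
    url: str          # network storage information of the sample picture
    box: Box          # position of the target area containing the target
    label_type: str   # label type information
    # Second field: parameters for generating classification picture data.
    attributes: Dict[int, int]  # attribute number -> value number


def generate_training_data(
    records: Iterable[MixedData],
    download: Callable[[str], str],   # url -> local storage address
    crop: Callable[[str, Box], str],  # (picture path, box) -> crop path
) -> Tuple[List[dict], List[dict]]:
    """Build detection and classification picture data from mixed data."""
    detection, classification = [], []
    for rec in records:
        # Download the sample picture once and remember where it is stored.
        local_path = download(rec.url)
        detection.append(
            {"path": local_path, "box": rec.box, "label": rec.label_type}
        )
        # Reuse the same picture for the classification sample by cropping
        # the target area, rather than fetching a second picture.
        crop_path = crop(local_path, rec.box)
        classification.append(
            {"path": crop_path, "attributes": rec.attributes}
        )
    return detection, classification
```

Because the cropped target area is derived locally from the already downloaded sample picture, each picture is fetched only once, which is the bandwidth saving the abstract describes.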

6. The apparatus according to claim 5, wherein the attribute information of the detection target includes the number of attributes corresponding to the detection target, an attribute number of each attribute, and a value number of an attribute value corresponding to each attribute.

7. The apparatus of claim 5, wherein the apparatus further comprises:

8. The apparatus of claim 7, wherein the detection tag data includes a tag class and a class number corresponding to the tag class, and wherein the classification tag data includes attribute information corresponding to the tag class.

9. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program which, when executed by a processor, implements the steps of the method of any of claims 1-4.