CN110046586A

CN110046586A - Data processing method, device and storage medium

Info

Publication number: CN110046586A
Application number: CN201910319953.7A
Authority: CN
Inventors: 徐兴坤
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-04-19
Filing date: 2019-04-19
Publication date: 2019-07-23

Abstract

Embodiments of the present invention disclose a data processing method, device and storage medium. The method includes: obtaining sample object data from each of multiple pieces of multimedia data and extracting object feature information of the sample object data; clustering the multiple pieces of sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels; cleaning the heterogeneous object data out of each first object feature cluster and taking each cleaned first object feature cluster as a second object feature cluster; performing similar-cluster merging among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and updating the object label of each piece of sample object data to the cluster label of the third object feature cluster to which it belongs; and training an object recognition model based on the object labels and object feature information of the sample object data in each third object feature cluster. The invention can improve the accuracy of face recognition.

Description

Data processing method, device and storage medium

Technical field

The present invention relates to the field of electronic technology, and in particular to a data processing method, device and storage medium.

Background

As face recognition technology is applied ever more widely in practice, its problems and difficulties have also become increasingly apparent. In an uncontrolled acquisition environment, the user's facial appearance and the ambient conditions vary constantly, and the complexity far exceeds that of the pictures used in standard evaluations. For example, occlusion changes the facial features, which makes recognition harder and raises the rejection rate; changes in illumination, pose and expression also cause the facial features of one person to differ greatly across environments, which lowers the recognition rate and may increase false identifications. At present, the most effective way to improve the performance of face recognition in practical application scenarios is to add a large amount of face training data with identity labels. Real scenes (such as surveillance, shopping malls and communities) contain a large amount of face data, but the overwhelming majority of it has no identity label. Face training data with identity labels therefore cannot be obtained in large quantities, and face recognition performance cannot be improved by adding face training data.

Summary of the invention

Embodiments of the present invention provide a data processing method, device and storage medium that can improve the accuracy of face recognition.

In one aspect, an embodiment of the present invention provides a data processing method, which may include:

Obtaining sample object data from each of multiple pieces of multimedia data, and extracting object feature information of the sample object data;

Clustering the multiple pieces of sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels, the object label of every piece of sample object data being the same as the cluster label of the first object feature cluster to which it belongs;

Cleaning the heterogeneous object data out of each first object feature cluster, and determining each cleaned first object feature cluster as a second object feature cluster;

Performing similar-cluster merging among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and updating the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs;

Training an object recognition model based on the object labels and the object feature information of the sample object data in each third object feature cluster, the object recognition model being used to identify the similarity between object data to be predicted and target object data.

Wherein, obtaining sample object data from each of multiple pieces of multimedia data and extracting the object feature information corresponding to the sample object data includes:

Obtaining object position information in the multiple pieces of multimedia data, obtaining from each piece of multimedia data, according to the object position information, an object region of a target size, determining the image content in the object region as sample object data, and obtaining the object feature information of each piece of sample object data;

If the multimedia data is image data, setting an object label at random for the sample object data in the image data;

If the multimedia data is video data, detecting the tracking trajectory information of the sample object data in the video data, setting an association attribute for the sample object data that share the same tracking trajectory information, and setting the same object label for the sample object data that have the association attribute.
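The labelling rule above can be illustrated with a minimal sketch. The `Sample` container, the `track_id` field standing in for the tracking trajectory information, and the use of random UUIDs as labels are all assumptions made for illustration; the source does not specify a data layout or a label format.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One cropped object image (hypothetical container, not from the source)."""
    sample_id: str
    source_kind: str                # "image" or "video"
    track_id: Optional[str] = None  # stands in for tracking trajectory info (video only)
    label: Optional[str] = None


def assign_initial_labels(samples):
    """Give image samples a random label and video samples one label per track."""
    track_labels = {}
    for s in samples:
        if s.source_kind == "video" and s.track_id is not None:
            # Samples on the same trajectory share the association attribute,
            # so they receive the same object label.
            if s.track_id not in track_labels:
                track_labels[s.track_id] = uuid.uuid4().hex
            s.label = track_labels[s.track_id]
        else:
            # Still images carry no trajectory, so each gets its own random label.
            s.label = uuid.uuid4().hex
    return samples
```

Under this sketch, two crops taken from the same video track start out with one shared label, while every still image starts out as its own single-member group.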

Wherein, determining the image content in the object region as sample object data includes:

Applying to the image content in the object region an affine transformation based on a target orientation, and determining the transformed image content as sample object data, the object in the sample object data being in the target orientation;

The method then further includes:

Obtaining multimedia data to be predicted;

Obtaining a target object region of the target size according to the object position information in the multimedia data to be predicted, applying to the image content to be predicted in the target object region the affine transformation based on the target orientation, and determining the transformed image content to be predicted as object data to be predicted;

Obtaining, based on the object recognition model, the similarity between the object data to be predicted and target object data.
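One common way to bring an object into a fixed orientation is to estimate an affine matrix from a few landmark points and a reference template. The sketch below assumes such landmarks are available (the source does not say how the transformation is obtained) and only estimates and applies the matrix to point coordinates; warping the actual pixels would need an image library and is left out.

```python
import numpy as np


def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src landmarks onto dst landmarks."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Homogeneous source coordinates: each row is (x, y, 1).
    ones = np.ones((src.shape[0], 1))
    src_h = np.hstack([src, ones])
    # Solve src_h @ M.T ~= dst for the 3x2 matrix M.T.
    m_t, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return m_t.T


def apply_affine(matrix, pts):
    """Apply a 2x3 affine matrix to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ matrix[:, :2].T + matrix[:, 2]
```

The same matrix estimation would be applied to training crops and to crops to be predicted, so that both reach the model in the same orientation.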

Wherein, clustering the multiple pieces of sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels includes:

Obtaining first sample object data and second sample object data from the multiple pieces of sample object data, determining the object label of the first sample object data as a first object label, and determining the object label of the second sample object data as a second object label, the first object label and the second object label being different;

Obtaining a first image similarity between the object feature information of the first sample object data and the object feature information of the second sample object data;

If the first image similarity is greater than a first threshold, dividing the at least one piece of sample object data with the first object label and the at least one piece of sample object data with the second object label into the same first object feature cluster, and setting the first object label and the second object label to the cluster label of that first object feature cluster;

When all sample object data have been divided into the first object feature clusters to which they belong, storing each first object feature cluster and the sample object data in it.
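The pairwise rule above can be sketched with a union-find structure over the initial object labels. Cosine similarity is used here as the image similarity, and the surviving label of a merged group is simply the root label; both choices are assumptions, since the source leaves the similarity measure and the naming of cluster labels open.

```python
import numpy as np


def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cluster_by_threshold(features, labels, first_threshold):
    """Merge initial labels whenever two differently labelled samples are similar."""
    parent = {lab: lab for lab in labels}

    def find(lab):
        # Follow parent links to the representative label (with path halving).
        while parent[lab] != lab:
            parent[lab] = parent[parent[lab]]
            lab = parent[lab]
        return lab

    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            ri, rj = find(labels[i]), find(labels[j])
            if ri == rj:
                continue  # already in the same first object feature cluster
            if cosine_similarity(features[i], features[j]) > first_threshold:
                # Every sample carrying either label joins the same cluster.
                parent[rj] = ri
    return [find(lab) for lab in labels]
```

Because the merge is done on labels rather than on individual samples, a sample that shares a label with a merged sample follows it into the cluster even if its own features were never compared. The double loop is quadratic in the number of samples, so it is only meant to show the rule, not to scale.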

Wherein, cleaning the heterogeneous object data out of each first object feature cluster and determining each cleaned first object feature cluster as a second object feature cluster includes:

In response to a first marking operation on the first object feature cluster, determining the sample object data in the first object feature cluster indicated by the first marking operation as the standard object data of that first object feature cluster;

In response to a second marking operation on non-standard object data, determining the non-standard object data indicated by the second marking operation as heterogeneous object data, the non-standard object data being the sample object data in the first object feature cluster other than the standard object data;

Deleting the heterogeneous object data from the first object feature cluster, and determining the first object feature cluster from which the heterogeneous object data has been deleted as a second object feature cluster.

Wherein, performing similar-cluster merging among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and updating the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs, includes:

Obtaining a first detection feature cluster and a second detection feature cluster from the multiple second object feature clusters;

In response to a merge request for the first detection feature cluster and the second detection feature cluster, obtaining, based on the merge request, a first quantity of sample object data in the first detection feature cluster and a second quantity of sample object data in the second detection feature cluster;

Obtaining a category similarity between the first detection feature cluster and the second detection feature cluster according to the image similarities between each piece of sample object data in the first detection feature cluster and each piece of sample object data in the second detection feature cluster, the first quantity, and the second quantity;

If the category similarity is greater than a second threshold, merging the first detection feature cluster and the second detection feature cluster into a third object feature cluster, and setting the object labels of the sample object data in the first detection feature cluster and the second detection feature cluster to the cluster label of the third object feature cluster to which they belong.
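The source states that the category similarity depends on the pairwise image similarities and on the two quantities, but gives no formula here. One plausible reading, used in the sketch below purely as an assumption, is the sum of all cross-cluster similarities divided by the product of the two quantities, i.e. the mean pairwise similarity. The dictionary layout of a cluster is also hypothetical.

```python
import numpy as np


def category_similarity(feats_a, feats_b):
    """Mean pairwise cosine similarity between two clusters (an assumed formula)."""
    a = np.asarray(feats_a, dtype=float)
    b = np.asarray(feats_b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    pairwise = a @ b.T                 # shape: (first quantity, second quantity)
    n1, n2 = pairwise.shape
    return float(pairwise.sum() / (n1 * n2))


def maybe_merge(cluster_a, cluster_b, second_threshold):
    """Return one merged cluster if similar enough, otherwise None."""
    sim = category_similarity(cluster_a["features"], cluster_b["features"])
    if sim <= second_threshold:
        return None
    return {
        # All merged samples take a single cluster label; reusing the first
        # cluster's label is an arbitrary choice for this sketch.
        "label": cluster_a["label"],
        "features": list(cluster_a["features"]) + list(cluster_b["features"]),
    }
```

Dividing by both quantities keeps a large cluster from dominating the score simply because it contributes more pairs.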

Wherein, after performing similar-cluster merging among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and updating the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs, the method further includes:

Determining the third object feature clusters as loop object feature clusters;

Cleaning the heterogeneous object data out of each loop object feature cluster;

Performing similar-cluster merging on the loop object feature clusters from which the heterogeneous object data has been deleted, to obtain the third object feature clusters;

When the third object feature clusters meet a convergence condition, or the number of similar-cluster merges equals a count threshold, performing the step of training the object recognition model based on the object labels and object feature information of the sample object data in each third object feature cluster.
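The clean-then-merge loop and its two stopping conditions can be sketched as a small driver function. The callbacks `clean`, `merge` and `converged` are hypothetical placeholders for the cleaning step, the similar-cluster merging step and the convergence check; the sketch fixes only the control flow.

```python
def refine_clusters(clusters, clean, merge, converged, max_rounds):
    """Alternate cleaning and merging until convergence or the round limit."""
    rounds = 0
    while rounds < max_rounds:          # count threshold on merge rounds
        clusters = merge([clean(c) for c in clusters])
        rounds += 1
        if converged(clusters):         # convergence condition met
            break
    return clusters, rounds
```

Training would start once this function returns, whichever of the two conditions ended the loop.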

Wherein, the method further includes:

Obtaining, for each third object feature cluster, second image similarities between the standard object data and the non-standard object data in that cluster, and obtaining from the second image similarities the similarity mean and similarity variance corresponding to each third object feature cluster;

Determining as a target object feature cluster any third object feature cluster whose similarity mean is less than a mean threshold and whose similarity variance is greater than a variance threshold;

Obtaining a detection result for the target object feature cluster;

If the detection result is that the target object feature cluster contains no heterogeneous object data and no similar cluster, determining that the third object feature clusters meet the convergence condition.
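The screening rule above selects clusters for a closer check from two statistics. A minimal sketch follows, assuming the second image similarities of each cluster are already available as a list of numbers keyed by cluster label; that layout and the function names are illustrative only.

```python
import numpy as np


def is_suspect(similarities, mean_threshold, variance_threshold):
    """Flag a cluster whose members are far from, and uneven around, its standard sample."""
    sims = np.asarray(similarities, dtype=float)
    return bool(sims.mean() < mean_threshold and sims.var() > variance_threshold)


def select_target_clusters(cluster_sims, mean_threshold, variance_threshold):
    """Return labels of clusters that need a detection result before training."""
    return [
        label
        for label, sims in cluster_sims.items()
        if is_suspect(sims, mean_threshold, variance_threshold)
    ]
```

A low mean together with a high variance suggests that a cluster mixes members close to the standard sample with members far from it, which is the pattern left behind by uncleaned heterogeneous data.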

Wherein, training the object recognition model based on the object labels and object feature information of the sample object data in each third object feature cluster includes:

Obtaining a sample image similarity between the object feature information of the sample object data in the third object feature cluster and the object feature information of the standard object data in an initial object recognition model;

Determining a similarity error according to the object label of the sample object data in the object recognition model, the object label of the sample object data in the third object feature cluster, and the sample image similarity, and adjusting the model parameters of the initial object recognition model by back-propagating the similarity error;

When the number of adjustments of the model parameters equals an adjustment threshold, or the similarity error meets a convergence condition, determining the initial object recognition model containing the adjusted model parameters as the object recognition model.
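The two stopping conditions of the training step (an adjustment-count limit and a converged error) can be shown on a deliberately small stand-in model. The sketch below trains a linear softmax classifier with plain gradient descent on cluster-labelled feature vectors. This is not the model or the similarity error described in the source; it is a substitute chosen so that the loop stays short and runnable, and `max_steps`, `tol` and `lr` are assumed hyperparameters.

```python
import numpy as np


def train_classifier(features, labels, max_steps=500, tol=1e-4, lr=0.5):
    """Fit a softmax classifier on cluster-labelled features (a stand-in model)."""
    x = np.asarray(features, dtype=float)
    classes = sorted(set(labels))
    y = np.array([classes.index(lab) for lab in labels])
    onehot = np.eye(len(classes))[y]
    w = np.zeros((x.shape[1], len(classes)))
    prev_loss = np.inf
    steps = 0
    while steps < max_steps:                       # adjustment-count limit
        logits = x @ w
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        loss = -np.log(probs[np.arange(len(y)), y]).mean()
        if abs(prev_loss - loss) < tol:            # error has converged
            break
        grad = x.T @ (probs - onehot) / len(y)     # gradient of the loss w.r.t. w
        w -= lr * grad
        prev_loss = loss
        steps += 1
    return w, classes, steps


def predict(w, classes, feature):
    """Return the cluster label with the highest score for one feature vector."""
    return classes[int(np.argmax(np.asarray(feature, dtype=float) @ w))]
```

The cluster labels produced by the earlier steps serve as the class targets here, which is the sense in which unlabeled data has been turned into labeled training data.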

In one aspect, an embodiment of the present invention provides a data processing device, which may include:

A feature information acquisition unit, configured to obtain sample object data from each of multiple pieces of multimedia data and extract object feature information of the sample object data;

A first feature cluster acquisition unit, configured to cluster the multiple pieces of sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels, the object label of every piece of sample object data being the same as the cluster label of the first object feature cluster to which it belongs;

A second feature cluster acquisition unit, configured to clean the heterogeneous object data out of each first object feature cluster and determine each cleaned first object feature cluster as a second object feature cluster;

A third feature cluster acquisition unit, configured to perform similar-cluster merging among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and update the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs;

A model training unit, configured to train an object recognition model based on the object labels and the object feature information of the sample object data in each third object feature cluster, the object recognition model being used to identify the similarity between object data to be predicted and target object data.

Wherein, the feature information acquisition unit includes:

A position information acquisition subunit, configured to obtain object position information in the multiple pieces of multimedia data and obtain from each piece of multimedia data, according to the object position information, an object region of a target size;

An object data acquisition subunit, configured to determine the image content in the object region as sample object data and obtain the object feature information of each piece of sample object data;

An object label setting subunit, configured to: if the multimedia data is image data, set an object label at random for the sample object data in the image data; and if the multimedia data is video data, detect the tracking trajectory information of the sample object data in the video data, set an association attribute for the sample object data that share the same tracking trajectory information, and set the same object label for the sample object data that have the association attribute.

Wherein, the object data acquisition subunit is specifically configured to:

Apply to the image content in the object region an affine transformation based on a target orientation, determine the transformed image content as sample object data, and obtain the object feature information of each piece of sample object data, the object in the sample object data being in the target orientation;

The device then further includes:

An object data recognition unit, configured to obtain multimedia data to be predicted; obtain a target object region of the target size according to the object position information in the multimedia data to be predicted, apply to the image content to be predicted in the target object region the affine transformation based on the target orientation, and determine the transformed image content to be predicted as object data to be predicted; and obtain, based on the object recognition model, the similarity between the object data to be predicted and target object data.

Wherein, the first feature cluster acquisition unit is specifically configured to:

Wherein, the second feature cluster acquisition unit is specifically configured to:

Wherein, the third feature cluster acquisition unit is specifically configured to:

Wherein, the device further includes:

A feature cluster loop acquisition unit, configured to determine the third object feature clusters as loop object feature clusters; clean the heterogeneous object data out of each loop object feature cluster; perform similar-cluster merging on the loop object feature clusters from which the heterogeneous object data has been deleted, to obtain the third object feature clusters; and trigger the model training unit when the third object feature clusters meet the convergence condition or the number of similar-cluster merges equals the count threshold.

Wherein, the device further includes:

A detection unit, configured to obtain, for each third object feature cluster, second image similarities between the standard object data and the non-standard object data in that cluster; obtain from the second image similarities the similarity mean and similarity variance corresponding to each third object feature cluster; and determine as a target object feature cluster any third object feature cluster whose similarity mean is less than the mean threshold and whose similarity variance is greater than the variance threshold;

A detection result acquisition unit, configured to obtain a detection result for the target object feature cluster and, if the detection result is that the target object feature cluster contains no heterogeneous object data and no similar cluster, determine that the third object feature clusters meet the convergence condition.

Wherein, the model training unit is specifically configured to:

In one aspect, an embodiment of the present invention provides a computer storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the method steps described above.

In one aspect, an embodiment of the present invention provides a data processing device including a processor and a memory, wherein the memory stores a computer program, and the computer program is adapted to be loaded by the processor to perform the following steps:

In the embodiments of the present invention, sample object data is obtained from each of multiple pieces of multimedia data, and object feature information of the sample object data is extracted; the multiple pieces of sample object data are clustered according to the object feature information to obtain multiple first object feature clusters with different cluster labels, the object label of every piece of sample object data being the same as the cluster label of the first object feature cluster to which it belongs; the heterogeneous object data is cleaned out of each first object feature cluster, and each cleaned first object feature cluster is determined as a second object feature cluster; similar-cluster merging is performed among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and the object label of the sample object data is updated to the cluster label of the third object feature cluster to which it belongs; and an object recognition model is trained based on the object labels and the object feature information of the sample object data in each third object feature cluster, the object recognition model being used to identify the similarity between object data to be predicted and target object data. By clustering the sample object data, removing heterogeneous object data and merging similar clusters, labeled sample object data is generated: a large amount of unlabeled sample object data is turned into labeled sample object data, which is then used to train the object recognition model. This avoids the low recognition rate that results when the object recognition model has too little training data, and obtaining a large amount of labeled sample object data improves the recognition rate for object data to be predicted.

Brief description of the drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a system architecture diagram of a data processing method provided by an embodiment of the present invention;

Fig. 2a is a scenario diagram of a data processing method provided by an embodiment of the present invention;

Fig. 2b is a scenario diagram of a data processing method provided by an embodiment of the present invention;

Fig. 2c is an interface diagram of a data processing method provided by an embodiment of the present invention;

Fig. 3 is a flow diagram of a data processing method provided by an embodiment of the present invention;

Fig. 4 is a flow diagram of another data processing method provided by an embodiment of the present invention;

Fig. 5 is an example diagram of label setting provided by an embodiment of the present invention;

Fig. 6 is an example diagram of heterogeneous object data deletion provided by an embodiment of the present invention;

Fig. 7 is an example diagram of feature cluster merging provided by an embodiment of the present invention;

Fig. 8 is an example diagram of a data processing method provided by an embodiment of the present invention;

Fig. 9 is a structural diagram of a data processing device provided by an embodiment of the present invention;

Fig. 10 is a structural diagram of another data processing device provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Refer to Fig. 1, which is a system architecture diagram of data processing provided by an embodiment of the present invention. Server 10f is connected to a user terminal cluster through switch 10e and communication bus 10d. The user terminal cluster may include user terminal 10a, user terminal 10b, ..., and user terminal 10c. Database 10g stores multiple pieces of unlabeled multimedia data. Server 10f extracts the unlabeled multimedia data from database 10g, obtains sample object data from the multimedia data, divides each piece of sample object data into the corresponding object feature cluster by unsupervised learning, and sets corresponding labels for the object feature clusters and the sample object data. The labels include the object labels of the sample object data and the cluster labels of the object feature clusters, and the object label of every piece of sample object data is the same as the cluster label of the object feature cluster to which it belongs. Based on the multiple pieces of labeled sample object data, server 10f trains an object recognition model by supervised learning and stores the object recognition model. Server 10f can use the trained object recognition model to detect the similarity between object data to be predicted and target object data. When the detected similarity is less than a threshold, server 10f sends a prompt signal to the corresponding user terminal to inform the user that the object data to be predicted and the target object data do not belong to the same object; when the detected similarity is greater than or equal to the threshold, server 10f sends a prompt signal to the corresponding user terminal to inform the user that the object data to be predicted and the target object data belong to the same object. For example, suppose the object recognition model is a face recognition model used for face unlocking on a mobile terminal. When the similarity between the face data to be predicted and the target face data is less than the threshold, the user is informed that the two are not the same face and the mobile terminal fails to unlock; when the similarity is greater than or equal to the threshold, the user is informed that the two are the same face and the mobile terminal unlocks successfully. Alternatively, when server 10f receives a detection request from a user terminal in the terminal cluster, it uses the object recognition model to identify whether the similarity between the object data to be predicted and the target object data corresponding to the detection request is greater than the threshold, and sends the detection result to the corresponding user terminal. Server 10f can also send the trained object recognition model to each user terminal in the user terminal cluster, so that the user terminal itself detects the similarity between the object data to be predicted and the target object data stored in that user terminal.
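The threshold decision described above can be sketched in a few lines. Cosine similarity between two feature vectors is an assumed similarity measure, and the function name and return shape are illustrative; the source only fixes the comparison against a threshold.

```python
import numpy as np


def verify(probe_feature, target_feature, threshold):
    """Return (is_same_object, similarity) for two feature vectors."""
    p = np.asarray(probe_feature, dtype=float)
    t = np.asarray(target_feature, dtype=float)
    similarity = float(p @ t / (np.linalg.norm(p) * np.linalg.norm(t)))
    # At or above the threshold the two are treated as the same object,
    # matching the "greater than or equal to" rule in the text.
    return similarity >= threshold, similarity
```

In the face-unlock example, a `True` result would correspond to a successful unlock and a `False` result to a failed one.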

The data processing method provided by the embodiments of the present invention can be applied to mining, from the large amount of sample object data without identity labels found in real scenes, labeled sample object data for training an object recognition model. The data processing device involved in the embodiments of the present invention may be a front-end device, including terminal devices such as a tablet computer, a smartphone, a personal computer (PC), a notebook computer or a palmtop computer, in which case the front-end device directly processes the sample object data obtained from the multiple pieces of multimedia data. The data processing device may also be a server, in which case the server obtains the sample object data from a database or a front-end device, processes it, and sends the processing result to the database or the front-end device for storage.

Setting labels for unlabeled sample object data and training an object recognition model both involve a large amount of computation. Server 10f can therefore use a distributed architecture: the data and the model are partitioned, the model is trained and optimized in a distributed manner across multiple worker nodes, and the models are finally aggregated. The above process may be an application program or system program in a Linux system, a Windows system, a Unix system, or the like. The following takes a process in a Linux system as an example, training an object recognition model based on a large amount of unlabeled multimedia data. The user terminal may include a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a smart speaker, a mobile internet device (MID), a POS (point of sale) machine, a wearable device (such as a smartwatch or smart bracelet), and so on.

Either server 10f or any user terminal in the embodiment corresponding to Fig. 1 can turn a large amount of unlabeled sample object data into labeled sample object data and then train an object recognition model on the labeled sample object data. For ease of description, server 10f is used as an example below. Refer also to Fig. 2a and Fig. 2b, which are scenario diagrams of data processing provided by an embodiment of the present invention. Server 10f obtains n pieces of unlabeled multimedia data, obtains the object position information in the multiple pieces of multimedia data, obtains from each piece of multimedia data, according to the object position information, an object region of a target size, determines the image content in the object region as sample object data, and obtains the object feature information of each piece of sample object data.

At this point, server 10f has obtained multiple sample object data and the object feature information corresponding to each sample object data. The sample object data may specifically be face images or images of other objects. Next, server 10f calculates the image similarity between the sample object data. Specifically, it calculates the distance information between the object feature information of two sample object data, which serves as the image similarity between them; if the image similarity is greater than a first threshold, the two sample object data are divided into the same first object feature cluster. It should be noted that if the object tag of some sample object data is identical to the object tag of sample object data already divided into the first object feature cluster, the sample object data with that same object tag are also divided into the first object feature cluster. For example, when calculating the image similarity between first sample object data and second sample object data, where the first sample object data is provided with an association attribute (that is, the first sample object data shares tracking trajectory information with at least one other sample object data in the video data), the other sample object data provided with the association attribute can also be divided into the first object feature cluster at the same time as the first and second sample object data. It should also be noted that, because the number of sample object data is huge, clustering takes a long time. To improve clustering efficiency, distributed clustering can be used here: the multiple sample object data are divided into multiple sample object data sets, the sets are clustered separately and simultaneously, and each sample object data set can generate multiple first object feature clusters.
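The threshold rule and the per-set clustering described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `cluster_partition`, the tuple layout of a sample, the union-find bookkeeping and the dot-product similarity used in the example are all assumptions introduced here, and the sets are processed in a plain loop instead of on separate machines.

```python
from itertools import combinations


def cluster_partition(samples, similarity, first_threshold):
    """Cluster one sample object data set.

    samples: list of (sample_id, object_tag, feature_vector) tuples.
    similarity: function returning the image similarity of two feature vectors.
    Returns a sorted list of clusters, each a list of sample ids.
    """
    parent = {sid: sid for sid, _, _ in samples}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    # Samples that already share an object tag (for example, the same
    # tracking trajectory) always end up in the same cluster.
    first_with_tag = {}
    for sid, tag, _ in samples:
        anchor = first_with_tag.setdefault(tag, sid)
        parent[find(sid)] = find(anchor)

    # A pair above the first threshold joins the groups of both samples.
    for (a, _, fa), (b, _, fb) in combinations(samples, 2):
        if similarity(fa, fb) > first_threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for sid, _, _ in samples:
        clusters.setdefault(find(sid), []).append(sid)
    return sorted(clusters.values())


def dot(u, v):
    """Hypothetical similarity measure used only for this example."""
    return sum(a * b for a, b in zip(u, v))


# Two sample object data sets, clustered independently of each other.
sample_sets = [
    [("s1", "A", [1.0, 0.0]), ("s2", "B", [0.98, 0.2]), ("s3", "A", [0.0, 1.0])],
    [("s4", "C", [0.0, 1.0]), ("s5", "D", [1.0, 0.0])],
]
first_clusters = [cluster_partition(s, dot, 0.9) for s in sample_sets]
print(first_clusters)  # [[['s1', 's2', 's3']], [['s4'], ['s5']]]
```

In this example s3 joins s1 only because both carry tag A, and s3 and s4 have identical features but are never compared because they sit in different sets.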

It should be noted that, besides the above clustering method based on calculating image similarity, clustering algorithms of other kinds can also be used.

Finally, the third object feature clusters used to train the object recognition model are obtained from the multiple first object feature clusters. First, the heterogeneous object data in each first object feature cluster are deleted to generate multiple second object feature clusters. Then, according to a merge request, the clusters to be merged among the second object feature clusters are merged to generate third object feature clusters; the clusters to be merged are the at least two second object feature clusters that the merge request indicates can be merged. The sample object data in the third object feature clusters can then be used to train object recognition model 20a. After object recognition model 20a has been trained, it can subsequently be used directly to detect the similarity between object data to be predicted and target object data.

As shown in Fig. 2b, the pre-trained object recognition model 20a is stored in the server. The user terminal sends object data to be predicted to the server in order to detect its similarity to target object data. The server obtains the object feature information of the object data to be predicted and, based on object recognition model 20a, identifies the similarity between the object data to be predicted and the target object data. If the similarity is less than a threshold, the detection is determined not to have passed. Specifically, in the field of face recognition, if the similarity detected between face data to be predicted and target face data is less than the threshold, the server sends the detection result to the user terminal and prompts the user terminal that the face data to be predicted is the face data of a stranger.

Refer to Fig. 2c, which is a schematic diagram of a data processing interface provided by an embodiment of the present invention. Taking a face recognition system as an example, the monitoring platform trains a face recognition model in order to detect whether a face in video surveillance and a target face in the server cluster (including server 1, server 2, ..., server n) are the face of the same person. As shown in interface 20c, the user clicks the "performance upgrade" button in the interface, and the monitoring platform copies part of the labeled face data feature clusters from the server cluster. The labeled face data feature clusters in the server cluster correspond to the third object feature clusters in the embodiment of Fig. 2a above. Adding labeled sample object data to train the face recognition model can improve its face recognition accuracy. The monitoring platform trains the face detection model based on the labeled face data feature clusters. After the face detection model has been trained, as shown in interface 20d, the user clicks the "start scanning" button in the interface; the monitoring platform can then continuously obtain video data through the camera, detect whether a face in the video data and a target face in the server cluster are the face of the same person, and display the detection result. The user can also copy new data from the server cluster again at intervals to update the face detection model.

The detailed processes of generating the first object feature clusters, the second object feature clusters and the third object feature clusters are described in the embodiments corresponding to Fig. 3 to Fig. 8 below.

The data processing method provided by the embodiments of the present invention is described in detail below with reference to Fig. 3 to Fig. 8.

Refer to Fig. 3, which is a schematic flowchart of a data processing method provided by an embodiment of the present invention. As shown in Fig. 3, the method of this embodiment may include the following steps S101 to S105.

S101, obtain sample object data from each of multiple multimedia data, and extract the object feature information of the sample object data;

Specifically, the data processing device may be any user terminal in the user terminal cluster of the embodiment corresponding to Fig. 1 above, such as a tablet computer, smartphone, personal computer (PC), laptop, palmtop computer or other terminal device; alternatively, the data processing device may be server 10f in the embodiment corresponding to Fig. 1. The data processing device obtains sample object data from each of multiple multimedia data and extracts the object feature information of the sample object data. It can be understood that the multimedia data include image data and video data, and the video data include at least one frame of image data, so obtaining sample object data from multiple multimedia data is essentially obtaining sample object data from image data. For example, sample object data can be obtained from several pictures or from a video file. The sample object data are the image regions of target size in the multimedia data that contain a sample object; a sample object may be a human face, an animal face or an object. Each multimedia data may contain at least one sample object data, or none. The data processing device extracts object feature information from the sample object data. The object feature information is information such as the texture features, shape features and spatial relationship features of the sample object within the target-size image region. Usually the image is first converted to grayscale, and the object feature information of the sample object data is then extracted with a feature extraction method, such as the scale-invariant feature transform (SIFT) or the histogram of oriented gradients (HOG).
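To make the idea of a gradient-based feature vector concrete, the following sketch computes a single orientation histogram over a grayscale region. It is a deliberately simplified stand-in and not full HOG or SIFT: real HOG adds cells, block normalisation and bin interpolation. The function name, the nine-bin default and the use of central differences are assumptions made for illustration.

```python
from math import atan2, degrees, hypot


def orientation_histogram(gray, bins=9):
    """Simplified HOG-style descriptor for one grayscale region.

    gray: 2-D list of pixel intensities.
    Returns an L1-normalised histogram of unsigned gradient orientations
    (0 to 180 degrees), each pixel weighted by its gradient magnitude.
    """
    height, width = len(gray), len(gray[0])
    bin_width = 180.0 / bins
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # central differences
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = degrees(atan2(gy, gx)) % 180.0
            hist[min(int(angle / bin_width), bins - 1)] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# A region with a vertical edge: all gradient energy falls in the first bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(orientation_histogram(vertical_edge))
```

The resulting list plays the role of the feature vector that is later compared between two sample object data.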

S102, cluster the multiple sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels; the object tag of every sample object data is identical to the cluster label of the first object feature cluster to which it belongs;

Specifically, the data processing device clusters the multiple sample object data according to the object feature information and obtains multiple first object feature clusters with different cluster labels; the object tag of every sample object data is identical to the cluster label of the first object feature cluster to which it belongs. It can be understood that the data processing device uses a clustering algorithm to cluster the multiple sample object data. A first object feature cluster is a set of sample object data that satisfy the same clustering criterion; each first object feature cluster includes at least one sample object data and the image feature information corresponding to that sample object data. The cluster label is the identification information of a first object feature cluster, and the object tag is the identification information of a sample object data. The object tags of all sample object data in one first object feature cluster are therefore identical, and different first object feature clusters have different cluster labels. This solution uses the image similarity between the image feature information of sample object data as the clustering criterion for judging whether two sample object data belong to the same first object feature cluster. The image similarity between the image feature information corresponding to two sample object data is calculated with the following formula:

Where S denotes the image similarity between two sample object data, f_A=[a₀,a₁,…,a_N] and f_B=[b₀,b₁,…,b_N] denote the feature vectors of the two sample object data, N denotes the feature dimension, and A and B denote the two sample object data. If the image similarity S is greater than the first threshold, sample object data A and B are divided into the same first object feature cluster. The cluster label of the first object feature cluster can be the object tag of either A or B, or a cluster label different from the object tags of both A and B can be set; the object tags of sample object data A and B are then set to the cluster label of the first object feature cluster. This continues until all sample object data have been divided into first object feature clusters and the object tag of every sample object data has been set to the cluster label of the first object feature cluster to which it belongs.
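One common way to compare two feature vectors is cosine similarity. The sketch below assumes, purely for illustration, that S takes this form; the source does not confirm it, and the function names and the threshold value 0.9 are hypothetical.

```python
from math import sqrt


def image_similarity(f_a, f_b):
    """Cosine similarity of two feature vectors (assumed form of S)."""
    dot = sum(a * b for a, b in zip(f_a, f_b))
    norm_a = sqrt(sum(a * a for a in f_a))
    norm_b = sqrt(sum(b * b for b in f_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def same_first_cluster(f_a, f_b, first_threshold):
    """Apply the clustering criterion: S greater than the first threshold."""
    return image_similarity(f_a, f_b) > first_threshold


f_A = [3.0, 4.0]
f_B = [4.0, 3.0]
print(image_similarity(f_A, f_B))         # 24 / 25
print(same_first_cluster(f_A, f_B, 0.9))  # True
```

With this measure, S ranges from -1 to 1 and does not depend on the length of either vector, so the first threshold can be chosen independently of image brightness scaling.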

S103, clean up the heterogeneous object data in each first object feature cluster respectively, and determine the cleaned first object feature clusters as second object feature clusters;

Specifically, the data processing device cleans up the heterogeneous object data in each first object feature cluster and determines each cleaned first object feature cluster as a second object feature cluster. It can be understood that heterogeneous object data are the sample object data that have been marked in a first object feature cluster: the user selects, from the first object feature cluster, the sample object data that do not belong to it and marks them, and the marked sample object data are the heterogeneous object data. The data processing device deletes the heterogeneous object data from each first object feature cluster and determines the first object feature cluster, with its heterogeneous object data deleted, as a second object feature cluster. A second object feature cluster is a set of sample object data that satisfy the same clustering criterion; each second object feature cluster includes at least one sample object data and the image feature information corresponding to that sample object data.
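The cleaning step amounts to removing the user-marked samples from each cluster. A minimal sketch, assuming clusters are held as dictionaries keyed by cluster label and the marks arrive as sets of sample ids (both data layouts and the function name are hypothetical):

```python
def clean_clusters(first_clusters, marked):
    """Delete marked heterogeneous object data from each first cluster.

    first_clusters: cluster label -> {sample id: feature vector}.
    marked: cluster label -> set of sample ids flagged by the user.
    Returns the second object feature clusters in the same layout.
    """
    second_clusters = {}
    for label, members in first_clusters.items():
        flagged = marked.get(label, set())
        second_clusters[label] = {
            sid: feature for sid, feature in members.items() if sid not in flagged
        }
    return second_clusters


first_clusters = {
    "A": {"s1": [1.0, 0.0], "s2": [0.9, 0.1], "s3": [0.0, 1.0]},
    "B": {"s4": [0.5, 0.5]},
}
marked = {"A": {"s3"}}  # the user marked s3 as not belonging to cluster A
second_clusters = clean_clusters(first_clusters, marked)
print(sorted(second_clusters["A"]))  # ['s1', 's2']
```

The input dictionaries are left unchanged, so the first object feature clusters remain available if a mark needs to be reviewed.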

S104, perform similar cluster merging among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and update the object tag of each sample object data to the cluster label of the third object feature cluster to which it belongs;

Specifically, the data processing device performs similar cluster merging among the multiple second object feature clusters, generates multiple third object feature clusters with different cluster labels, and updates the object tag of each sample object data to the cluster label of the third object feature cluster to which it belongs. It can be understood that, when distributed clustering is used, the multiple sample object data are divided into multiple sample object data sets and the sets are clustered separately, so different sample object data sets may contain first object feature clusters that should be merged. Merging the first object feature clusters after their heterogeneous object data have been removed can improve clustering accuracy. A group of similar clusters is a group of second object feature clusters that have been marked among the second object feature clusters, and each group of similar clusters contains at least two second object feature clusters. The data processing device merges the second object feature clusters within each group to generate multiple third object feature clusters with different cluster labels. The second object feature clusters may contain several marked groups of similar clusters, and each group yields one third object feature cluster after merging. Some second object feature clusters may not be marked; an unmarked second object feature cluster does not undergo similar cluster merging. The cluster label of a third object feature cluster can be the cluster label of any one of the second object feature clusters merged into it, or a new cluster label can be generated. The object tags of the sample object data in the third object feature cluster are updated to the cluster label of that third object feature cluster. A third object feature cluster is a set of sample object data that satisfy the same clustering criterion; each third object feature cluster includes at least one sample object data and the image feature information corresponding to that sample object data.
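The merging and relabelling described above can be sketched as follows. The function name and data layout are hypothetical, the first label in each marked group is reused as the merged label (one of the options the text allows), and carrying unmarked clusters over unchanged as third clusters is an assumption of this sketch.

```python
def merge_similar_clusters(second_clusters, similar_groups):
    """Merge marked groups of second clusters into third clusters.

    second_clusters: cluster label -> list of sample ids.
    similar_groups: list of groups, each a list of at least two cluster labels.
    Returns (third_clusters, object_tags).
    """
    third_clusters = {}
    merged = set()
    for group in similar_groups:
        if len(group) < 2:
            raise ValueError("a group of similar clusters needs at least two clusters")
        label = group[0]  # reuse the label of one merged cluster
        third_clusters[label] = [s for c in group for s in second_clusters[c]]
        merged.update(group)
    # Assumption: unmarked clusters are kept as they are.
    for label, members in second_clusters.items():
        if label not in merged:
            third_clusters[label] = list(members)
    # Every sample takes the cluster label of its third cluster as object tag.
    object_tags = {s: label for label, members in third_clusters.items() for s in members}
    return third_clusters, object_tags


second = {"A": ["s1", "s2"], "B": ["s3"], "C": ["s4"]}
third, tags = merge_similar_clusters(second, [["A", "C"]])
print(third)       # {'A': ['s1', 's2', 's4'], 'B': ['s3']}
print(tags["s4"])  # A
```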

S105, train an object recognition model based on the object tags and the object feature information of the sample object data in each third object feature cluster; the object recognition model is used to identify the similarity between object data to be predicted and target object data.

Specifically, the data processing device trains the object recognition model based on the object tags and the object feature information of the sample object data in each third object feature cluster; the object recognition model is used to identify the similarity between object data to be predicted and target object data. It can be understood that the data processing device takes the object tags and the object feature information of the sample object data in each third object feature cluster as input information to obtain the model parameters of the object recognition model. The more sample object data there are, the more accurate the model parameters of the object recognition model, and the more accurate the similarity that the object recognition model calculates between object data to be predicted and target object data.
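The text does not specify the model or the training procedure, so the sketch below covers only the step it does describe: turning the third object feature clusters into labelled training input. The function name, the dictionary layout and the integer class indices are assumptions for illustration.

```python
def build_training_set(third_clusters):
    """Turn third clusters into (feature, class index) training pairs.

    third_clusters: cluster label -> list of feature vectors.
    Returns (examples, class_index), where class_index maps each cluster
    label (the object tag of its samples) to an integer class.
    """
    labels = sorted(third_clusters)
    class_index = {label: i for i, label in enumerate(labels)}
    examples = [
        (feature, class_index[label])
        for label in labels
        for feature in third_clusters[label]
    ]
    return examples, class_index


third_clusters = {"B": [[0.0, 1.0]], "A": [[1.0, 0.0], [0.9, 0.1]]}
examples, class_index = build_training_set(third_clusters)
print(class_index)  # {'A': 0, 'B': 1}
print(examples)     # [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1)]
```

Any supervised learner that accepts feature vectors with class labels could consume these pairs; which learner the embodiment uses is left open here.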

In the embodiments of the present invention, sample object data are obtained from each of multiple multimedia data and the object feature information of the sample object data is extracted; the multiple sample object data are clustered according to the object feature information to obtain multiple first object feature clusters with different cluster labels, where the object tag of every sample object data is identical to the cluster label of the first object feature cluster to which it belongs; the heterogeneous object data in each first object feature cluster are cleaned up, and the cleaned first object feature clusters are determined as second object feature clusters; similar cluster merging is performed among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and the object tag of each sample object data is updated to the cluster label of the third object feature cluster to which it belongs; and an object recognition model, used to identify the similarity between object data to be predicted and target object data, is trained based on the object tags and the object feature information of the sample object data in each third object feature cluster. Clustering the sample object data, removing heterogeneous object data and merging similar clusters turns a large amount of unlabeled sample object data into labeled sample object data. Training the object recognition model with this large amount of labeled sample object data avoids the low recognition rate caused by too little training data, and thereby improves the recognition rate for object data to be predicted.

Refer to Fig. 4, which is a schematic flowchart of a data processing method provided by an embodiment of the present invention. As shown in Fig. 4, the method of this embodiment may include the following steps S201 to S210.

S201, obtain the object location information in multiple multimedia data, and obtain an object region of target size from each multimedia data according to the object location information;

Specifically, the data processing device obtains the object location information in multiple multimedia data and, according to the object location information, obtains an object region of target size from each multimedia data. It can be understood that the multimedia data include image data and video data. The object location information is the location of the image region in which a sample object appears in the multimedia data; a sample object may be a human face, an animal face or an object. The target size is a preset fixed size.
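Obtaining a fixed-size object region from a located object can be illustrated with a crop followed by a resize. The sketch assumes the object location is given as a box `(top, left, height, width)` and uses nearest-neighbour sampling to reach the target size; the box format, the sampling method and the function name are all assumptions, not details from the text.

```python
def crop_and_resize(image, box, target_size):
    """Cut the object region out of an image and scale it to a fixed size.

    image: 2-D list of pixels.
    box: (top, left, height, width) of the located object.
    target_size: (rows, cols) of the preset target size.
    """
    top, left, height, width = box
    rows, cols = target_size
    return [
        [
            image[top + (r * height) // rows][left + (c * width) // cols]
            for c in range(cols)
        ]
        for r in range(rows)
    ]


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # pixels 0..15
print(crop_and_resize(image, (1, 1, 2, 2), (2, 2)))  # [[5, 6], [9, 10]]
```

Because every region comes out at the same size, the feature vectors extracted afterwards have the same dimension and can be compared directly.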

S202, determine the image content in the object region as sample object data, and obtain the object feature information of each sample object data;

Specifically, the data processing device determines the image content in the object region as sample object data and obtains the object feature information of each sample object data. It can be understood that the data processing device applies an affine transformation based on a target orientation to the image content in the object region, determines the transformed image content as sample object data, and obtains the object feature information of each sample object data; the object in the sample object data is then in the target orientation. An affine transformation applies one linear transformation and one translation to a vector space, mapping it to another vector space. Through the affine transformation, the sample object in the sample object data can be adjusted to the target orientation. The target orientation is preset orientation information and can be adjusted by setting the transformation parameters of the affine transformation. The detailed process of obtaining the face feature information of a face image from image data is as follows: detect all faces in the image data and indicate each face location with a rectangular bounding box; within the bounding box of each face, detect the location information of the facial features (eyes, nose, mouth, etc.) and apply an affine transformation to the image inside the bounding box to adjust the face to the target orientation (to facilitate the extraction and comparison of face feature information, the face is usually adjusted to a frontal orientation); obtain fixed-size face image data according to the preset target size; to improve the quality of the face image data, select the face image data of relatively high quality according to indicators such as face pose, image blur, ambient lighting and degree of occlusion; and use a feature extraction method to obtain the face feature information of the selected face image data.
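An affine transformation maps a point x to Mx + t, where M is the linear part and t the translation. The sketch below illustrates one possible choice of parameters: a rotation about the midpoint between the two detected eyes that makes the eye line horizontal. Using the eye line to define the target orientation, and the function names, are assumptions made for this example.

```python
from math import atan2, cos, sin


def affine(point, matrix, translation):
    """Apply y = M x + t to a 2-D point."""
    x, y = point
    (a, b), (c, d) = matrix
    tx, ty = translation
    return (a * x + b * y + tx, c * x + d * y + ty)


def eye_alignment(left_eye, right_eye):
    """Return (matrix, translation) that levels the line between the eyes.

    The transform is a rotation about the midpoint of the two eyes, written
    in affine form: R (x - c) + c = R x + (c - R c).
    """
    angle = atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])
    ca, sa = cos(-angle), sin(-angle)  # rotate by the opposite angle
    matrix = ((ca, -sa), (sa, ca))
    cx = (left_eye[0] + right_eye[0]) / 2
    cy = (left_eye[1] + right_eye[1]) / 2
    translation = (cx - (ca * cx - sa * cy), cy - (sa * cx + ca * cy))
    return matrix, translation


matrix, translation = eye_alignment((0.0, 0.0), (2.0, 2.0))
left = affine((0.0, 0.0), matrix, translation)
right = affine((2.0, 2.0), matrix, translation)
print(left, right)  # both points now share the same y coordinate
```

Applying the same matrix and translation to every pixel coordinate inside the bounding box would rotate the whole face region accordingly.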

S203, if the multimedia data is image data, randomly set an object tag for the sample object data in the image data; if the multimedia data is video data, detect the tracking trajectory information of the sample object data in the video data, set an association attribute for the sample object data that have the same tracking trajectory information, and set the same object tag for the sample object data that have the association attribute;

Specifically, if the multimedia data is image data, the data processing device randomly sets an object tag for the sample object data in the image data; if the multimedia data is video data, it detects the tracking trajectory information of the sample object data in the video data, sets an association attribute for the sample object data that have the same tracking trajectory information, and sets the same object tag for the sample object data that have the association attribute. It can be understood that the data processing device sets object tags according to the type of the multimedia data. For image data, object tags are set randomly and different sample object data receive different object tags. For video data, the tracking trajectory information of the sample object data is detected; tracking trajectory information is the change in the position of sample object data across different frames of the video data. When the change in position of sample object data across different frames is within a threshold range, those object data are determined to have the same tracking trajectory information. The association attribute is a kind of marker indicating that sample object data have the same tracking trajectory information, and the sample object data that have the association attribute are given the same object tag.
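The two tagging rules can be sketched as follows. The tracker here is intentionally naive and is an assumption of this example: a detection continues a trajectory when it lies within `max_shift` pixels, on both axes, of that trajectory's position in the previous frame. The function names, the detection tuple layout and the use of random UUIDs as tags are likewise hypothetical.

```python
import uuid


def tag_image_samples(sample_ids):
    """Give every sample taken from still images its own random object tag."""
    return {sid: uuid.uuid4().hex for sid in sample_ids}


def tag_video_samples(detections, max_shift):
    """Give samples on the same tracking trajectory the same object tag.

    detections: list of (sample_id, frame_index, (x, y)), ordered by frame.
    """
    tracks = []  # each track: {"tag": ..., "frame": ..., "pos": ...}
    tags = {}
    for sid, frame, (x, y) in detections:
        match = None
        for track in tracks:
            px, py = track["pos"]
            if (
                track["frame"] == frame - 1
                and abs(x - px) <= max_shift
                and abs(y - py) <= max_shift
            ):
                match = track
                break
        if match is None:  # no trajectory continues here: start a new one
            match = {"tag": uuid.uuid4().hex}
            tracks.append(match)
        match["frame"], match["pos"] = frame, (x, y)
        tags[sid] = match["tag"]
    return tags


detections = [
    ("a", 0, (10, 10)), ("b", 0, (100, 100)),
    ("c", 1, (12, 11)), ("d", 1, (101, 98)),
    ("e", 2, (300, 300)),
]
video_tags = tag_video_samples(detections, max_shift=5)
print(video_tags["a"] == video_tags["c"])  # True: same trajectory
print(video_tags["a"] == video_tags["b"])  # False: different trajectories
```

Sharing a tag is what plays the role of the association attribute in this sketch: samples a and c, and samples b and d, would later be moved between clusters together.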

S204, cluster the multiple sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels; the object tag of every sample object data is identical to the cluster label of the first object feature cluster to which it belongs;

Step S204 of this embodiment may include the following steps S2041 to S2044:

S2041, obtain first sample object data and second sample object data from the multiple sample object data, determine the object tag of the first sample object data as a first object tag, and determine the object tag of the second sample object data as a second object tag; the first object tag and the second object tag are different;

Specifically, the data processing device obtains first sample object data and second sample object data from the multiple sample object data, determines the object tag of the first sample object data as the first object tag, and determines the object tag of the second sample object data as the second object tag. It can be understood that the first sample object data is one of the multiple sample object data, and the second sample object data is one of the multiple sample object data whose object tag differs from that of the first sample object data.

S2042, obtain a first image similarity between the object feature information of the first sample object data and the object feature information of the second sample object data;

Specifically, the data processing device obtains the first image similarity between the object feature information of the first sample object data and the object feature information of the second sample object data. It can be understood that the data processing device obtains the object feature information of the first sample object data and of the second sample object data and calculates the first image similarity between the first sample object data and the second sample object data with the following formula:

Where S denotes the first image similarity between the two sample object data, f_A=[a₀,a₁,…,a_N] and f_B=[b₀,b₁,…,b_N] denote the feature vectors of the two sample object data, N denotes the feature dimension, and A and B denote the two sample object data.

S2043, if the first image similarity is greater than a first threshold, divide the at least one sample object data having the first object tag and the at least one sample object data having the second object tag into the same first object feature cluster, and set the first object tag and the second object tag to the cluster label of that first object feature cluster;

Specifically, if the first image similarity is greater than the first threshold, the data processing device divides the at least one sample object data having the first object tag and the at least one sample object data having the second object tag into the same first object feature cluster, and sets the first object tag and the second object tag to the cluster label of that first object feature cluster. It can be understood that the first threshold is preset. The at least one sample object data having the first object tag can be a single sample object data, the at least one sample object data in an object feature cluster, or at least one sample object data in video data that have the same tracking trajectory information; the same applies to the at least one sample object data having the second object tag. The cluster label of the first object feature cluster can be either the first object tag or the second object tag, or it can be set randomly. Please also refer to Fig. 5, which is a schematic example of label setting provided by an embodiment of the present invention. As shown in Fig. 5, the first image similarity between sample object data 1 and sample object data 2 is greater than the first threshold, so sample object data 1 and sample object data 2 are clustered to generate a first object feature cluster. The object tag of sample object data 1 is A, and the object tag of sample object data 2 is B. If the cluster label of the generated first object feature cluster is set to object tag A of sample object data 1, the object tags of sample object data 1 and 2 are both set to A; if the cluster label is set to object tag B of sample object data 2, the object tags of both are set to B; and if the cluster label is randomly set to C, the object tags of both are set to C.
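The three label choices in the Fig. 5 example can be written as a small relabelling function. The function name, the `policy` argument and the dictionary layout are hypothetical; the sketch also shows that every sample already carrying either tag moves together, not only the two samples that were compared.

```python
def merge_tags(object_tags, first_id, second_id, policy="first", new_label=None):
    """Place the groups of two similar samples under one cluster label.

    object_tags: sample id -> current object tag.
    policy: "first" keeps the first tag, "second" keeps the second tag,
            "new" uses new_label for the merged cluster.
    Returns a new dictionary; the input is not modified.
    """
    first_tag, second_tag = object_tags[first_id], object_tags[second_id]
    if policy == "first":
        label = first_tag
    elif policy == "second":
        label = second_tag
    elif policy == "new" and new_label is not None:
        label = new_label
    else:
        raise ValueError("unknown policy or missing new_label")
    return {
        sid: (label if tag in (first_tag, second_tag) else tag)
        for sid, tag in object_tags.items()
    }


tags = {"sample1": "A", "sample2": "B", "sample3": "B", "sample4": "D"}
print(merge_tags(tags, "sample1", "sample2"))                   # A, A, A, D
print(merge_tags(tags, "sample1", "sample2", "second"))         # B, B, B, D
print(merge_tags(tags, "sample1", "sample2", "new", "C"))       # C, C, C, D
```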

S2044, when all sample object data have been divided into the first object feature clusters to which they belong, store each first object feature cluster and the sample object data in it;

Specifically, when all sample object data have been divided into the first object feature clusters to which they belong, the data processing device stores each first object feature cluster and the sample object data in it. It can be understood that, once all sample object data have been divided into their first object feature clusters, it can be determined that the clustering operation has been performed on all sample object data. The first object feature clusters can be stored in the storage space of the data processing device or in a database.

S205, clean up the heterogeneous object data in each first object feature cluster respectively, and determine the cleaned first object feature clusters as second object feature clusters;

Step S205 of this embodiment may include the following steps S2051 to S2053:

S2051, in response to a first marking operation on the first object feature cluster, determine the sample object data in the first object feature cluster indicated by the first marking operation as the standard object data corresponding to the first object feature cluster;

Specifically, in response to a first marking operation on the first object feature cluster, the data processing device determines the sample object data in the first object feature cluster indicated by the first marking operation as the standard object data corresponding to the first object feature cluster. It can be understood that the data processing device displays a preset number of sample object data of the first object feature cluster on a display screen and obtains the first marking operation on the displayed sample object data. The first marking operation is a click instruction or touch instruction made on the display screen by an operator or user. The sample object data indicated by the first marking operation is determined as the standard object data corresponding to the first object feature cluster; the marked sample object data is one of the at least one sample object data displayed on the display screen. It should be noted that the response to the first marking operation can be handled by a terminal device, which itself determines the indicated sample object data as the standard object data corresponding to the first object feature cluster; alternatively, the terminal device can forward the first marking operation to a server, and the server determines the sample object data in the first object feature cluster indicated by the first marking operation as the standard object data corresponding to the first object feature cluster.

S2052: in response to a second marking operation on non-standard object data, determine the non-standard object data indicated by the second marking operation as heterogeneous object data; the non-standard object data is the sample object data in the first object feature cluster other than the standard object data;

Specifically, in response to a second marking operation on non-standard object data, the data processing device determines the non-standard object data indicated by the second marking operation as heterogeneous object data. The non-standard object data is the sample object data in the first object feature cluster other than the standard object data. It can be understood that the data processing device responding to the second marking operation may be a terminal device or a server. When the data processing device is a terminal device, the terminal device displays the standard object data of the first object feature cluster in a first display area of the display screen and a preset quantity of non-standard object data in a second display area of the display screen, and obtains the second marking operation on the non-standard object data displayed in the second display area. When the data processing device is a server, the server sends the sample object data of the first object feature cluster to the terminal device; the terminal device displays the standard object data in the first display area and a preset quantity of non-standard object data in the second display area; and the server obtains, from the terminal device, the second marking operation on the non-standard object data displayed in the second display area. The second marking operation is a click instruction or touch instruction made on the display screen by an operator or user. The marked sample object data is at least one of the non-standard object data shown on the display screen.

Please also refer to Fig. 6, which is a schematic example of deleting heterogeneous object data provided by an embodiment of the present invention. As shown in Fig. 6, the display screen is used to display the sample object data in the first object feature cluster. The standard object data of the first object feature cluster is displayed in the first display area 100 of the display screen, and the non-standard object data of the first object feature cluster is displayed in the second display area 200, which can show at least one non-standard object data of the first object feature cluster. If the operator or user considers that one or more non-standard object data in the second display area 200 do not belong to the same first object feature cluster as the standard object data in the first display area 100, the second marking operation made by the operator or user on those non-standard object data in the second display area 200 is obtained, and the non-standard object data indicated by the second marking operation is determined as heterogeneous object data of the first object feature cluster.

S2053: delete the heterogeneous object data from the first object feature cluster, and determine the first object feature cluster from which the heterogeneous object data has been deleted as a second object feature cluster;

Specifically, the data processing device deletes the heterogeneous object data from the first object feature cluster and determines the first object feature cluster from which the heterogeneous object data has been deleted as a second object feature cluster. It can be understood that the quantity of sample object data in the second object feature cluster is less than or equal to the quantity of sample object data in the first object feature cluster, and that each second object feature cluster includes at least one sample object data together with the image feature information corresponding to that sample object data.
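As a minimal sketch of step S2053, the deletion can be expressed as filtering a cluster by the identifiers collected from the second marking operation. The dictionary layout (sample identifier mapped to feature vector) and the function name `remove_heterogeneous` are hypothetical and are not taken from the embodiment.

```python
def remove_heterogeneous(first_cluster, heterogeneous_ids):
    """Return the second object feature cluster.

    first_cluster: dict mapping a sample id to its feature vector (assumed layout).
    heterogeneous_ids: ids marked as heterogeneous by the second marking operation.
    """
    marked = set(heterogeneous_ids)
    # Keep every sample that was not marked; the input cluster is left unchanged.
    return {sid: feat for sid, feat in first_cluster.items() if sid not in marked}


first_cluster = {
    "img_001": [0.10, 0.90],
    "img_002": [0.20, 0.80],
    "img_003": [0.90, 0.10],  # marked as not belonging to this cluster
}
second_cluster = remove_heterogeneous(first_cluster, ["img_003"])
```

The second cluster can never hold more samples than the first, which matches the size relation stated above.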

S206: perform same-class cluster merging among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and update the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs;

Step S206 of the embodiment of the present invention may include the following steps S2061-S2064:

S2061: obtain a first detection feature cluster and a second detection feature cluster from the multiple second object feature clusters;

Specifically, the data processing device obtains a first detection feature cluster and a second detection feature cluster from the multiple second object feature clusters. It can be understood that the first detection feature cluster is one object feature cluster among the multiple second object feature clusters, and the second detection feature cluster is an object feature cluster among the multiple second object feature clusters that is different from the first detection feature cluster.

S2062: in response to a merge request for the first detection feature cluster and the second detection feature cluster, obtain, based on the merge request, a first quantity of sample object data in the first detection feature cluster and a second quantity of sample object data in the second detection feature cluster;

Specifically, in response to a merge request for the first detection feature cluster and the second detection feature cluster, the data processing device obtains, based on the merge request, a first quantity of sample object data in the first detection feature cluster and a second quantity of sample object data in the second detection feature cluster. It can be understood that the data processing device displays the standard object data of the first detection feature cluster in a first display area of the display screen and the standard object data of the second detection feature cluster in a second display area of the display screen, and obtains the merge request for the displayed standard object data of the two detection feature clusters. The merge request is a click instruction or touch instruction made on the display screen by an operator or user. It should be noted that the response to the merge request may be performed by a terminal device, which obtains the merge request and itself executes the step of obtaining the first quantity and the second quantity; alternatively, the server may obtain the merge request sent by the terminal device and execute that step.

Please also refer to Fig. 7, which is a schematic example of feature cluster merging provided by an embodiment of the present invention. As shown in Fig. 7, the display screen is used to display the standard object data of the detection feature clusters. The standard object data of the first detection feature cluster is displayed in the first display area 300 of the display screen, and the standard object data of the second detection feature cluster is displayed in the second display area 400. If the operator or user, judging from the standard object data of the two clusters, considers that the first detection feature cluster and the second detection feature cluster can be merged as same-class clusters, the click instruction or touch instruction by which the operator or user triggers the merge request on the display screen is obtained, and the first quantity of sample object data in the first detection feature cluster and the second quantity of sample object data in the second detection feature cluster are obtained.

S2063: obtain the class similarity between the first detection feature cluster and the second detection feature cluster according to the image similarity between each sample object data in the first detection feature cluster and each sample object data in the second detection feature cluster, the first quantity and the second quantity;

Specifically, the data processing device calculates the image similarity between each sample object data in the first detection feature cluster and each sample object data in the second detection feature cluster, and obtains the class similarity between the first detection feature cluster and the second detection feature cluster according to the image similarity, the first quantity and the second quantity. The class similarity is calculated as follows:

M=P₁₂/(N₁*N₂)

where M is the class similarity between the first detection feature cluster and the second detection feature cluster; N₁ and N₂ denote the quantities of sample object data in the first detection feature cluster and the second detection feature cluster, respectively; and P₁₂ denotes the number of sample object data pairs between the first detection feature cluster and the second detection feature cluster whose image similarity is greater than a similarity threshold.
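The formula M=P₁₂/(N₁*N₂) can be sketched as follows. The text at this point does not fix the image similarity measure, so cosine similarity between feature vectors is used here purely as an assumption; the function names and the toy feature vectors are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (an assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def class_similarity(cluster_1, cluster_2, similarity_threshold):
    """Compute M = P12 / (N1 * N2) for two clusters of feature vectors."""
    # P12: number of cross-cluster pairs whose similarity exceeds the threshold.
    p12 = sum(
        1
        for u in cluster_1
        for v in cluster_2
        if cosine_similarity(u, v) > similarity_threshold
    )
    return p12 / (len(cluster_1) * len(cluster_2))


cluster_1 = [[1.0, 0.0], [0.9, 0.1]]
cluster_2 = [[1.0, 0.1], [0.0, 1.0]]
m = class_similarity(cluster_1, cluster_2, similarity_threshold=0.9)
print(m)  # 2 of the 4 cross-cluster pairs exceed 0.9, so M = 0.5
```

Because P₁₂ is at most N₁*N₂, M always lies between 0 and 1, which makes a single second threshold usable across clusters of different sizes.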

S2064: if the class similarity is greater than a second threshold, merge the first detection feature cluster and the second detection feature cluster into a third object feature cluster, and set the object label of the sample object data in the first detection feature cluster and the second detection feature cluster to the cluster label of the third object feature cluster to which they belong;

Specifically, if the class similarity is greater than the second threshold, the data processing device merges the first detection feature cluster and the second detection feature cluster into a third object feature cluster, and sets the object label of the sample object data in the two detection feature clusters to the cluster label of the third object feature cluster to which they belong. It can be understood that the sample object data in the third object feature cluster includes the sample object data in the first detection feature cluster and the sample object data in the second detection feature cluster. The cluster label of the third object feature cluster may be the cluster label of the first detection feature cluster or of the second detection feature cluster, or may be set arbitrarily.
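Step S2064 can be sketched as a conditional merge followed by relabeling. The cluster layout (a dictionary with `label` and `samples`) and the choice to reuse the first cluster's label are assumptions for illustration; the embodiment allows either original label or an arbitrary new one.

```python
def merge_clusters(first, second, class_similarity, second_threshold):
    """Merge two detection feature clusters when their class similarity is high enough.

    Each cluster is assumed to be {"label": str, "samples": [ {"id", "object_label"} ]}.
    Returns the third object feature cluster, or None when no merge takes place.
    """
    if class_similarity <= second_threshold:
        return None
    merged_label = first["label"]  # assumption: reuse the first cluster's label
    samples = [
        dict(sample, object_label=merged_label)  # copy each sample with the new label
        for sample in first["samples"] + second["samples"]
    ]
    return {"label": merged_label, "samples": samples}


first = {"label": "c7", "samples": [{"id": "a", "object_label": "c7"}]}
second = {
    "label": "c9",
    "samples": [{"id": "b", "object_label": "c9"}, {"id": "c", "object_label": "c9"}],
}
third = merge_clusters(first, second, class_similarity=0.62, second_threshold=0.5)
```

After the merge, every sample carries the label of the third object feature cluster, so downstream training sees one identity where there were previously two.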

S207: determine the third object feature clusters as loop object feature clusters; clean up the heterogeneous object data in each loop object feature cluster respectively; perform same-class cluster merging on the loop object feature clusters from which heterogeneous object data has been deleted, to obtain the third object feature clusters;

S208: determine whether the third object feature clusters meet a convergence condition, or whether the number of same-class cluster merges reaches a count threshold;

When the third object feature clusters meet the convergence condition, or the number of same-class cluster merges equals the count threshold, step S209 is executed; when the third object feature clusters do not meet the convergence condition and the number of same-class cluster merges is less than the count threshold, step S207 is repeated.

Specifically, the data processing device determines the third object feature clusters as loop object feature clusters and cleans up the heterogeneous object data in each loop object feature cluster respectively, using the same method as in step S205. It then performs same-class cluster merging on the loop object feature clusters from which heterogeneous object data has been deleted, using the same method as in step S206, to obtain the third object feature clusters. It should be noted that cleaning up heterogeneous object data in the loop object feature clusters and performing same-class cluster merging on them may be executed multiple times. The class similarity must be evaluated in every round of same-class cluster merging: the merge is performed only when the class similarity between the first detection feature cluster and the second detection feature cluster to be merged is greater than the second threshold. Across multiple rounds, the second threshold may be the same in every round, or may be determined by the iteration number of the same-class cluster merging. For example, the second thresholds corresponding to the iteration numbers are [T₀,T₁,T₂,…,T_N], where N denotes the iteration number of the same-class cluster merging. When the second round of same-class cluster merging is performed, the second threshold is T₂, and the first detection feature cluster and the second detection feature cluster are merged into a third object feature cluster when their class similarity is greater than T₂.

When the third object feature clusters meet the convergence condition, or the number of same-class cluster merges reaches the count threshold, cleaning up heterogeneous object data in the loop object feature clusters and performing same-class cluster merging on them are no longer executed, and step S209 is executed. The convergence condition is that neither heterogeneous object data nor same-class clusters exist in the third object feature clusters.
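The loop formed by steps S207 and S208 can be sketched as below. In the embodiment, cleaning and the convergence check rely on operator markings; here they are replaced by toy stand-in functions on one-dimensional values so that the control flow can be run. All function names, the toy similarity, and the use of the threshold list length as the count threshold are assumptions.

```python
from itertools import combinations


def refine(clusters, clean, merge, is_converged, thresholds):
    """Repeat cleaning and merging until convergence or the count threshold is reached."""
    rounds = 0
    while rounds < len(thresholds):  # len(thresholds) plays the role of the count threshold
        clusters = [clean(c) for c in clusters]          # clean heterogeneous data (as in S205)
        clusters = merge(clusters, thresholds[rounds])   # merge with this round's threshold (as in S206)
        rounds += 1
        if is_converged(clusters):                       # check of step S208
            break
    return clusters, rounds


# Toy stand-ins: samples are numbers, negative values play the role of heterogeneous data.
def mean(values):
    return sum(values) / len(values)


def toy_similarity(a, b):
    return 1.0 / (1.0 + abs(mean(a) - mean(b)))


def toy_clean(cluster):
    return [x for x in cluster if x >= 0]


def toy_merge(clusters, threshold):
    merged = []
    for cluster in clusters:
        for target in merged:
            if toy_similarity(target, cluster) > threshold:
                target.extend(cluster)
                break
        else:
            merged.append(list(cluster))
    return merged


def toy_converged(clusters):
    no_outliers = all(x >= 0 for c in clusters for x in c)
    no_mergeable = all(toy_similarity(a, b) <= 0.8 for a, b in combinations(clusters, 2))
    return no_outliers and no_mergeable


clusters = [[1.0, 1.2, -5.0], [1.1, 0.9], [10.0, 10.5]]
result, rounds = refine(clusters, toy_clean, toy_merge, toy_converged, thresholds=[0.8, 0.8, 0.8])
```

Passing a list of thresholds lets each round use its own second threshold, as the per-iteration schedule in the text suggests; a constant list reproduces the fixed-threshold case.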

Specifically, in determining whether the third object feature clusters meet the convergence condition, the data processing device may obtain, for each third object feature cluster, the second image similarity between the standard object data and the non-standard object data in that cluster; obtain the similarity mean and similarity variance corresponding to each third object feature cluster from the second image similarity; and determine the third object feature clusters whose similarity mean is less than a mean threshold and whose similarity variance is greater than a variance threshold as target object feature clusters. It can be understood that the data processing device calculates the second image similarity between the standard object data and the non-standard object data in each third object feature cluster, and the calculation formula is as follows:

where S denotes the image similarity between two sample object data; f_A=[a₀,a₁,…,a_N] and f_B=[b₀,b₁,…,b_N] denote the feature vectors of the two sample object data; N denotes the feature dimension; and A and B denote the two sample object data. The similarity mean and similarity variance of the second image similarity are then obtained for each third object feature cluster, and the third object feature clusters whose similarity mean is less than the mean threshold and whose similarity variance is greater than the variance threshold are determined as target object feature clusters. The mean threshold and the variance threshold are preset. A target object feature cluster is a third object feature cluster in which the differences between sample object data are relatively large.
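The selection of target object feature clusters can be sketched as follows. The similarity formula itself is not reproduced in this text, so cosine similarity over f_A and f_B is assumed here as one common choice, and population variance is assumed for the similarity variance; the data layout and threshold values are hypothetical.

```python
import math
from statistics import mean, pvariance


def cosine_similarity(f_a, f_b):
    """Assumed similarity S between feature vectors f_A and f_B."""
    dot = sum(a * b for a, b in zip(f_a, f_b))
    return dot / (math.sqrt(sum(a * a for a in f_a)) * math.sqrt(sum(b * b for b in f_b)))


def similarity_stats(standard, non_standard):
    """Mean and (population) variance of similarities to the standard object data."""
    sims = [cosine_similarity(standard, f) for f in non_standard]
    return mean(sims), pvariance(sims)


def select_target_clusters(clusters, mean_threshold, variance_threshold):
    """Pick clusters with a low similarity mean and a high similarity variance."""
    targets = []
    for label, (standard, non_standard) in clusters.items():
        mu, var = similarity_stats(standard, non_standard)
        if mu < mean_threshold and var > variance_threshold:
            targets.append(label)
    return targets


# Each entry: cluster label -> (standard feature vector, non-standard feature vectors).
clusters = {
    "A": ([1.0, 0.0], [[1.0, 0.05], [1.0, -0.05]]),  # tight cluster
    "B": ([1.0, 0.0], [[1.0, 0.10], [0.0, 1.00]]),   # contains a dissimilar sample
}
targets = select_target_clusters(clusters, mean_threshold=0.8, variance_threshold=0.05)
```

Only cluster "B" is selected in this toy input, since its similarities to the standard sample are both low on average and widely spread.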

A detection result for the target object feature clusters is obtained; if the detection result is that neither heterogeneous object data nor same-class clusters exist in the target object feature clusters, it is determined that the third object feature clusters meet the convergence condition;

Specifically, the data processing device obtains a detection result for the target object feature clusters. It can be understood that the data processing device displays the sample object data of at least one target object feature cluster on the display screen and obtains the detection result for the displayed target object feature cluster; the detection result is a click instruction or touch instruction made on the display screen by an operator or user. If the detection result is that neither heterogeneous object data nor same-class clusters exist in the target object feature clusters, it is determined that the third object feature clusters meet the convergence condition. If the detection result is that heterogeneous object data or same-class clusters exist in the target object feature clusters, it is determined that the third object feature clusters do not meet the convergence condition.

S209: train an object recognition model based on the object label and the object feature information of the sample object data in each third object feature cluster; the object recognition model is used to identify the similarity between to-be-predicted object data and target object data.

Specifically, the data processing device obtains the sample image similarity between the object feature information of the sample object data in the third object feature cluster and the object feature information of the standard object data in an initial object recognition model; determines a similarity error according to the object label of the sample object data in the object recognition model, the object label of the sample object data in the third object feature cluster, and the sample image similarity; and adjusts the model parameters of the initial object recognition model by back-propagating the similarity error. When the number of adjustments of the model parameters equals an adjustment threshold, or the similarity error meets a convergence condition, the initial object recognition model containing the adjusted model parameters is determined as the object recognition model. The convergence condition here is that the similarity error is less than a similarity error threshold.

S210: obtain to-be-predicted multimedia data; obtain a target object region with a target size according to the object position information in the to-be-predicted multimedia data; perform an affine transformation based on a target orientation on the to-be-predicted image content in the target object region, and determine the to-be-predicted image content after the affine transformation as to-be-predicted object data; and obtain the similarity between the to-be-predicted object data and target object data based on the object recognition model.

Specifically, the data processing device obtains to-be-predicted multimedia data; obtains a target object region with a target size according to the object position information in the to-be-predicted multimedia data; performs an affine transformation based on the target orientation on the to-be-predicted image content in the target object region, and determines the to-be-predicted image content after the affine transformation as to-be-predicted object data; and obtains the similarity between the to-be-predicted object data and the target object data based on the object recognition model. It can be understood that the to-be-predicted multimedia data is unlabeled image data or video data. The object position information is the position information of the image region in which the sample object is located in the multimedia data, and the sample object may include a human face, an animal face, or an article. After the to-be-predicted object data is determined, its to-be-predicted feature information is obtained, and the trained object recognition model is used to obtain the similarity between the to-be-predicted object data and the target object data, so as to judge whether the to-be-predicted object data and the target object data are the same sample object. The target object data is sample object data stored in advance in a storage space.

In the embodiment of the present invention, sample object data is obtained from multiple multimedia data respectively, and the object feature information of the sample object data is extracted; the multiple sample object data are clustered according to the object feature information to obtain multiple first object feature clusters with different cluster labels, where the object label of every sample object data is the same as the cluster label of the first object feature cluster to which it belongs; the heterogeneous object data in each first object feature cluster is cleaned up respectively, and the cleaned first object feature clusters are determined as second object feature clusters; same-class cluster merging is performed among the multiple second object feature clusters to generate multiple third object feature clusters with different cluster labels, and the object label of the sample object data is updated to the cluster label of the third object feature cluster to which it belongs; and an object recognition model is trained based on the object label and the object feature information of the sample object data in each third object feature cluster, the object recognition model being used to identify the similarity between to-be-predicted object data and target object data.

By clustering the sample object data, removing heterogeneous object data and merging same-class clusters, labeled sample object data is generated from a large amount of unlabeled sample object data. Training the object recognition model with this large amount of labeled sample object data avoids the low recognition rate that results from too little training data. Performing multiple rounds of heterogeneous object data removal and same-class cluster merging on the sample object data, together with sampling inspection, improves the accuracy of sample object data clustering and thereby further improves the recognition rate of the object recognition model on to-be-predicted object data.

A data processing method provided by an embodiment of the present invention is described below with reference to a specific implementation scenario. As shown in Fig. 8, the data processing method can be applied to mining labeled sample object data for object recognition model training from a large amount of sample object data without identity labels in real scenes. The method is illustrated below by taking the generation of labeled face data from unlabeled face data as an example. It mainly comprises four processes: data preprocessing, non-same-person cleaning, same-person merging and quality inspection.

Specifically, the data processing device obtains face data from multiple multimedia data respectively, the multimedia data including image data and video data; extracts face feature information from the face data; and clusters the multiple face data according to the face feature information to obtain multiple first face feature clusters with different cluster labels. The heterogeneous data in each first face feature cluster is cleaned up respectively, and the cleaned first face feature clusters are determined as second face feature clusters. Same-class cluster merging is performed among the multiple second face feature clusters to generate multiple third face feature clusters with different cluster labels, and the face label of the face data is updated to the cluster label of the third face feature cluster to which it belongs. A face recognition model is trained using the face labels and the face feature information of the face data in the third face feature clusters; the face recognition model can be used to identify the similarity between to-be-predicted face data and target face data.

The data preprocessing process is as follows:

The unlabeled face data in the multimedia data is stored on a local server with sufficient space, and preprocessing such as cutting the original video into frames and converting image formats is then performed. All faces in the multimedia data are detected, and the face positions are indicated with rectangular boxes. Object regions with a target size are obtained from each multimedia data according to the face positions; the target size is a preset fixed size, and, for example, images 1 times and 2 times the pre-specified training image size may be cropped.

Corresponding face labels are set according to the type of the multimedia data. If the multimedia data is image data, face labels are set randomly for the face data, with different sample object data corresponding to different face labels. If the multimedia data is video data, the tracking trajectory information of the sample object data in the video data is detected, an association attribute is set for the sample object data having the same tracking trajectory information (the association attribute being a kind of mark information), and the same face label is set for the face data having that association attribute.

Face images of relatively high quality are selected by filtering according to indicators such as face pose, image blur, ambient illumination and degree of occlusion, and the filtered face images and the corresponding face feature information are uploaded to a remote storage unit. The face feature information of the face images is extracted using a face recognition model, and the multiple face image data are clustered according to the face feature information to obtain multiple first face feature clusters with different cluster labels.
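The initial label assignment described above can be sketched as follows: faces from still images each receive their own label, while faces in a video that share a tracking trajectory receive one common label. The record fields (`source`, `video_id`, `track_id`) are hypothetical, and sequential integers stand in for the randomly set labels mentioned in the text.

```python
from itertools import count


def assign_initial_labels(faces):
    """Assign a face label to every detected face.

    faces: list of dicts with "source" ("image" or "video"); video faces also
    carry "video_id" and "track_id" (assumed field names).
    """
    next_label = count()   # unique labels; stands in for randomly set labels
    track_labels = {}      # (video_id, track_id) -> label shared by the trajectory
    labels = []
    for face in faces:
        if face["source"] == "video":
            key = (face["video_id"], face["track_id"])
            if key not in track_labels:
                track_labels[key] = next(next_label)
            labels.append(track_labels[key])
        else:
            labels.append(next(next_label))
    return labels


faces = [
    {"source": "image"},
    {"source": "video", "video_id": "v1", "track_id": 0},
    {"source": "video", "video_id": "v1", "track_id": 0},  # same trajectory, same label
    {"source": "video", "video_id": "v1", "track_id": 1},
    {"source": "image"},
]
labels = assign_initial_labels(faces)
print(labels)  # [0, 1, 1, 2, 3]
```

Keying the shared label on both the video and the track avoids accidentally linking identically numbered trajectories from different videos.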

The non-same-person cleaning process is as follows:

The heterogeneous data in the first face feature clusters is removed; heterogeneous data refers to face images in a first face feature cluster that do not belong to that cluster. The specific method is to display face images with the same cluster label on the display screen of a mobile terminal and have professional annotators finally confirm whether they are the same person. In response to a first marking operation by a professional annotator on the first face feature cluster, the data processing device determines the face image indicated by the first marking operation as the standard face image corresponding to the first face feature cluster. In response to a second marking operation on non-standard face images, it determines the non-standard face images indicated by the second marking operation as heterogeneous data, deletes the heterogeneous data from the first face feature cluster, and determines the first face feature cluster from which the heterogeneous data has been deleted as a second face feature cluster.

If heterogeneous data removal is being initiated for the first time, the server directly writes the cluster result information to the database. If the process is being iterated, the server first queries (by time) whether new cluster result information exists; if it does, the server updates the state of the corresponding database fields according to the new cluster information and, once the database state update is complete, pushes the images to the display screen of the mobile terminal.

The same-person merging process is as follows:

The same-class clusters among the second face feature clusters are merged. The specific method is to display the standard face images of two second face feature clusters on the display screen of the mobile terminal and have professional annotators finally confirm whether they are the same person. The data processing device obtains a first detection feature cluster and a second detection feature cluster from the multiple second face feature clusters; in response to a merge request by a professional annotator for the first detection feature cluster and the second detection feature cluster, obtains, based on the merge request, a first quantity of face images in the first detection feature cluster and a second quantity of face images in the second detection feature cluster; obtains the class similarity between the first detection feature cluster and the second detection feature cluster according to the image similarity between each face image in the first detection feature cluster and each face image in the second detection feature cluster, the first quantity and the second quantity; and, if the class similarity is greater than the second threshold, merges the first detection feature cluster and the second detection feature cluster into a third face feature cluster.

If the merging of similar clusters is being initiated for the first time, the server writes the clustering result information directly to the database. If this is an iteration of the process, the server first queries whether new clustering result information exists. If it does, the server updates the state of the corresponding database fields according to the new clustering information and, once the database state has been updated, pushes the images to the display screen of the mobile terminal.

After same-person merging has been completed for all category pairs, the server writes the merge results to the database and activates the quality inspection process.

The quality inspection process is as follows:

The quality inspection process offers two options: non-same-person inspection and same-person inspection. Non-same-person inspection detects whether heterogeneous data exists in a third face feature cluster; same-person inspection detects whether mergeable similar clusters exist among the multiple third face feature clusters. The data processing equipment obtains, for each third face feature cluster, the image similarity between the standard facial image and the non-standard facial images in that cluster, and from these image similarities obtains the similarity mean and similarity variance of each third face feature cluster. A third face feature cluster whose similarity mean is less than a mean threshold and whose similarity variance is greater than a variance threshold is determined as a target face feature cluster that requires quality inspection. Specifically, the target face feature clusters may be screened by sorting the third face feature clusters in ascending order of mean, or in descending order of variance, and determining the top 10% of the sorted third face feature clusters as target face feature clusters. A detection result for the target face feature clusters is then obtained. If the detection result is that neither heterogeneous data nor similar clusters exist in the target face feature clusters, a face recognition model is trained based on the face labels and face feature information of the facial images in each third face feature cluster. If the detection result is that heterogeneous data or similar clusters exist in the target face feature clusters, non-same-person cleaning and same-person merging are carried out again.
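The screening of target face feature clusters described above can be sketched as follows. The sketch assumes each cluster is represented by the list of image similarities between its standard facial image and each of its non-standard facial images, and that the population variance is the intended variance. The function names and the dictionary layout are hypothetical.

```python
import math
from statistics import mean, pvariance


def select_by_thresholds(cluster_similarities, mean_threshold, variance_threshold):
    """Return labels of clusters with low similarity mean and high variance.

    cluster_similarities maps a cluster label to a non-empty list of
    similarities between the standard image and each non-standard image.
    """
    targets = []
    for label, similarities in cluster_similarities.items():
        if (
            mean(similarities) < mean_threshold
            and pvariance(similarities) > variance_threshold
        ):
            targets.append(label)
    return targets


def select_lowest_mean_fraction(cluster_similarities, fraction=0.1):
    """Sort clusters by ascending similarity mean and keep the first fraction."""
    ranked = sorted(
        cluster_similarities, key=lambda label: mean(cluster_similarities[label])
    )
    # Always inspect at least one cluster (an assumption of this sketch).
    count = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:count]
```

Sorting by descending variance instead would only change the sort key to `pvariance` with `reverse=True`.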

The data processing equipment provided in the embodiments of the present invention is described in detail below with reference to Fig. 9 and Fig. 10. It should be noted that the equipment shown in Fig. 9 and Fig. 10 is used to execute the methods of the embodiments shown in Fig. 1 to Fig. 8 of the present invention. For ease of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details that are not disclosed here, refer to the embodiments shown in Fig. 1 to Fig. 8 of the present invention.

Refer to Fig. 9, which is a structural schematic diagram of a data processing equipment provided in an embodiment of the present invention. As shown in Fig. 9, the data processing equipment 1 of the embodiment of the present invention may include: a feature acquisition unit 11, a first feature cluster acquiring unit 12, a second feature cluster acquiring unit 13, a third feature cluster acquiring unit 14, a model training unit 15, an object data recognition unit 16, a feature cluster loop acquiring unit 17, a detection unit 18, and a detection result acquiring unit 19.

The feature acquisition unit 11 is configured to obtain sample object data from each of multiple pieces of multimedia data and extract the object feature information of the sample object data;

The first feature cluster acquiring unit 12 is configured to cluster multiple pieces of sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels; the object label of every piece of sample object data is identical to the cluster label of the first object feature cluster to which it belongs;

The second feature cluster acquiring unit 13 is configured to clean up the heterogeneous object data in each first object feature cluster and determine each cleaned first object feature cluster as a second object feature cluster;

The third feature cluster acquiring unit 14 is configured to perform similar-cluster merging among multiple second object feature clusters, generate multiple third object feature clusters with different cluster labels, and update the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs;

The model training unit 15 is configured to train an object recognition model based on the object labels and the object feature information of the sample object data in each third object feature cluster; the object recognition model is used to identify the similarity between object data to be predicted and target object data.

Referring to Fig. 9, the feature acquisition unit 11 of the embodiment of the present invention may include: a location information obtaining subunit 111, an object data obtaining subunit 112, and an object label setting subunit 113.

The location information obtaining subunit 111 is configured to obtain the object location information in multiple pieces of multimedia data and, according to the object location information, obtain an object region of a target size from each piece of multimedia data;

The object data obtaining subunit 112 is configured to determine the image content in the object region as sample object data and obtain the object feature information of each piece of sample object data;

The object label setting subunit 113 is configured to: if the multimedia data is image data, randomly set an object label for the sample object data in the image data; if the multimedia data is video data, detect the tracking trajectory information of the sample object data in the video data, set an association attribute on the sample object data having the same tracking trajectory information, and set the same object label for the sample object data having the association attribute;
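The initial labeling rule of the object label setting subunit 113 can be sketched as follows. The sketch assumes each sample is a dictionary with hypothetical keys `kind`, `video_id`, and `track_id`, and uses random UUIDs as the randomly set object labels; none of these names come from the original text.

```python
import uuid


def assign_initial_labels(samples):
    """Assign an initial object label to each sample.

    Samples from image data get an independent random label. Samples from
    video data that share the same tracking trajectory get the same label.
    """
    track_labels = {}  # (video_id, track_id) -> shared label
    labels = []
    for sample in samples:
        if sample["kind"] == "video" and sample.get("track_id") is not None:
            key = (sample["video_id"], sample["track_id"])
            if key not in track_labels:
                track_labels[key] = uuid.uuid4().hex
            labels.append(track_labels[key])
        else:
            labels.append(uuid.uuid4().hex)
    return labels
```

Keying the shared label on both the video and the trajectory keeps trajectories from different videos apart, since trajectory identifiers are usually only unique within one video.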

The first feature cluster acquiring unit 12 of the embodiment of the present invention is specifically configured to:

obtain first sample object data and second sample object data from the multiple pieces of sample object data, determine the object label of the first sample object data as a first object label, and determine the object label of the second sample object data as a second object label, where the first object label and the second object label are different;

obtain a first image similarity between the object feature information of the first sample object data and the object feature information of the second sample object data;

if the first image similarity is greater than a first threshold, divide at least one piece of sample object data having the first object label and at least one piece of sample object data having the second object label into the same first object feature cluster, and set the first object label and the second object label to the cluster label of the first object feature cluster into which they are divided;

when all sample object data have been divided into the first object feature clusters to which they belong, store each first object feature cluster together with the sample object data in that cluster;
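The threshold-based grouping performed by the first feature cluster acquiring unit 12 can be sketched with a union-find structure over the initial object labels. This is a minimal sketch, not the patented procedure: it compares every pair of samples, takes the similarity function as a parameter, and uses the surviving root label as the cluster label, all of which are assumptions.

```python
def cluster_by_similarity(features, labels, similarity, first_threshold):
    """Merge initial labels whose samples are similar and return cluster labels.

    features and labels are parallel lists. Two labels end up in the same
    first object feature cluster when some pair of their samples has a
    similarity greater than first_threshold.
    """
    parent = {label: label for label in labels}

    def find(label):
        # Follow parent links to the root, compressing the path on the way.
        while parent[label] != label:
            parent[label] = parent[parent[label]]
            label = parent[label]
        return label

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            root_i, root_j = find(labels[i]), find(labels[j])
            if root_i == root_j:
                continue  # already in the same cluster
            if similarity(features[i], features[j]) > first_threshold:
                parent[root_j] = root_i

    # Every sample's object label becomes the label of its cluster.
    return [find(label) for label in labels]
```

The pairwise loop is quadratic in the number of samples; a large-scale system would more likely use an approximate nearest-neighbor index, which is outside the scope of this sketch.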

The second feature cluster acquiring unit 13 of the embodiment of the present invention is specifically configured to:

respond to a first marking operation on the first object feature cluster by determining the sample object data in the first object feature cluster indicated by the first marking operation as the standard object data of the first object feature cluster;

respond to a second marking operation on non-standard object data by determining the non-standard object data indicated by the second marking operation as heterogeneous object data, where the non-standard object data is the sample object data in the first object feature cluster other than the standard object data;

delete the heterogeneous object data from the first object feature cluster, and determine the first object feature cluster from which the heterogeneous object data has been deleted as a second object feature cluster;
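The effect of the two marking operations on a single cluster can be sketched as follows. The sketch assumes samples are referred to by identifiers and that the marking operations arrive as a chosen standard identifier plus a list of flagged identifiers; the function name and return shape are hypothetical.

```python
def clean_cluster(sample_ids, standard_id, flagged_ids):
    """Apply the marking operations to one first object feature cluster.

    standard_id is the sample chosen by the first marking operation.
    flagged_ids are the non-standard samples chosen by the second marking
    operation. Returns (kept, removed): kept is the second object feature
    cluster, removed is the heterogeneous object data.
    """
    if standard_id not in sample_ids:
        raise ValueError("the standard sample must belong to the cluster")
    # The standard sample can never be heterogeneous, even if flagged by mistake.
    heterogeneous = {sid for sid in flagged_ids if sid != standard_id}
    kept = [sid for sid in sample_ids if sid not in heterogeneous]
    removed = [sid for sid in sample_ids if sid in heterogeneous]
    return kept, removed
```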

The third feature cluster acquiring unit 14 of the embodiment of the present invention is specifically configured to:

obtain a first detection feature cluster and a second detection feature cluster from the multiple second object feature clusters;

respond to a merge request for the first detection feature cluster and the second detection feature cluster and, based on the merge request, obtain the first quantity of sample object data in the first detection feature cluster and the second quantity of sample object data in the second detection feature cluster;

obtain the category similarity of the first detection feature cluster and the second detection feature cluster according to the image similarity between each piece of sample object data in the first detection feature cluster and each piece of sample object data in the second detection feature cluster, the first quantity, and the second quantity;

if the category similarity is greater than a second threshold, merge the first detection feature cluster and the second detection feature cluster into a third object feature cluster, and set the object labels of the sample object data in the first detection feature cluster and the second detection feature cluster to the cluster label of the third object feature cluster to which they belong;

The feature cluster loop acquiring unit 17 is configured to determine the third object feature clusters as loop object feature clusters; clean up the heterogeneous object data in each loop object feature cluster; perform similar-cluster merging on the loop object feature clusters from which the heterogeneous object data has been deleted, to obtain the third object feature clusters; and trigger the object data recognition unit 16 when the third object feature clusters meet the convergence condition or the number of similar-cluster merges equals a count threshold;

The detection unit 18 is configured to obtain, for each third object feature cluster, a second image similarity between the standard object data and the non-standard object data in that cluster; obtain the similarity mean and similarity variance of each third object feature cluster according to the second image similarity; and determine a third object feature cluster whose similarity mean is less than a mean threshold and whose similarity variance is greater than a variance threshold as a target object feature cluster;

The detection result acquiring unit 19 is configured to obtain a detection result for the target object feature cluster; if the detection result is that neither heterogeneous object data nor similar clusters exist in the target object feature cluster, it is determined that the third object feature clusters meet the convergence condition;

The object data recognition unit 16 is configured to obtain multimedia data to be predicted; obtain a target object region of the target size according to the object location information in the multimedia data to be predicted; perform an affine transformation based on the target orientation on the image content to be predicted in the target object region, and determine the image content to be predicted after the affine transformation as object data to be predicted; and obtain the similarity between the object data to be predicted and the target object data based on the object recognition model.
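The iteration coordinated by the feature cluster loop acquiring unit 17, the detection unit 18, and the detection result acquiring unit 19 can be sketched as a loop with two stopping conditions. The cleaning, merging, and inspection steps are passed in as callables because they involve manual annotation; the function name, the parameter names, and the choice to count merges from the first pass of this loop are assumptions.

```python
def refine_clusters(clusters, clean, merge, passes_inspection, count_threshold):
    """Repeat cleaning and merging until convergence or a merge-count limit.

    clean(clusters) removes heterogeneous object data from every cluster.
    merge(clusters) merges similar clusters and returns the new clusters.
    passes_inspection(clusters) returns True when quality inspection finds
    neither heterogeneous object data nor similar clusters.
    Returns the final clusters and the number of merge passes performed.
    """
    merge_count = 0
    while True:
        clusters = merge(clean(clusters))
        merge_count += 1
        if passes_inspection(clusters) or merge_count == count_threshold:
            return clusters, merge_count
```

The count threshold bounds the annotation effort when quality inspection keeps finding residual issues.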

The embodiment of the present invention also provides a computer storage medium. The computer storage medium may store a plurality of instructions suitable for being loaded by a processor to execute the method steps of the embodiments shown in Fig. 1 to Fig. 8. For the specific execution process, refer to the description of the embodiments shown in Fig. 1 to Fig. 8, which is not repeated here.

Refer to Fig. 10, which is a structural schematic diagram of a data processing equipment provided in an embodiment of the present invention. As shown in Fig. 10, the data processing equipment 1000 may include: at least one processor 1001 (for example, a CPU), at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display screen (Display); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, for example at least one magnetic disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in Fig. 10, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a data processing application program.

In the equipment 1000 shown in Fig. 10, the user interface 1003 is mainly used to provide an input interface for the user and obtain the data input by the user; the processor 1001 may be used to call the data processing application program stored in the memory 1005 and specifically execute the following operations:

It should be understood that the equipment 1000 described in the embodiment of the present invention can carry out the description of the data processing method in the embodiments corresponding to Fig. 3 to Fig. 9 above, and can also carry out the description of the data processing equipment 1 in the embodiment corresponding to Fig. 10 above, which is not repeated here. Likewise, the beneficial effects of using the same method are not described again.

Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of each of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.

The above disclosure describes only preferred embodiments of the present invention and of course cannot be used to limit the scope of the rights of the present invention. Equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the present invention.

Claims

1. A data processing method, characterized by comprising:

obtaining sample object data from each of multiple pieces of multimedia data, and extracting the object feature information of the sample object data;

clustering multiple pieces of sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels, where the object label of every piece of sample object data is identical to the cluster label of the first object feature cluster to which it belongs;

cleaning up the heterogeneous object data in each first object feature cluster, and determining each cleaned first object feature cluster as a second object feature cluster;

performing similar-cluster merging among multiple second object feature clusters, generating multiple third object feature clusters with different cluster labels, and updating the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs;

training an object recognition model based on the object labels and the object feature information of the sample object data in each third object feature cluster, where the object recognition model is used to identify the similarity between object data to be predicted and target object data.

2. The method according to claim 1, characterized in that obtaining sample object data from each of multiple pieces of multimedia data and extracting the object feature information corresponding to the sample object data comprises:

obtaining the object location information in multiple pieces of multimedia data, obtaining an object region of a target size from each piece of multimedia data according to the object location information, determining the image content in the object region as sample object data, and obtaining the object feature information of each piece of sample object data;

if the multimedia data is image data, randomly setting an object label for the sample object data in the image data;

if the multimedia data is video data, detecting the tracking trajectory information of the sample object data in the video data, setting an association attribute on the sample object data having the same tracking trajectory information, and setting the same object label for the sample object data having the association attribute.

3. The method according to claim 2, characterized in that determining the image content in the object region as sample object data comprises:

performing an affine transformation based on a target orientation on the image content in the object region, and determining the image content after the affine transformation as sample object data, where the object in the sample object data is in the target orientation;

and the method further comprises:

obtaining multimedia data to be predicted;

obtaining a target object region of the target size according to the object location information in the multimedia data to be predicted, performing the affine transformation based on the target orientation on the image content to be predicted in the target object region, and determining the image content to be predicted after the affine transformation as object data to be predicted;

4. The method according to claim 1, characterized in that clustering multiple pieces of sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels comprises:

obtaining first sample object data and second sample object data from the multiple pieces of sample object data, determining the object label of the first sample object data as a first object label, and determining the object label of the second sample object data as a second object label, where the first object label and the second object label are different;

obtaining a first image similarity between the object feature information of the first sample object data and the object feature information of the second sample object data;

if the first image similarity is greater than a first threshold, dividing at least one piece of sample object data having the first object label and at least one piece of sample object data having the second object label into the same first object feature cluster, and setting the first object label and the second object label to the cluster label of the first object feature cluster into which they are divided;

when all sample object data have been divided into the first object feature clusters to which they belong, storing each first object feature cluster together with the sample object data in that cluster.

5. The method according to claim 1, characterized in that cleaning up the heterogeneous object data in each first object feature cluster and determining each cleaned first object feature cluster as a second object feature cluster comprises:

responding to a first marking operation on the first object feature cluster by determining the sample object data in the first object feature cluster indicated by the first marking operation as the standard object data of the first object feature cluster;

responding to a second marking operation on non-standard object data by determining the non-standard object data indicated by the second marking operation as heterogeneous object data, where the non-standard object data is the sample object data in the first object feature cluster other than the standard object data;

deleting the heterogeneous object data from the first object feature cluster, and determining the first object feature cluster from which the heterogeneous object data has been deleted as a second object feature cluster.

6. The method according to claim 1, characterized in that performing similar-cluster merging among multiple second object feature clusters, generating multiple third object feature clusters with different cluster labels, and updating the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs comprises:

responding to a merge request for a first detection feature cluster and a second detection feature cluster and, based on the merge request, obtaining the first quantity of sample object data in the first detection feature cluster and the second quantity of sample object data in the second detection feature cluster;

obtaining the category similarity of the first detection feature cluster and the second detection feature cluster according to the image similarity between each piece of sample object data in the first detection feature cluster and each piece of sample object data in the second detection feature cluster, the first quantity, and the second quantity;

if the category similarity is greater than a second threshold, merging the first detection feature cluster and the second detection feature cluster into a third object feature cluster, and setting the object labels of the sample object data in the first detection feature cluster and the second detection feature cluster to the cluster label of the third object feature cluster to which they belong.

7. The method according to claim 1, characterized in that, after performing similar-cluster merging among multiple second object feature clusters, generating multiple third object feature clusters with different cluster labels, and updating the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs, the method further comprises:

performing similar-cluster merging on the loop object feature clusters from which the heterogeneous object data has been deleted, to obtain the third object feature clusters;

when the third object feature clusters meet the convergence condition or the number of similar-cluster merges equals a count threshold, executing the step of training the object recognition model based on the object labels and the object feature information of the sample object data in each third object feature cluster.

8. The method according to claim 7, characterized by further comprising:

obtaining, for each third object feature cluster, a second image similarity between the standard object data and the non-standard object data in that cluster, and obtaining the similarity mean and similarity variance of each third object feature cluster according to the second image similarity;

determining a third object feature cluster whose similarity mean is less than a mean threshold and whose similarity variance is greater than a variance threshold as a target object feature cluster;

if the detection result is that neither heterogeneous object data nor similar clusters exist in the target object feature cluster, determining that the third object feature clusters meet the convergence condition.

9. The method according to claim 1, characterized in that training the object recognition model based on the object labels and the object feature information of the sample object data in each third object feature cluster comprises:

obtaining a sample image similarity between the object feature information of the sample object data in the third object feature cluster and the object feature information of the standard object data in an initial object recognition model;

determining a similarity error according to the object label of the sample object data in the object recognition model, the object label of the sample object data in the third object feature cluster, and the sample image similarity, and adjusting the model parameters of the initial object recognition model by backpropagation according to the similarity error;

when the number of adjustments of the model parameters equals an adjustment threshold or the similarity error meets a convergence condition, determining the initial object recognition model containing the adjusted model parameters as the object recognition model.

10. A data processing equipment, characterized by comprising:

a feature acquisition unit, configured to obtain sample object data from each of multiple pieces of multimedia data and extract the object feature information of the sample object data;

a first feature cluster acquiring unit, configured to cluster multiple pieces of sample object data according to the object feature information to obtain multiple first object feature clusters with different cluster labels, where the object label of every piece of sample object data is identical to the cluster label of the first object feature cluster to which it belongs;

a second feature cluster acquiring unit, configured to clean up the heterogeneous object data in each first object feature cluster and determine each cleaned first object feature cluster as a second object feature cluster;

a third feature cluster acquiring unit, configured to perform similar-cluster merging among multiple second object feature clusters, generate multiple third object feature clusters with different cluster labels, and update the object label of the sample object data to the cluster label of the third object feature cluster to which it belongs;

a model training unit, configured to train an object recognition model based on the object labels and the object feature information of the sample object data in each third object feature cluster, where the object recognition model is used to identify the similarity between object data to be predicted and target object data.

11. A data processing equipment, characterized by comprising: a processor and a memory; wherein the memory stores a computer program, and the computer program is suitable for being loaded by the processor to execute the method according to any one of claims 1-9.

12. A computer storage medium, characterized in that the computer storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the method steps according to any one of claims 1-9.