WO2022134881A1

WO2022134881A1 - Data processing method, data processing apparatus, computer device, and non-transitory storage medium

Info

Publication number: WO2022134881A1
Application number: PCT/CN2021/128663
Authority: WO
Inventors: Kaihang YANG; Dening DI
Original assignee: Zhejiang Dahua Technology Co., Ltd.
Priority date: 2020-12-25
Filing date: 2021-11-04
Publication date: 2022-06-30
Also published as: CN112668632A; CN112668632B

Abstract

A data processing method, a data processing apparatus, a computer device, and a non-transitory storage medium are provided, for solving the problem of relatively low data processing efficiency. The method includes: determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, and respective second standard feature sub-vector sequences whose similarities to respective pre-stored reference feature vectors satisfy the first preset similarity condition, according to respective pre-stored standard feature sub-vectors; and determining a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences, respectively, to obtain a feature vector similarity between the target feature vector and each of the reference feature vectors.

Description

DATA PROCESSING METHOD, DATA PROCESSING APPARATUS, COMPUTER DEVICE, AND NON-TRANSITORY STORAGE MEDIUM

TECHNICAL FIELD

The present disclosure relates to the technical field of computers, and particularly to a data processing method, a data processing apparatus, a computer device, and a non-transitory storage medium.

BACKGROUND

With the continuous development of science and technology, equipment may take the place of manual labor to process tasks with a larger amount of data. For example, in the field of image processing, the equipment may determine candidate images similar to a target image by comparing feature vectors of the target image with feature vectors of each of the candidate images.

However, as image definition continues to improve, the dimensions of the image feature vectors extracted by the equipment keep increasing, so that images can be described accurately and candidate images similar to a target image can be determined accurately. As a result, when the feature vectors of the target image are compared with those of each of the candidate images, the amount of data that the equipment needs to process is relatively large, and the efficiency with which the equipment processes the data is relatively low. Similar problems also exist in other fields.

SUMMARY OF THE DISCLOSURE

The present disclosure provides a data processing method, a data processing apparatus, a computer device, and a non-transitory storage medium, which may solve the problem concerning relatively low data processing efficiency.

According to a first aspect, a data processing method is provided and includes: determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors; determining a second standard feature sub-vector sequence whose similarity to a reference feature vector satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors; the first standard feature sub-vector sequence includes at least one first standard feature sub-vector in each of the standard feature sub-vectors; the reference feature vector is a reference feature vector in each of pre-stored reference feature vectors, and the second standard feature sub-vector sequence includes at least one second standard feature sub-vector in each of the standard feature sub-vectors; and

determining a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences; and obtaining a feature vector similarity between the target feature vector and each of the reference feature vectors.

In an embodiment, before the determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors, the method further includes: dividing each of the reference feature vectors into the same number of reference feature sub-vectors; obtaining a reference feature sub-vector sequence corresponding to each of the reference feature vectors, wherein each of the reference feature sub-vectors in the reference feature sub-vector sequence is arranged according to a position of each of the reference feature sub-vectors in a corresponding reference feature vector; and

determining at least one corresponding standard feature sub-vector based on the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences, and obtaining each of the pre-stored standard feature sub-vectors.

In an embodiment, the determining at least one corresponding standard feature sub-vector based on the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences, and obtaining each of the pre-stored standard feature sub-vectors, includes: taking the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences as a sub-vector data set; performing a cluster processing on each of the sub-vector data sets, to obtain at least one standard feature sub-vector corresponding to each of the sub-vector data sets; and

obtaining each of the pre-stored standard feature sub-vectors, based on the at least one standard feature sub-vector corresponding to each of the sub-vector data sets.
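The cluster processing described above can be sketched as follows. This is a minimal illustration only: the disclosure does not name a clustering algorithm at this point, so plain k-means (Lloyd's algorithm) is assumed, and the function names `cluster_position` and `build_standard_sub_vectors` as well as the parameters `k`, `iters`, and `seed` are hypothetical.

```python
import numpy as np


def cluster_position(data_set, k, iters=20, seed=0):
    """Run plain k-means on one sub-vector data set (one row per sub-vector).

    Returns a (k, d) array of cluster centers, treated here as the
    standard feature sub-vectors for that position (an assumption).
    """
    rng = np.random.default_rng(seed)
    data_set = np.asarray(data_set, dtype=float)
    # Start from k distinct rows chosen at random (assumes k <= number of rows).
    centers = data_set[rng.choice(len(data_set), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every row to every center.
        dists = ((data_set[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data_set[labels == j]
            if len(members):  # keep the old center if a cluster goes empty
                centers[j] = members.mean(axis=0)
    return centers


def build_standard_sub_vectors(sub_vector_data_sets, k):
    """Cluster each position independently; one (k, d) array per position."""
    return [cluster_position(ds, k) for ds in sub_vector_data_sets]
```

Clustering each position separately keeps every set of standard feature sub-vectors small, so the later per-position lookup only has to search `k` candidates.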

In an embodiment, when the sub-vector data set is associated with positions of the reference feature sub-vectors included by the sub-vector data set in corresponding reference feature vectors, the determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors, includes: dividing the target feature vector into a plurality of target feature sub-vectors to obtain a target feature sub-vector sequence of the target feature vector; each of the target feature sub-vectors in the target feature sub-vector sequence is arranged according to a position of each of the target feature sub-vectors in the target feature vector;

determining the first standard feature sub-vector whose similarity to one of the target feature sub-vectors satisfies a second preset similarity condition in the at least one standard feature sub-vector corresponding to the sub-vector data set; a position of the target feature sub-vector in the target feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the target feature sub-vectors is provided with one corresponding first standard feature sub-vector, determining that a similarity between the first standard feature sub-vector sequence including each of the first standard feature sub-vectors and the target feature vector satisfies the first preset similarity condition, and obtaining the first standard feature sub-vector sequence corresponding to the target feature vector.

In an embodiment, when each of the pre-stored standard feature sub-vectors has a vector identifier and the vector identifier is configured to uniquely represent each of the standard feature sub-vectors, the obtaining the first standard feature sub-vector sequence corresponding to the target feature vector, includes: obtaining the first standard feature sub-vector sequence corresponding to the target feature vector based on the vector identifier of each of the first standard feature sub-vectors.
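As a hedged illustration of representing a feature vector by vector identifiers, the sketch below maps each sub-vector to the row index of its closest standard feature sub-vector at the same position. Several points are assumptions rather than statements of the disclosure: the second preset similarity condition is taken to be "smallest Euclidean distance", the vector is divided uniformly, and the pair of position and row index stands in for the unique vector identifier. The name `encode` is hypothetical.

```python
import numpy as np


def encode(vector, standard_sub_vectors):
    """Map a feature vector to one identifier per position.

    standard_sub_vectors[p] is a (k, d) array holding the standard feature
    sub-vectors for position p. The returned identifier for position p is
    the row index of the closest standard sub-vector (assumed condition).
    """
    vector = np.asarray(vector, dtype=float)
    # Uniform division; the vector length must be a multiple of the position count.
    parts = np.split(vector, len(standard_sub_vectors))
    ids = []
    for part, book in zip(parts, standard_sub_vectors):
        book = np.asarray(book, dtype=float)
        # Euclidean distance from this sub-vector to every standard sub-vector.
        ids.append(int(np.linalg.norm(book - part, axis=1).argmin()))
    return ids
```

The same function can be applied to the target feature vector and to each reference feature vector, so that both are expressed in terms of the same pre-stored standard feature sub-vectors.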

In an embodiment, when the sub-vector data set is associated with positions of the reference feature sub-vectors included by the sub-vector data set in corresponding reference feature vectors, the determining each of second standard feature sub-vector sequences whose similarity to each of the pre-stored reference feature vectors satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors, includes: determining the second standard feature sub-vector whose similarity to one of the reference feature sub-vectors satisfies a second preset similarity condition in the at least one standard feature sub-vector corresponding to the sub-vector data set; a position of the reference feature sub-vector in the reference feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the reference feature sub-vectors is provided with one corresponding second standard feature sub-vector, determining that a similarity between the second standard feature sub-vector sequence including each of the second standard feature sub-vectors and the reference feature vector satisfies the first preset similarity condition, and obtaining the second standard feature sub-vector sequence corresponding to the reference feature vector.

In an embodiment, when the number of the first standard feature sub-vectors in the first standard feature sub-vector sequence is the same as the number of the second standard feature sub-vectors in the second standard feature sub-vector sequence, the determining a sequence similarity between the first standard feature sub-vector sequence and each of second standard feature sub-vector sequences, includes: determining a sub-vector similarity between each of the first standard feature sub-vectors in the first standard feature sub-vector sequence and a second standard feature sub-vector at a corresponding position of the second standard feature sub-vector sequence; performing a weighted summation processing on obtained sub-vector similarities; and

obtaining the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence.
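The weighted summation can be illustrated with the sketch below. It assumes that the similarities among the standard feature sub-vectors at each position have been computed in advance and stored in lookup tables, and that equal weights are used when none are supplied; the table layout `sim_tables[p][i][j]` and the name `sequence_similarity` are hypothetical.

```python
def sequence_similarity(first_ids, second_ids, sim_tables, weights=None):
    """Weighted sum of per-position sub-vector similarities.

    first_ids / second_ids hold one standard sub-vector index per position.
    sim_tables[p][i][j] is a precomputed similarity between standard
    sub-vectors i and j at position p (hypothetical layout).
    """
    if len(first_ids) != len(second_ids):
        raise ValueError("sequences must have the same length")
    n = len(first_ids)
    if weights is None:
        weights = [1.0 / n] * n  # equal weights, an assumption
    return sum(
        w * sim_tables[p][i][j]
        for p, (i, j, w) in enumerate(zip(first_ids, second_ids, weights))
    )
```

Because each term is a table lookup, comparing two sequences costs one lookup and one multiplication per position, regardless of the dimension of the original feature vectors.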

In an embodiment, after the obtaining a feature vector similarity between the target feature vector and each of the reference feature vectors, the method further includes: sorting the reference feature vectors in descending order of the feature vector similarity; and

outputting, from each of the reference feature vectors, the reference feature vectors before a preset ranking.
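A minimal sketch of the sorting and output step follows. It reads "before a preset ranking" as "the first N entries after sorting", which is an interpretation; the name `top_matches` and the dictionary input are hypothetical.

```python
def top_matches(similarities, preset_ranking):
    """Return (reference_id, similarity) pairs ranked before preset_ranking.

    similarities maps a reference identifier to its feature vector
    similarity with the target feature vector.
    """
    # Sort from the largest similarity to the smallest.
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:preset_ranking]
```

For example, `top_matches({"a": 0.2, "b": 0.9, "c": 0.5}, 2)` returns `[("b", 0.9), ("c", 0.5)]`.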

According to a second aspect, a data processing apparatus is provided and includes: a first processing module, configured to determine a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors; determine a second standard feature sub-vector sequence whose similarity to a reference feature vector satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors; the first standard feature sub-vector sequence includes at least one first standard feature sub-vector in each of the standard feature sub-vectors; the reference feature vector is a reference feature vector in each of pre-stored reference feature vectors, and the second standard feature sub-vector sequence includes at least one second standard feature sub-vector in each of the standard feature sub-vectors; and

a second processing module, configured to determine a sequence similarity between the first standard feature sub-vector sequence and each of second standard feature sub-vector sequences, and obtain a feature vector similarity between the target feature vector and each of the reference feature vectors.

In an embodiment, the first processing module is further configured to, before the determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors, divide each of the reference feature vectors into the same number of reference feature sub-vectors; obtain a reference feature sub-vector sequence corresponding to each of the reference feature vectors, wherein each of the reference feature sub-vectors in the reference feature sub-vector sequence is arranged according to a position of each of the reference feature sub-vectors in a corresponding reference feature vector; and

determine at least one corresponding standard feature sub-vector based on the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences, and obtain each of the pre-stored standard feature sub-vectors.

In an embodiment, the first processing module is specifically configured to take the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences as a sub-vector data set; perform a cluster processing on each of sub-vector data sets, to obtain at least one standard feature sub-vector corresponding to each of the sub-vector data sets; and

obtain each of the pre-stored standard feature sub-vectors, based on the at least one standard feature sub-vector corresponding to each of the sub-vector data sets.

In an embodiment, when the sub-vector data set is associated with positions of the reference feature sub-vectors included by the sub-vector data set in corresponding reference feature vectors, the first processing module is specifically configured to divide the target feature vector into a plurality of target feature sub-vectors to obtain a target feature sub-vector sequence of the target feature vector; each of the target feature sub-vectors in the target feature sub-vector sequence is arranged according to a position of each of the target feature sub-vectors in the target feature vector; determine the first standard feature sub-vector whose similarity to one of the target feature sub-vectors satisfies a second preset similarity condition in the at least one standard feature sub-vector corresponding to the sub-vector data set; a position of the target feature sub-vector in the target feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the target feature sub-vectors is provided with one corresponding first standard feature sub-vector, determine that a similarity between the first standard feature sub-vector sequence including each of the first standard feature sub-vectors and the target feature vector satisfies the first preset similarity condition, and obtain the first standard feature sub-vector sequence corresponding to the target feature vector.

In an embodiment, when each of the pre-stored standard feature sub-vectors has a vector identifier and the vector identifier is configured to uniquely represent each of the standard feature sub-vectors, the first processing module is specifically configured to obtain the first standard feature sub-vector sequence corresponding to the target feature vector based on the vector identifier of each of the first standard feature sub-vectors.

In an embodiment, when a sub-vector data set is associated with a position of reference feature sub-vectors included by the sub-vector data set in corresponding reference feature vectors, the first processing module is specifically configured to determine the second standard feature sub-vector whose similarity to one of the reference feature sub-vectors satisfies a second preset similarity condition in the at least one standard feature sub-vector corresponding to the sub-vector data set; a position of the reference feature sub-vector in the reference feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the reference feature sub-vectors is provided with one corresponding second standard feature sub-vector, determine that a similarity between the second standard feature sub-vector sequence including each of the second standard feature sub-vectors and the reference feature vector satisfies the first preset similarity condition, and obtain the second standard feature sub-vector sequence corresponding to the reference feature vector.

In an embodiment, when the number of the first standard feature sub-vectors in the first standard feature sub-vector sequence is the same as the number of the second standard feature sub-vectors in the second standard feature sub-vector sequence, the second processing module is specifically configured to determine a sub-vector similarity between each of the first standard feature sub-vectors in the first standard feature sub-vector sequence and a second standard feature sub-vector at a corresponding position of the second standard feature sub-vector sequence; perform a weighted summation processing on obtained sub-vector similarities; and

obtain the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence.

In an embodiment, after the obtaining a feature vector similarity between the target feature vector and each of the reference feature vectors, the second processing module is further configured to perform a sort processing on each of the reference feature vectors based on the feature vector similarity from large to small; and

output, from each of the reference feature vectors, the reference feature vectors before a preset ranking.

According to a third aspect, a computer device is provided and includes a memory, configured to store program instructions; and a processor, configured to call up the program instructions stored in the memory to implement the method of the first aspect according to the obtained program instructions.

According to a fourth aspect, a non-transitory storage medium is provided, having stored thereon computer-executable instructions which are used to cause a computer to implement the method of the first aspect.

According to embodiments of the present disclosure, among the pre-stored standard feature sub-vectors, at least one first standard feature sub-vector whose similarity to the target feature vector satisfies a first preset similarity condition is determined to obtain the first standard feature sub-vector sequence corresponding to the target feature vector; and at least one second standard feature sub-vector whose similarity to one of the reference feature vectors satisfies the first preset similarity condition is determined to obtain the second standard feature sub-vector sequence corresponding to the reference feature vector. Accordingly, the target feature vector and each of the reference feature vectors are mapped to a same standard reference system for comparison, and a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences is determined according to similarities among the pre-stored standard feature sub-vectors, so as to obtain a feature vector similarity between the target feature vector and each of the reference feature vectors. In this way, a process of calculating the similarities between the target feature vector and each of the reference feature vectors may be simplified, and data processing efficiency may be improved.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present disclosure.

FIG. 2 is a schematic flowchart of the data processing method according to an embodiment of the present disclosure.

FIG. 3 is a principle schematic view I of the data processing method according to an embodiment of the present disclosure.

FIG. 4 is a principle schematic view II of the data processing method according to an embodiment of the present disclosure.

FIG. 5 is a structural view I of the data processing device according to an embodiment of the present disclosure.

FIG. 6 is a structural view II of the data processing device according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

In order to make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described in the following clearly and comprehensively by referring to the accompanying drawings in the embodiments of the present disclosure.

In addition, in the embodiments of the present disclosure, "at least one" refers to one or more, and "a plurality of" refers to two or more. The words "and/or", which describe association relationships of associated objects, indicate that there may be three relationships; for example, A and/or B may mean the cases as follows: A exists alone, A and B exist at the same time, and B exists alone, where A and B can be singular or plural. The sign "/" generally indicates that the associated objects before and after the sign are in an "or" relationship. The expression "the following at least one item" or the like refers to any combination of these items, including any combination of a single item or plural items.

With the continuous development of science and technology, equipment may take the place of manual labor to process more and more tasks with larger data volumes. For example, in the field of image processing, the equipment can determine candidate images similar to a target image by comparing feature vectors of the target image with feature vectors of each of the candidate images.

When the data volume of the feature vectors of the candidate images is not large, the feature vectors of the target image may be compared with the feature vectors of each of the candidate images sequentially to determine the candidate images similar to the target image. However, the data volume of the feature vectors of the candidate images may be large. For example, in the field of security and surveillance, a candidate image database is established in a manner of one file for one person in the business of face ID photos registered by the Public Security Bureau. Face images captured by different models of Network Video Recorders (NVRs) or other electronic equipment are used as target images. Therefore, the face images not only include front face images of each user, but also may include side face images of different angles. As a result, the number of feature vectors obtained based on the face images reaches millions or even tens of millions. When the feature vectors of the target image are compared with the feature vectors of each of the candidate images in a linear manner, the comparison results are obtained so slowly that the linear manner cannot be applied in fields with higher practical requirements.

Furthermore, as image definition continues to improve, the dimensions of the image feature vectors extracted by the equipment keep increasing, so that images can be described accurately and candidate images similar to a target image can be determined accurately. As a result, in a process of comparing the feature vectors of the target image with the feature vectors of each of the candidate images, the volume of data that the equipment needs to process is relatively large, thereby resulting in a relatively low data processing efficiency. Similar problems also exist in other fields.

In view of the above, a data processing method is provided in an embodiment of the present disclosure to solve the problem concerning a relatively low data processing efficiency. The method may be applied to a terminal device or a network device. The terminal device can be a mobile phone, a tablet computer, a personal computer, or the like; and the network device can be a local server, a third-party server, a cloud server, or the like.

According to embodiments of the present disclosure, among the pre-stored standard feature sub-vectors, at least one first standard feature sub-vector whose similarity to the target feature vector satisfies a first preset similarity condition is determined to obtain the first standard feature sub-vector sequence corresponding to the target feature vector; and at least one second standard feature sub-vector whose similarity to one of the reference feature vectors satisfies the first preset similarity condition is determined to obtain the second standard feature sub-vector sequence corresponding to each of the reference feature vectors. Accordingly, the target feature vector and each of the reference feature vectors are mapped to the same standard reference system for comparison, and a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences is determined according to similarities among the pre-stored standard feature sub-vectors, so as to obtain a feature vector similarity between the target feature vector and each of the reference feature vectors. In this way, a process of calculating the similarities between the target feature vector and each of the reference feature vectors may be simplified, and data processing efficiency may be improved.

Reference is made to FIG. 1. FIG. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present disclosure. The application scenario includes a storage device 101 and a processing device 102. The storage device 101 and the processing device 102 may communicate with each other. The communication manner may be a wired communication manner, such as communicating through a network cable or a serial cable; or may be a wireless communication manner, such as communicating through a technology such as Bluetooth or wireless fidelity (Wi-Fi), without any specific limitation set herein.

The storage device 101 generally refers to a device that can be used to store data, such as a local database of the processing device 102, a third-party database associated with the processing device 102, a database associated with the processing device 102, or the like, without any specific limitation set herein. The processing device 102 generally refers to a device that can process data, such as a terminal device, a client, a server, or the like. The client can be a web page that can be accessed by the terminal device, or a third-party program, or the like, without any specific limitation set herein.

In an embodiment, the storage device 101 and the processing device 102 may be a same device. In the embodiments of the present disclosure, the storage device 101 and the processing device 102 are introduced as different devices, respectively.

An interaction process between various devices will be briefly introduced in the following based on the application scenario in FIG. 1.

The processing device 102 is configured to acquire each of pre-stored standard feature sub-vectors from the storage device 101. The processing device 102 is configured to determine a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of the pre-stored standard feature sub-vectors. The processing device 102 is configured to determine a second standard feature sub-vector sequence whose similarity to each of the pre-stored reference feature vectors satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors. The first standard feature sub-vector sequence includes at least one first standard feature sub-vector in each of the standard feature sub-vectors, and each of the second standard feature sub-vector sequences includes at least one second standard feature sub-vector in each of the standard feature sub-vectors.

The processing device 102 is configured to determine a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences to obtain a feature vector similarity between the target feature vector and each of the reference feature vectors.

Reference is made to FIG. 2. FIG. 2 is a schematic flowchart of the data processing method according to an embodiment of the present disclosure. The data processing method will be specifically introduced in the following.

In an operation S201, each of standard feature sub-vectors is acquired.

Each of the standard feature sub-vectors pre-stored in the storage device 101 may be determined based on each of the reference feature vectors by the processing device 102 after obtaining each of the reference feature vectors; alternatively, they may be determined based on each of the reference feature vectors by the processing device 102 when a processing resource occupancy rate of the processing device 102 is relatively low; alternatively, they may be determined based on each of the reference feature vectors by the processing device 102 after receiving indication information for determining each of the standard feature sub-vectors, without any specific limitation set herein. The reference feature vectors are feature vectors obtained by performing feature extraction on reference images, and may be a kind of high-dimensional floating-point data, with 256 dimensions or 512 dimensions, for example.

After the processing device 102 obtains each of the standard feature sub-vectors, each of the standard feature sub-vectors may be stored in the storage device 101, so that when the target feature vector needs to be determined based on each of the standard feature sub-vectors, each of the standard feature sub-vectors can be read from the storage device 101.

In an embodiment, the storage device 101 may store a correspondence relationship between a reference image and reference feature vectors corresponding to the reference image. By determining the reference feature vectors, the reference image corresponding to the reference feature vectors may accordingly be determined.

In an embodiment, the stored reference feature vectors may be acquired by performing a normalization processing on feature vectors obtained after a feature extraction is performed on the reference images. The normalization processing is, for example, an L2 normalization, such that the impact of the magnitude of the pixel values themselves on the similarity calculation may be reduced and thereby the accuracy of the similarity calculation may be improved.
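The L2 normalization mentioned above divides each feature vector by its Euclidean length, so that every normalized vector has unit length. A minimal sketch using NumPy is given below; the function name `l2_normalize` and the guard value `eps` are hypothetical additions for illustration.

```python
import numpy as np


def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit Euclidean length.

    eps guards against division by zero for all-zero vectors
    (an added safeguard, not taken from the disclosure).
    """
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=-1, keepdims=True)
    return vectors / np.maximum(norms, eps)
```

For example, `l2_normalize([[3, 4]])` yields `[[0.6, 0.8]]`, whose length is 1.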

In an embodiment, the stored reference feature vectors may be acquired by performing a dimension reduction processing on feature vectors obtained after a feature extraction is performed on the reference images. The dimension reduction processing, for example, may be a principal component analysis (PCA), a singular value decomposition (SVD), or the like, involving first seeking an orthogonal basis matrix P of the feature vectors, and then obtaining a compressed matrix Y of the feature vectors through Y=PX, where X is the feature vectors. Because the correlation among the features is reduced, the performance and effectiveness of the subsequent calculation of the similarities among the feature vectors are improved.
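The relation Y=PX can be sketched with NumPy as follows. The sketch assumes that X stores one feature vector per column, that the data are centered before the basis is computed (centering is customary for PCA but is not stated in the text), and that the basis is taken from the singular value decomposition of the centered data. The names `fit_pca_basis` and `compress` and the parameter `m` (number of retained dimensions) are hypothetical.

```python
import numpy as np


def fit_pca_basis(X, m):
    """Return an (m, d) matrix P with orthonormal rows, and the data mean.

    X has shape (d, n): one d-dimensional feature vector per column.
    Requires m <= min(d, n).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data are the principal directions,
    # ordered by decreasing singular value.
    U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
    return U[:, :m].T, mean


def compress(P, mean, X):
    """Apply Y = P X to centered data; Y has shape (m, n)."""
    return P @ (np.asarray(X, dtype=float) - mean)
```

The rows of P are orthonormal, so `P @ P.T` is the identity matrix of size m, and each compressed column of Y keeps the components of the corresponding feature vector along the m directions of largest variance.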

A process of obtaining each of the standard feature sub-vectors is specifically introduced in the following.

In an operation S1.1, each of the reference feature vectors is divided into the same number of reference feature sub-vectors; a reference feature sub-vector sequence corresponding to each of the reference feature vectors is obtained.

After the reference feature vectors are obtained, each of the reference feature vectors may be divided into the same number of reference feature sub-vectors, to obtain a reference feature sub-vector sequence corresponding to each of the reference feature vectors. For example, when a reference feature vector [1, 2, 3, 4, 5, 6, 7, 8, 9] is divided into three reference feature sub-vectors, the three reference feature sub-vectors may be [1, 2, 3], [4, 5, 6] and [7, 8, 9] respectively, so that the obtained reference feature sub-vector sequence may be [[1, 2, 3], [4, 5, 6], [7, 8, 9]]. Each of the reference feature sub-vectors in the reference feature sub-vector sequence is arranged according to a position of each of the reference feature sub-vectors in the reference feature vector.
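The division in operation S1.1 can be sketched as below for the uniform case, in which the dimension of the vector is an integer multiple of the number of sub-vectors. The function name `split_into_sub_vectors` is hypothetical.

```python
def split_into_sub_vectors(vector, count):
    """Divide a vector into `count` equal-length sub-vectors, in order."""
    if count <= 0 or len(vector) % count != 0:
        raise ValueError("vector length must be a positive multiple of count")
    size = len(vector) // count
    # Sub-vector i covers positions [i * size, (i + 1) * size).
    return [list(vector[i * size:(i + 1) * size]) for i in range(count)]
```

With the nine-element example from the text, `split_into_sub_vectors([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)` returns `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`.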

In an embodiment, when dividing a reference feature vector, it is possible to divide the reference feature vector uniformly according to the number of divided reference feature sub-vectors; or, it is possible to divide the reference feature vector non-uniformly according to respective values in the reference feature vector, without any specific limitation set herein.

In an embodiment, the number of divided reference feature sub-vectors may be determined by the dimension of the reference feature vector. The dimension of the reference feature vector may be an integer multiple of the number of divided reference feature sub-vectors. For example, when the dimension of the reference feature vector is 256, the number of divided reference feature sub-vectors may be 8 or the like. The number of reference feature sub-vectors may be set in advance based on empirical values, or may be determined based on a historical division number, without any specific limitation set herein.
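As a non-limiting illustration, the uniform division of operation S1.1 may be sketched in Python as follows. The function name `split_into_subvectors` is an assumption made for this sketch, and only the uniform case, in which the dimension is an integer multiple of the number of sub-vectors, is covered.

```python
def split_into_subvectors(vec, num_subvectors):
    """Divide vec uniformly into num_subvectors consecutive sub-vectors."""
    if len(vec) % num_subvectors != 0:
        raise ValueError("dimension must be an integer multiple of the number of sub-vectors")
    size = len(vec) // num_subvectors
    return [vec[i * size:(i + 1) * size] for i in range(num_subvectors)]

# The nine-dimensional example of the description, divided into three sub-vectors.
reference = [1, 2, 3, 4, 5, 6, 7, 8, 9]
sequence = split_into_subvectors(reference, 3)
```

The order of the returned list preserves the position of each sub-vector in the original vector, which is what later operations rely on.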

In an operation S1.2, the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences are taken as a sub-vector data set; a cluster processing is performed on each of the sub-vector data sets, obtaining at least one standard feature sub-vector corresponding to each of the sub-vector data sets.

After each of the reference feature vectors is divided to obtain the reference feature sub-vector sequence of each of the reference feature vectors, the reference feature sub-vectors at the same position in each of the reference feature sub-vector sequences are taken as a sub-vector data set. For example, when a reference feature sub-vector sequence is [[123] [456] [789]] and another reference feature sub-vector sequence is [[234] [567] [891]], the reference feature sub-vectors [123] and [234] at the first position are taken as a sub-vector data set, the reference feature sub-vectors [456] and [567] at the second position are taken as a sub-vector data set, and the reference feature sub-vectors [789] and [891] at the third position are taken as a sub-vector data set.

The number of obtained sub-vector data sets is identical to the number of reference feature sub-vectors into which each reference feature vector is divided. Each of the sub-vector data sets includes at least one reference feature sub-vector. A cluster processing is performed on each of the sub-vector data sets, to cluster similar reference feature sub-vectors together. A similarity between the reference feature sub-vectors in the same class and the cluster center of the class lies within a preset range. Accordingly, after the cluster processing, at least one cluster center corresponding to each of the sub-vector data sets may be obtained, where one cluster center is one standard feature sub-vector.

In an embodiment, the more reference feature vectors there are, the greater the differences among the reference feature vectors may be. Therefore, when the cluster processing is performed on the sub-vector data sets, the number of standard feature sub-vectors may increase as the number of reference feature vectors increases, or decrease as the number of reference feature vectors decreases, so as to improve the accuracy with which the standard feature sub-vectors are determined, thereby improving the accuracy of the similarity calculation. For example, for millions of reference feature vectors, the number of standard feature sub-vectors may be set as 10,000, and for tens of millions of reference feature vectors, the number of standard feature sub-vectors may be set as 20,000, or the like, without any specific limitation set herein.
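As a non-limiting illustration, operation S1.2 may be sketched in Python as follows. The present disclosure does not name a particular clustering algorithm; k-means with squared Euclidean distance is assumed here, and the names `group_by_position`, `kmeans`, and the parameters `k`, `iterations`, and `seed` are hypothetical.

```python
import random

def group_by_position(sequences):
    """sequences[r][p] is the sub-vector at position p of reference vector r."""
    num_positions = len(sequences[0])
    return [[seq[p] for seq in sequences] for p in range(num_positions)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd iteration; returns k cluster centers (standard sub-vectors)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers

# The two reference feature sub-vector sequences of the description.
sequences = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[2, 3, 4], [5, 6, 7], [8, 9, 1]],
]
data_sets = group_by_position(sequences)            # one data set per position
codebooks = [kmeans(ds, k=1) for ds in data_sets]   # cluster centers per position
```

With only two sub-vectors per data set and `k=1`, each cluster center is simply the mean of the two sub-vectors; a realistic data set would use a much larger `k`, as discussed above.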

In an operation S1.3, each of the pre-stored standard feature sub-vectors is obtained based on the at least one standard feature sub-vector corresponding to each of the sub-vector data sets.

After the at least one standard feature sub-vector corresponding to each of the sub-vector data sets is obtained, the processing device 102 may be configured to send each of the standard feature sub-vectors to the storage device 101; and the storage device 101, after receiving each of the standard feature sub-vectors sent by the processing device 102, is configured to store each of the standard feature sub-vectors to obtain each of the pre-stored standard feature sub-vectors.

In an operation S202, each of second standard feature sub-vector sequences whose similarity to each of the pre-stored reference feature vectors satisfies the first preset similarity condition is determined, based on each of the pre-stored standard feature sub-vectors.

The sub-vector data set is associated with positions of the reference feature sub-vectors included by the sub-vector data set in corresponding reference feature sub-vector sequence. That is, when the sub-vector data set includes reference feature sub-vectors at the first position in each of the reference feature sub-vector sequences, the sub-vector data set is associated with the first position.

In the at least one standard feature sub-vector corresponding to the sub-vector data set, the second standard feature sub-vector whose similarity to one of the reference feature sub-vectors at an associated position of the sub-vector data set satisfies a second preset similarity condition is determined. For example, in at least one standard feature sub-vector corresponding to the sub-vector data set whose associated position is the first position, the second standard feature sub-vector whose similarity to the reference feature sub-vector at the first position of the reference feature sub-vector sequence satisfies a second preset similarity condition is determined.

The second preset similarity condition may be that the similarity is within a preset range, or that the similarity is greater than a preset threshold, or the like; in this case, a plurality of second standard feature sub-vectors may be determined. The second preset similarity condition may also be that the similarity is the maximum; in this case, one second standard feature sub-vector is determined, without any specific limitation set herein. When a plurality of second standard feature sub-vectors are determined, the second standard feature sub-vectors may be arranged based on the magnitudes of their similarities.

Based on the at least one standard feature sub-vector corresponding to each of the sub-vector data sets, a second standard feature sub-vector corresponding to each of the reference feature sub-vectors in a reference feature sub-vector sequence may be determined. When each of the reference feature sub-vectors has a corresponding second standard feature sub-vector, it is determined that a similarity between the second standard feature sub-vector sequence including each of the second standard feature sub-vectors and the reference feature vector satisfies the first preset similarity condition, and the second standard feature sub-vector sequence corresponding to the reference feature vector is thereby obtained.

After the second standard feature sub-vector sequence corresponding to each of the reference feature vectors is obtained, the processing device 102 may be configured to send the second standard feature sub-vector sequence corresponding to each of the reference feature vectors to the storage device 101, and the storage device 101 is configured to receive and store the second standard feature sub-vector sequence corresponding to each of the reference feature vectors, as sent by the processing device 102. Accordingly, when a similarity between the target feature vector and each of the reference feature vectors is determined, the processing device 102 may be configured to directly obtain the second standard feature sub-vector sequence corresponding to each of the reference feature vectors, for calculation; in this way, there is no need for the second standard feature sub-vector sequence corresponding to each of the reference feature vectors to be calculated in real time; thereby timeliness of determining the similarity between the target feature vector and each of the reference feature vectors is improved.

In an embodiment, each of the standard feature sub-vectors may have a vector identifier that uniquely identifies the standard feature sub-vector among all of the standard feature sub-vectors; or, each of the standard feature sub-vectors may have a vector identifier that uniquely identifies the standard feature sub-vector within its corresponding sub-vector data set. The second standard feature sub-vector sequence of the reference feature vector may be represented by a vector identifier sequence. Therefore, the processing device 102 does not need to transmit the standard feature sub-vectors themselves in order to process them. The standard feature sub-vectors to be processed may be determined merely by transmitting the vector identifiers, thereby reducing the occupation of processing resources.
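As a non-limiting illustration, the representation of a feature vector by a vector identifier sequence may be sketched in Python as follows. The sketch assumes that the second preset similarity condition is "maximum similarity", that similarity is measured by smallest squared Euclidean distance, and that the vector identifier is the index of a standard feature sub-vector within the codebook of its position; the names `encode` and `codebooks` are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(sub_vectors, codebooks):
    """Return, for each position, the identifier of the nearest standard sub-vector."""
    identifiers = []
    for position, sub in enumerate(sub_vectors):
        book = codebooks[position]
        nearest = min(range(len(book)), key=lambda i: squared_distance(sub, book[i]))
        identifiers.append(nearest)
    return identifiers

# Hypothetical codebooks: two standard sub-vectors for each of two positions.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
reference_sub_vectors = [[0.9, 1.2], [0.8, 0.1]]
identifier_sequence = encode(reference_sub_vectors, codebooks)
```

The same function can be applied to the target feature sub-vector sequence, so that both the target and the references are reduced to short lists of integers.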

In an operation S203, a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies the first preset similarity condition is determined, based on each of the pre-stored standard feature sub-vectors.

When the feature vector similarity between the target feature vector and each of the reference feature vectors needs to be determined, the target feature vector may be divided first to obtain each of target feature sub-vectors corresponding to the target feature vector. The number of the target feature sub-vectors is identical to the number of reference feature sub-vectors in the reference feature vector. The process of dividing the target feature vector into each of the target feature sub-vectors to obtain a target feature sub-vector sequence is the same as the process of dividing the reference feature vector into each of the reference feature sub-vectors to obtain the reference feature sub-vector sequence, with details omitted herein.

The process of determining, in each of the pre-stored standard feature sub-vectors, a first standard feature sub-vector sequence whose similarity to the target feature vector satisfies the first preset similarity condition is the same as the process of determining, in each of the pre-stored standard feature sub-vectors, the second standard feature sub-vector sequence whose similarity to the reference feature vector satisfies the first preset similarity condition, with details omitted herein.

In an embodiment, the first standard feature sub-vector sequence of the target feature vector may be represented by a vector identifier sequence. Therefore, the processing device 102 does not need to transmit the standard feature sub-vectors themselves in order to process them. The standard feature sub-vectors to be processed may be determined merely by transmitting the vector identifiers, thereby reducing the occupation of processing resources.

In an operation S204, a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences is determined, and a feature vector similarity between the target feature vector and each of the reference feature vectors is obtained.

After the first standard feature sub-vector sequence of the target feature vector is obtained, the sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences may be determined. Since both the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences consist of standard feature sub-vectors, the sequence similarity between the first standard feature sub-vector sequence and a second standard feature sub-vector sequence may be determined by determining the sub-vector similarity between a first standard feature sub-vector of the first standard feature sub-vector sequence and the second standard feature sub-vector at the corresponding position of the second standard feature sub-vector sequence.

When the sub-vector similarity between the first standard feature sub-vector and the second standard feature sub-vector at the corresponding position is determined, the sub-vector similarity may be looked up, among the pre-stored sub-vector similarities between the standard feature sub-vectors, through the vector identifier of the first standard feature sub-vector and that of the second standard feature sub-vector at the corresponding position. The sub-vector similarity may be accessed from the storage device 101 directly based on the two vector identifiers. Alternatively, correspondence relationships between the vector identifiers of the first standard feature sub-vector sequence and the vector identifiers of the standard feature sub-vectors in the corresponding sub-vector data sets are established, to obtain a sub-vector similarity correspondence relationship table of the first standard feature sub-vector sequence and the standard feature sub-vectors in the corresponding sub-vector data sets. Accordingly, the sub-vector similarities between each of the first standard feature sub-vectors and each of the second standard feature sub-vectors may be obtained from this correspondence relationship table. After the sub-vector similarity between each of the first standard feature sub-vectors and its corresponding second standard feature sub-vector is obtained, a weighted summation processing is performed on the obtained sub-vector similarities, to obtain the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence. As compared with determining the sequence similarity solely from the average of the sub-vector similarities, the weighted summation of the present embodiment allows the degrees of importance of different sub-vector similarities to be adjusted, so that the similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence is determined more accurately.
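As a non-limiting illustration, the table lookup and weighted summation may be sketched in Python as follows. The layout `tables[p][a][b]`, which holds the pre-stored similarity between the standard feature sub-vectors identified as `a` and `b` at position `p`, as well as the numeric values and the per-position weights, are assumptions made for this sketch.

```python
def sequence_similarity(first_ids, second_ids, tables, weights):
    """Weighted sum of pre-stored sub-vector similarities, one term per position."""
    return sum(
        w * tables[p][a][b]
        for p, (a, b, w) in enumerate(zip(first_ids, second_ids, weights))
    )

# Hypothetical pre-stored similarities for two positions, two identifiers each.
tables = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.0, 0.5], [0.5, 1.0]],
]
weights = [0.75, 0.25]            # hypothetical degrees of importance per position
first_ids = [0, 1]                # identifier sequence of the target feature vector
second_ids = [0, 0]               # identifier sequence of one reference feature vector
score = sequence_similarity(first_ids, second_ids, tables, weights)
```

No sub-vector similarity is computed at query time in this sketch; each term is a single table access indexed by two identifiers, which is the source of the efficiency gain described above.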

Reference is made to Table 1, which is a sub-vector similarity correspondence relationship table of the first standard feature sub-vector sequence and each of the standard feature sub-vectors in the corresponding sub-vector data sets. Therein, M represents the number of first standard feature sub-vectors in the first standard feature sub-vector sequence; K represents the number of standard feature sub-vectors in a sub-vector data set; and 0～K are vector identifiers that uniquely identify each of the standard feature sub-vectors in the sub-vector data set. For example, Sim(1, 1) represents a sub-vector similarity between the standard feature sub-vector identified as 1 in the sub-vector data set and the first standard feature sub-vector identified as first standard feature sub-vector 1 in the first standard feature sub-vector sequence.

Table 1

In an embodiment, when one target feature sub-vector corresponds to a plurality of first standard feature sub-vectors, for example, the target feature sub-vector corresponds to N first standard feature sub-vectors, then a sub-vector similarity correspondence relationship table of the first standard feature sub-vector sequence and each of the standard feature sub-vectors in the sub-vector data sets can be shown as Table 2.

Table 2

When one target feature sub-vector corresponds to a plurality of first standard feature sub-vectors, then each of the first standard feature sub-vectors can have a weight. The greater the similarity between a first standard feature sub-vector and the corresponding target feature sub-vector, the greater the weight of the first standard feature sub-vector. A sub-vector similarity between each of the first standard feature sub-vectors in the plurality of first standard feature sub-vectors to which the target feature sub-vector corresponds and a second standard feature sub-vector is determined respectively. According to the weight of each of the first standard feature sub-vectors, a weighted summation processing is performed on obtained sub-vector similarities, such that sub-vector similarities between the first standard feature sub-vectors and the second standard feature sub-vectors are obtained. After the sub-vector similarity between each of the first standard feature sub-vectors and each of the corresponding second standard feature sub-vectors is obtained, a weighted summation processing is performed on each of the obtained sub-vector similarities, to obtain the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence.
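As a non-limiting illustration, the case in which one target feature sub-vector corresponds to a plurality of first standard feature sub-vectors may be sketched in Python as follows. The table layout `tables[p][a][b]`, the function names, and all numeric values are assumptions made for this sketch; each reference feature sub-vector is assumed here to correspond to a single second standard feature sub-vector.

```python
def position_similarity(candidate_ids, candidate_weights, second_id, table):
    """Weighted sum over the first standard sub-vectors of one target sub-vector."""
    return sum(w * table[c][second_id] for c, w in zip(candidate_ids, candidate_weights))

def sequence_similarity_multi(candidates, candidate_weights, second_ids, tables, position_weights):
    """Weighted sum of the per-position similarities."""
    return sum(
        position_weights[p]
        * position_similarity(candidates[p], candidate_weights[p], second_ids[p], tables[p])
        for p in range(len(second_ids))
    )

# Hypothetical single-position example with two candidate first standard sub-vectors.
tables = [[[1.0, 0.5], [0.5, 1.0]]]
candidates = [[0, 1]]                # identifiers, most similar candidate first
candidate_weights = [[0.75, 0.25]]   # larger weight for the more similar candidate
second_ids = [1]
position_weights = [1.0]
score = sequence_similarity_multi(candidates, candidate_weights, second_ids, tables, position_weights)
```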

After the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence is obtained, the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence is taken as a feature vector similarity between the target feature vector and the reference feature vector, thereby a feature vector similarity between the target feature vector and each of the reference feature vectors may be obtained.

In an embodiment, after the feature vector similarity between the target feature vector and each of the reference feature vectors is obtained, the reference feature vectors may be sorted according to the magnitudes of the feature vector similarities, and the reference feature vectors ranked before a preset ranking are output. For example, in the field of security, the several ID photos that are most similar to a captured image can be found among the pre-stored ID photos, such that the identity of a suspect may be efficiently determined.
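As a non-limiting illustration, the sorting and output step may be sketched in Python as follows. The function name `top_matches`, the reference names, and the similarity values are hypothetical.

```python
def top_matches(similarities, preset_ranking):
    """Sort references by similarity, from large to small, and keep the first ones."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:preset_ranking]

# Hypothetical feature vector similarities between a target and three references.
similarities = {"ref_a": 0.42, "ref_b": 0.91, "ref_c": 0.77}
best = top_matches(similarities, 2)
```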

In an embodiment, the similarity calculation according to the embodiment of the present disclosure can be obtained by calculating the Euclidean distance, or Mahalanobis distance, or cosine similarity between two vectors. The specific method of calculating the similarity is not limited herein.
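As a non-limiting illustration, two of the measures mentioned above may be sketched in Python as follows. The function names and the convention of returning 0.0 for a zero vector are assumptions; the Mahalanobis distance is omitted because it additionally requires a covariance matrix of the data.

```python
import math

def euclidean_distance(a, b):
    """Smaller values indicate more similar vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Larger values indicate more similar vectors; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

distance = euclidean_distance([0.0, 0.0], [3.0, 4.0])
cosine = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```

Note that a distance and a similarity are ordered in opposite directions, so a condition such as "the similarity is maximum" corresponds to the minimum distance when a distance measure is used.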

The data processing method according to an embodiment of the present disclosure is illustrated in the following. Reference is made to FIG. 3. FIG. 3 is a principle schematic view I of the data processing method according to an embodiment of the present disclosure.

In an operation S301, the processing device 102 is configured to acquire each of reference feature vectors.

In an operation S302, the processing device 102 is configured to perform normalization processing and dimension reduction processing on each of the reference feature vectors, so as to obtain each of processed reference feature vectors.

In an operation S303, the processing device 102 is configured to determine each of standard feature sub-vectors based on each of the processed reference feature vectors.

In an operation S304, the processing device 102 is configured to store each of the standard feature sub-vectors in the storage device 101.

In an operation S305, the processing device 102 is configured to determine a similarity between each two of the standard feature sub-vectors.

In an operation S306, the processing device 102 is configured to store similarities among each of the standard feature sub-vectors in the storage device 101.

In an operation S307, the processing device 102 is configured to determine a second standard feature sub-vector sequence corresponding to each of the reference feature vectors, based on each of the standard feature sub-vectors.

In an operation S308, the processing device 102 is configured to store each of the second standard feature sub-vector sequences in the storage device 101.

In an operation S309, when it is needed to determine, in each of the reference feature vectors, reference feature vectors similar to a target feature vector, the processing device 102 is configured to acquire the target feature vector.

In an operation S310, the processing device 102 is configured to determine a first standard feature sub-vector sequence corresponding to the target feature vector, based on each of the standard feature sub-vectors.

In an operation S311, the processing device 102 is configured to read, from the storage device 101, the second standard feature sub-vector sequence corresponding to each of the reference feature vectors, and the similarities among each of the standard feature sub-vectors.

In an operation S312, the processing device 102 is configured to determine a sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence.

Reference is made to FIG. 4. FIG. 4 is a principle schematic view II of the data processing method according to an embodiment of the present disclosure. The reference feature vector is divided into three reference feature sub-vectors, and correspondingly, the target feature vector is divided into three target feature sub-vectors. Each of the reference feature sub-vectors corresponds to two second standard feature sub-vectors, and correspondingly, each of the target feature sub-vectors corresponds to two first standard feature sub-vectors. A sub-vector similarity between each of the first standard feature sub-vectors and the corresponding second standard feature sub-vector is determined, and a weighted summation is performed on the sub-vector similarities, to obtain the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence, whereby the feature vector similarity between the target feature vector and the reference feature vector is obtained.

For the weighted summation of the sub-vector similarities, reference may be made to the following Equation (1):

Therein, probe represents the target feature vector; gallery1 represents one reference feature vector among the reference feature vectors; Sim(probe, gallery1) represents the feature vector similarity between the target feature vector and the reference feature vector; i represents the i-th target feature sub-vector, where i is an integer greater than 0 and less than or equal to the number M of target feature sub-vectors; top_probe_j represents the j-th first standard feature sub-vector corresponding to a target feature sub-vector; top_gallery1_j represents the j-th second standard feature sub-vector corresponding to a reference feature sub-vector, where j is an integer greater than 0 and less than or equal to the number k of first standard feature sub-vectors corresponding to the target feature sub-vector, or less than or equal to the number of second standard feature sub-vectors corresponding to the reference feature sub-vector; w_j represents a weight, and satisfies

The weight wj may be updated according to actual circumstances. When the reference feature sub-vectors are distributed densely or there are many cluster centers, the differences among the weights wj may appropriately be reduced. When the reference feature sub-vectors are distributed sparsely or there are few cluster centers, the differences among the weights wj may appropriately be increased.
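As an illustration only, one form of weighted summation that is consistent with the symbol definitions above can be written as follows. The superscript (i), which marks the position of the sub-vector within its sequence, and the normalization of the weights to a sum of 1 are assumptions introduced here for readability; they are not taken from Equation (1) itself.

```latex
\mathrm{Sim}(\mathrm{probe}, \mathrm{gallery1})
  = \sum_{i=1}^{M} \sum_{j=1}^{k} w_j \,
    \mathrm{Sim}\!\left(\mathrm{top\_probe}_j^{(i)},\ \mathrm{top\_gallery1}_j^{(i)}\right),
\qquad \sum_{j=1}^{k} w_j = 1 .
```

Under this assumed form, making the weights more nearly equal lets lower-ranked standard feature sub-vectors contribute almost as much as the best match, which suits densely distributed sub-vectors, whereas making them more unequal concentrates the result on the best match.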

On the basis of the same inventive concept, an embodiment according to the present disclosure provides a data processing apparatus. The apparatus, which is equivalent to the afore-mentioned processing device 102, is configured to implement the corresponding functions of the afore-mentioned data processing method. With reference to FIG. 5, the apparatus includes a first processing module 501 and a second processing module 502.

The first processing module 501 is configured to determine a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, and a second standard feature sub-vector sequence whose similarity to a reference feature vector satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors; the first standard feature sub-vector sequence includes at least one first standard feature sub-vector in each of the standard feature sub-vectors; the reference feature vector is a reference feature vector in each of pre-stored reference feature vectors, and the second standard feature sub-vector sequence includes at least one second standard feature sub-vector in each of the standard feature sub-vectors.

The second processing module 502 is configured to determine a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences, respectively, and obtain a feature vector similarity between the target feature vector and each of the reference feature vectors.

In a possible embodiment, the first processing module 501 is further configured to:

before determining, based on each of the pre-stored standard feature sub-vectors, the first standard feature sub-vector sequence whose similarity to the target feature vector satisfies the first preset similarity condition, divide each of the reference feature vectors into the same number of reference feature sub-vectors, respectively, and obtain a reference feature sub-vector sequence corresponding to each of the reference feature vectors, where each of the reference feature sub-vectors in the reference feature sub-vector sequence is arranged according to a position of each of the reference feature sub-vectors in a corresponding reference feature vector; and determine at least one corresponding standard feature sub-vector based on the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences and obtain each of the pre-stored standard feature sub-vectors.

In a possible embodiment, the first processing module 501 is specifically configured to:

take the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences as a sub-vector data set, and perform a cluster processing on each of the sub-vector data sets to obtain at least one standard feature sub-vector corresponding to each of the sub-vector data sets; and

obtain each of the pre-stored standard feature sub-vectors, based on at least one standard feature sub-vector corresponding to each sub-vector data set.

In a possible embodiment, when the sub-vector data set is associated with positions of the reference feature sub-vectors included by the sub-vector data set in corresponding reference feature vectors, the first processing module 501 is specifically configured to:

divide the target feature vector into a plurality of target feature sub-vectors to obtain a target feature sub-vector sequence of the target feature vector, where each of the target feature sub-vectors in the target feature sub-vector sequence is arranged according to a position of each of the target feature sub-vectors in the target feature vector;

determine the first standard feature sub-vector whose similarity to one of the target feature sub-vectors satisfies a second preset similarity condition, in the at least one standard feature sub-vector corresponding to the sub-vector data set, where a position of the target feature sub-vector in the target feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the target feature sub-vectors is provided with one corresponding first standard feature sub-vector, determine that a similarity between the first standard feature sub-vector sequence including each of the first standard feature sub-vectors and the target feature vector satisfies the first preset similarity condition, obtaining the first standard feature sub-vector sequence corresponding to the target feature vector.

In a possible embodiment, when each of the pre-stored standard feature sub-vectors has a vector identifier configured to uniquely represent each of standard feature sub-vectors, the first processing module 501 is specifically configured to:

obtain the first standard feature sub-vector sequence corresponding to the target feature vector based on the vector identifier of each of the first standard feature sub-vectors.

determine the second standard feature sub-vector whose similarity to one of the reference feature sub-vectors satisfies the second preset similarity condition, in the at least one standard feature sub-vector corresponding to the sub-vector data set, where a position of the reference feature sub-vector in the reference feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the reference feature sub-vectors is provided with one corresponding second standard feature sub-vector, determine that a similarity between the second standard feature sub-vector sequence including each of the second standard feature sub-vectors and the reference feature vector satisfies the first preset similarity condition, obtaining the second standard feature sub-vector sequence corresponding to the reference feature vector.

In a possible embodiment, when the number of first standard feature sub-vectors in a first standard feature sub-vector sequence is the same as the number of second standard feature sub-vectors in a second standard feature sub-vector sequence, the second processing module 502 is specifically configured to:

respectively determine a sub-vector similarity between each of the first standard feature sub-vectors in the first standard feature sub-vector sequence and a second standard feature sub-vector at a corresponding position of the second standard feature sub-vector sequence; and

perform a weighted summation processing on obtained sub-vector similarities, obtaining the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence.

In a possible embodiment, the second processing module 502 is further configured to:

perform a sort processing on each of the reference feature vectors based on the feature vector similarity from large to small, after obtaining the feature vector similarity between the target feature vector and each of the reference feature vectors; and

Based on the same inventive concept, an embodiment according to the present disclosure provides a computer device. The computer device is configured to implement functions of the aforementioned data processing apparatus. The computer device may be equivalent to the aforementioned data processing device 102. With reference to FIG. 6, the computer device comprises: at least one processor 601 and a memory 602 connected to the at least one processor 601. The specific connection medium between the processor 601 and the memory 602 is not limited in the embodiment of the present disclosure. FIG. 6 shows an example in which the processor 601 and the memory 602 are connected via a bus 600. The bus 600 is represented by a bold line in FIG. 6, and connection manners between other components are only for schematic illustration and are not limited thereby. The bus 600 can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one bold line is used in FIG. 6 to represent the bus 600, but it does not mean that there is only one bus or one type of bus. Alternatively, the processor 601 may also be referred to as the controller 601, and there is no restriction on the name.

In the embodiment of the present disclosure, the memory 602 stores instructions that can be executed by at least one processor 601, and the at least one processor 601 can execute the data processing method discussed above by executing the instructions stored in the memory 602. The processor 601 can implement the functions of each of the modules in the apparatus shown in FIG. 5.
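As a non-limiting illustration of the kind of operations such instructions might carry out, the following sketch derives standard feature sub-vectors from the reference feature vectors by clustering the sub-vectors at each position, and then maps a feature vector to a sequence of vector identifiers. It assumes plain k-means with a naive initialisation as the clustering, squared Euclidean distance as the second preset similarity condition, and identifiers of the form "position-index"; all function names and these choices are hypothetical and are not prescribed by the present disclosure.

```python
def split_into_sub_vectors(vector, num_parts):
    """Divide a feature vector into num_parts equal-length sub-vectors, in order."""
    if len(vector) % num_parts != 0:
        raise ValueError("vector length must be divisible by num_parts")
    size = len(vector) // num_parts
    return [vector[i * size:(i + 1) * size] for i in range(num_parts)]


def squared_distance(a, b):
    """Squared Euclidean distance (smaller means more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def cluster(points, k, iterations=10):
    """Plain k-means (an assumed clustering choice); returns k centroids."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the previous centroid when a cluster receives no points.
        centroids = [mean_vector(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids


def build_codebooks(reference_vectors, num_parts, k):
    """One {vector_id: standard_sub_vector} dict per sub-vector position."""
    split = [split_into_sub_vectors(v, num_parts) for v in reference_vectors]
    codebooks = []
    for pos in range(num_parts):
        data_set = [subs[pos] for subs in split]  # sub-vector data set for this position
        centroids = cluster(data_set, k)
        codebooks.append({f"{pos}-{i}": c for i, c in enumerate(centroids)})
    return codebooks


def encode(vector, codebooks):
    """Map each sub-vector to the identifier of its nearest standard sub-vector."""
    subs = split_into_sub_vectors(vector, len(codebooks))
    return [
        min(codebook, key=lambda vid: squared_distance(sub, codebook[vid]))
        for sub, codebook in zip(subs, codebooks)
    ]
```

In this sketch the codebooks would be built once from the pre-stored reference feature vectors, after which both the target feature vector and each reference feature vector are encoded with the same `encode` function, so that their identifier sequences can be compared position by position.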

The processor 601, as a control center of the apparatus, can use various interfaces and lines to connect the various parts of the entire apparatus. By running or executing the instructions stored in the memory 602 and calling up the data stored in the memory 602, the processor 601 performs the various functions of the apparatus and processes data, thereby monitoring the apparatus as a whole.

In a possible embodiment, the processor 601 may include one or more processing units, and the processor 601 may integrate an application processor and a modem processor. The application processor mainly handles the operating system, user interface, application programs, and the like, while the modem processor mainly handles wireless communication. It can be understood that the modem processor need not be integrated into the processor 601. In some embodiments, the processor 601 and the memory 602 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.

The processor 601 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the various methods, operations, and logic block diagrams disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The operations of the data processing method disclosed in the embodiments of the present disclosure may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

The memory 602, as a non-volatile computer-readable non-transitory storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs and modules. The memory 602 may include at least one type of non-transitory storage medium, for example, flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, etc. The memory 602 may also be any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 602 in the embodiment of the present disclosure may also be a circuit or any other means capable of realizing a storage function, i.e., storing program instructions and/or data.

By designing and programming the processor 601, it is possible to hard-code the code corresponding to the data processing method introduced in the foregoing embodiment into the chip, so that the chip, when running, can execute the operations of the data processing method of the embodiment shown in FIG. 2. How to design and program the processor 601 is well known to those skilled in the art, and the details are omitted herein.

Based on the same inventive concept, an embodiment according to the present disclosure also provides a non-transitory storage medium that stores thereon computer instructions which, when running on a computer, cause the computer to implement the data processing method discussed above.

In some possible implementation manners, various aspects of the data processing method provided in the present disclosure may also be implemented in the form of a program product, which includes program codes. When the program product runs on a device, the program codes are used to cause the device to execute the operations in the data processing method according to various exemplary embodiments of the present disclosure as described above in the present specification.

Those skilled in the art should understand that the embodiments of the present disclosure can be provided as a method, a system, or a computer program product. Therefore, the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.

The present disclosure is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the present disclosure. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and the combination of processes and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a dedicated computer, an embedded processor, or other programmable data processing device to generate a machine, so that the instructions executed by the processor of the computer or other programmable data processing device are configured to generate a device that realizes the functions specified in one process or multiple processes in the flowcharts and/or one block or multiple blocks in the block diagrams.

The computer program instructions can also be loaded onto a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device. The instruction device implements the functions specified in one process or multiple processes in the flowcharts and/or one block or multiple blocks in the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operations are executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide operations for implementing the functions specified in one process or multiple processes in the flowcharts and/or one block or multiple blocks in the block diagrams.

Obviously, those skilled in the art may make various variations and modifications to the present disclosure, without departing from the spirit and scope of the present disclosure. In this way, if these modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalent technologies, the present disclosure is also intended to include these modifications and variations.

Claims

1. A data processing method, comprising:

determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors;

determining a second standard feature sub-vector sequence whose similarity to a reference feature vector satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors; wherein the first standard feature sub-vector sequence comprises at least one first standard feature sub-vector in each of the standard feature sub-vectors; the reference feature vector is a reference feature vector in each of pre-stored reference feature vectors, and the second standard feature sub-vector sequence comprises at least one second standard feature sub-vector in each of the standard feature sub-vectors;

determining a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences; and

obtaining a feature vector similarity between the target feature vector and each of the reference feature vectors.
2. The method according to claim 1, before the determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors, further comprising:

dividing each of the reference feature vectors into a plurality of reference feature sub-vectors, the number of reference feature sub-vectors being the same for each of the reference feature vectors;

obtaining a reference feature sub-vector sequence corresponding to each of the reference feature vectors, wherein each of the reference feature sub-vectors in the reference feature sub-vector sequence is arranged according to a position of each of the reference feature sub-vectors in a corresponding reference feature vector; and

determining at least one corresponding standard feature sub-vector based on the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences, and obtaining each of the pre-stored standard feature sub-vectors.
3. The method according to claim 2, wherein the determining at least one corresponding standard feature sub-vector based on the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences, and obtaining each of the pre-stored standard feature sub-vectors, comprises:

taking the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences as a sub-vector data set;

performing a cluster processing on each of the sub-vector data sets, to obtain at least one standard feature sub-vector corresponding to each of the sub-vector data sets; and

obtaining each of the pre-stored standard feature sub-vectors, based on the at least one standard feature sub-vector corresponding to each of the sub-vector data sets.
4. The method according to claim 3, wherein when the sub-vector data set is associated with positions of the reference feature sub-vectors comprised by the sub-vector data set in corresponding reference feature vectors, the determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors, comprises:

dividing the target feature vector into a plurality of target feature sub-vectors to obtain a target feature sub-vector sequence of the target feature vector; wherein each of the target feature sub-vectors in the target feature sub-vector sequence is arranged according to a position of each of the target feature sub-vectors in the target feature vector;

determining the first standard feature sub-vector whose similarity to one of the target feature sub-vectors satisfies a second preset similarity condition in the at least one standard feature sub-vector corresponding to the sub-vector data set; wherein a position of the target feature sub-vector in the target feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the target feature sub-vectors is provided with one corresponding first standard feature sub-vector, determining that a similarity between the first standard feature sub-vector sequence comprising each of the first standard feature sub-vectors and the target feature vector satisfies the first preset similarity condition, and obtaining the first standard feature sub-vector sequence corresponding to the target feature vector.
5. The method according to claim 4, wherein when each of the pre-stored standard feature sub-vectors has a vector identifier and the vector identifier is configured to uniquely represent each of the standard feature sub-vectors, the obtaining the first standard feature sub-vector sequence corresponding to the target feature vector, comprises:

obtaining the first standard feature sub-vector sequence corresponding to the target feature vector based on the vector identifier of each of the first standard feature sub-vectors.
6. The method according to claim 3, wherein when the sub-vector data set is associated with positions of the reference feature sub-vectors comprised by the sub-vector data set in corresponding reference feature vectors, the determining a second standard feature sub-vector sequence whose similarity to a reference feature vector satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors, comprises:

determining the second standard feature sub-vector whose similarity to one of the reference feature sub-vectors satisfies a second preset similarity condition in the at least one standard feature sub-vector corresponding to the sub-vector data set; wherein a position of the reference feature sub-vector in the reference feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the reference feature sub-vectors is provided with one corresponding second standard feature sub-vector, determining that a similarity between the second standard feature sub-vector sequence comprising each of the second standard feature sub-vectors and the reference feature vector satisfies the first preset similarity condition, and obtaining the second standard feature sub-vector sequence corresponding to the reference feature vector.
7. The method according to claim 1, wherein when the number of the first standard feature sub-vectors in the first standard feature sub-vector sequence is the same as the number of the second standard feature sub-vectors in the second standard feature sub-vector sequence, the determining a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences, comprises:

determining a sub-vector similarity between each of the first standard feature sub-vectors in the first standard feature sub-vector sequence and a second standard feature sub-vector at a corresponding position of the second standard feature sub-vector sequence;

performing a weighted summation processing on obtained sub-vector similarities; and

obtaining the sequence similarity between the first standard feature sub-vector sequence and the second standard feature sub-vector sequence.
8. The method according to claim 1, after the obtaining a feature vector similarity between the target feature vector and each of the reference feature vectors, further comprising:

performing a sort processing on each of the reference feature vectors based on the feature vector similarity from large to small; and

outputting, from each of the reference feature vectors, the reference feature vectors before a preset ranking.
9. A data processing apparatus, comprising:

a first processing module, configured to determine a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors; determine a second standard feature sub-vector sequence whose similarity to a pre-stored reference feature vector satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors; wherein the first standard feature sub-vector sequence comprises at least one first standard feature sub-vector in each of the standard feature sub-vectors; the second standard feature sub-vector sequence comprises at least one second standard feature sub-vector in each of the standard feature sub-vectors; and

a second processing module, configured to determine a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences, and obtain a feature vector similarity between the target feature vector and each of the reference feature vectors.
10. A computer device, comprising:

a memory, configured to store program instructions; and

a processor, configured to call up the program instructions stored in the memory and implement:

determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors;

determining a second standard feature sub-vector sequence whose similarity to a reference feature vector satisfies the first preset similarity condition, based on each of the pre-stored standard feature sub-vectors; wherein the first standard feature sub-vector sequence comprises at least one first standard feature sub-vector in each of the standard feature sub-vectors; the reference feature vector is a reference feature vector in each of pre-stored reference feature vectors, and the second standard feature sub-vector sequence comprises at least one second standard feature sub-vector in each of the standard feature sub-vectors;

determining a sequence similarity between the first standard feature sub-vector sequence and each of the second standard feature sub-vector sequences; and

obtaining a feature vector similarity between the target feature vector and each of the reference feature vectors.
11. The computer device according to claim 10, wherein before determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors, the processor is configured to implement:

dividing each of the reference feature vectors into a plurality of reference feature sub-vectors, the number of reference feature sub-vectors being the same for each of the reference feature vectors;

obtaining a reference feature sub-vector sequence corresponding to each of the reference feature vectors, wherein each of the reference feature sub-vectors in the reference feature sub-vector sequence is arranged according to a position of each of the reference feature sub-vectors in a corresponding reference feature vector; and

determining at least one corresponding standard feature sub-vector based on the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences, and obtaining each of the pre-stored standard feature sub-vectors.
12. The computer device according to claim 11, wherein while determining at least one corresponding standard feature sub-vector based on the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences, and obtaining each of the pre-stored standard feature sub-vectors, the processor is configured to implement:

taking the reference feature sub-vectors at the same position of each of the reference feature sub-vector sequences as a sub-vector data set;

performing a cluster processing on each of the sub-vector data sets, to obtain at least one standard feature sub-vector corresponding to each of the sub-vector data sets; and

obtaining each of the pre-stored standard feature sub-vectors, based on the at least one standard feature sub-vector corresponding to each of the sub-vector data sets.
13. The computer device according to claim 12, wherein when the sub-vector data set is associated with positions of the reference feature sub-vectors comprised by the sub-vector data set in corresponding reference feature vectors, for the determining a first standard feature sub-vector sequence whose similarity to a target feature vector satisfies a first preset similarity condition, based on each of pre-stored standard feature sub-vectors, the processor is configured to implement:

dividing the target feature vector into a plurality of target feature sub-vectors to obtain a target feature sub-vector sequence of the target feature vector; wherein each of the target feature sub-vectors in the target feature sub-vector sequence is arranged according to a position of each of the target feature sub-vectors in the target feature vector;

determining the first standard feature sub-vector whose similarity to one of the target feature sub-vectors satisfies a second preset similarity condition in the at least one standard feature sub-vector corresponding to the sub-vector data set; wherein a position of the target feature sub-vector in the target feature sub-vector sequence is identical to an associated position of the sub-vector data set; and

when each of the target feature sub-vectors is provided with one corresponding first standard feature sub-vector, determining that a similarity between the first standard feature sub-vector sequence comprising each of the first standard feature sub-vectors and the target feature vector satisfies the first preset similarity condition, and obtaining the first standard feature sub-vector sequence corresponding to the target feature vector.
14. The computer device according to claim 13, wherein when each of the pre-stored standard feature sub-vectors has a vector identifier and the vector identifier is configured to uniquely represent each of the standard feature sub-vectors, for the obtaining the first standard feature sub-vector sequence corresponding to the target feature vector, the processor is configured to implement:

obtaining the first standard feature sub-vector sequence corresponding to the target feature vector based on the vector identifier of each of the first standard feature sub-vectors.
15. A non-transitory storage medium, configured to store therein computer-executable instructions, wherein the computer-executable instructions are configured to cause a computer to implement the method according to any one of claims 1 to 8.