CN112070178B - Method and device for determining image sequence sample set and computer equipment - Google Patents

Info

Publication number
CN112070178B
Authority
CN
China
Prior art keywords
image sequence
samples
extraction
image
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010998142.7A
Other languages
Chinese (zh)
Other versions
CN112070178A (en
Inventor
汪贤
熊宝玉
樊鸿飞
蔡媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202010998142.7A priority Critical patent/CN112070178B/en
Publication of CN112070178A publication Critical patent/CN112070178A/en
Application granted granted Critical
Publication of CN112070178B publication Critical patent/CN112070178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method and a device for determining an image sequence sample set and computer equipment, relates to the technical field of data processing, and solves the technical problem that the screened image sequence sample set is poor in diversity balance degree. The method comprises the following steps: acquiring feature data of a plurality of image sequence samples and extraction probability corresponding to each image sequence sample; clustering a plurality of image sequence samples according to the characteristic data to obtain a plurality of class clusters; polling and extracting class clusters according to a class cluster extraction sequence among a plurality of class clusters, extracting image sequence samples according to a sample extraction sequence among a plurality of image sequence samples in the class clusters aiming at each class cluster until the number of the extracted image sequence samples reaches a preset number, and determining an image sequence sample set based on the extracted image sequence samples; the cluster extraction sequence is the average value size sequence of the extraction probability of the clusters; the sample extraction order is the order of the extraction probability of the samples.

Description

Method and device for determining image sequence sample set and computer equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for determining an image sequence sample set, and a computer device.
Background
Currently, in the field of quality evaluation of image sequence samples such as videos and pictures, constructing an image sequence sample set for subjective evaluation is relatively costly, because the image sequence samples need to be labeled manually. For the development and training of a quality evaluation algorithm model, the image sequence sample set used for training should be as diverse, and as balanced in its diversity distribution, as possible, so as to reduce the labeling of redundant image sequence samples.
However, when an image sequence sample set is extracted from a large number of undetermined image sequence samples, the actual extraction probability is highly random, so the screened image sequence sample set is very prone to pseudo-equalization, that is, the diversity distribution of the screened image sequence sample set is in fact relatively unbalanced.
Disclosure of Invention
The application aims to provide a method and a device for determining an image sequence sample set and computer equipment so as to alleviate the technical problem that the screened image sequence sample set is poor in diversity balance degree.
In a first aspect, an embodiment of the present application provides a method for determining a sample set of an image sequence, where the method includes:
acquiring characteristic data of a plurality of image sequence samples and the extraction probability corresponding to each image sequence sample; the extraction probability is in direct proportion to the sparseness of the characteristic data, wherein the sparseness is the sparseness of the position of the characteristic data in the image characteristic dimension space;
clustering a plurality of image sequence samples according to the characteristic data to obtain a plurality of class clusters;
polling and extracting the class clusters according to a class cluster extraction sequence among a plurality of the class clusters, extracting the image sequence samples according to a sample extraction sequence among a plurality of the image sequence samples in the class clusters aiming at each class cluster until the number of the extracted image sequence samples reaches a preset number, and determining the image sequence sample set based on the extracted image sequence samples; the cluster extraction sequence is the average value size sequence of the extraction probability of the clusters; the sample extraction order is the extraction probability size order of the samples.
In one possible implementation, the class cluster extraction order sorts the class clusters from large to small by the average value of the extraction probabilities corresponding to each class cluster; the sample extraction order sorts the image sequence samples within each class cluster from large to small by extraction probability.
In one possible implementation, after the step of extracting the image sequence samples according to a sample extraction order among the plurality of image sequence samples in the class cluster, the method further includes:
the extracted image sequence samples are marked so that the extracted image sequence samples are not repeatedly extracted.
In one possible implementation, the step of acquiring feature data of a plurality of image sequence samples includes:
obtaining a plurality of video samples, and cutting the video samples to obtain a plurality of video sequences with the same duration;
and extracting image features from the plurality of video sequences to obtain feature data of a plurality of video samples.
In one possible implementation, the characteristic data includes any one or more of the following:
definition, chromaticity, contrast, brightness, spatial domain information, temporal domain information, code rate and video quality index.
In one possible implementation, the step of obtaining the extraction probability corresponding to each image sequence sample includes:
determining the nearest neighbor feature space distance of each feature data in the image feature dimension space based on the feature data of a plurality of image sequence samples;
and carrying out normalization processing on the nearest neighbor feature space distances corresponding to the plurality of feature data to obtain normalization processing results, and determining the normalization processing results corresponding to each feature data as extraction probability of the image sequence samples corresponding to the feature data.
In one possible implementation, before the step of determining, based on the feature data of the plurality of image sequence samples, a nearest neighbor feature space distance of each of the feature data in the image feature dimension space, the method further includes:
and performing dimension reduction processing on a plurality of image feature dimensions of the image feature dimension space to remove redundant features in the plurality of image feature dimensions.
In a second aspect, there is provided a device for determining a set of image sequence samples, comprising:
the acquisition module is used for acquiring characteristic data of a plurality of image sequence samples and extraction probability corresponding to each image sequence sample; the extraction probability is in direct proportion to the sparseness of the characteristic data, wherein the sparseness is the sparseness of the position of the characteristic data in the image characteristic dimension space;
the clustering module is used for carrying out clustering processing on a plurality of image sequence samples according to the characteristic data to obtain a plurality of class clusters;
the extraction module is used for polling and extracting the class clusters according to a class cluster extraction sequence among a plurality of the class clusters, extracting the image sequence samples according to a sample extraction sequence among a plurality of the image sequence samples in the class clusters aiming at each class cluster until the number of the extracted image sequence samples reaches a preset number, and determining the image sequence sample set based on the extracted image sequence samples; the cluster extraction sequence is the average value size sequence of the extraction probability of the clusters; the sample extraction order is the extraction probability size order of the samples.
In a third aspect, an embodiment of the present application further provides a computer device, including a memory, and a processor, where the memory stores a computer program that can be executed by the processor, and the processor executes the method according to the first aspect.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of the first aspect described above.
The embodiment of the application has the following beneficial effects:
The method, the device and the computer equipment for determining the image sequence sample set provided by the embodiment of the application can cluster a plurality of image sequence samples according to their characteristic data to obtain a plurality of class clusters, poll and extract the class clusters according to the class cluster extraction sequence among the plurality of class clusters, and, for each class cluster, extract image sequence samples according to the sample extraction sequence among the image sequence samples in that class cluster until the number of extracted image sequence samples reaches the preset number, and then determine the image sequence sample set based on the extracted image sequence samples. In this scheme, the class cluster extraction sequence is the order of the average extraction probability of the class clusters, the sample extraction sequence is the order of the extraction probability of the samples, and the extraction probability is in direct proportion to the data sparseness of the position of the feature data in the image feature dimension space. Clustering further divides the original image sequence samples evenly according to the feature space of the feature data, and the extraction probability of each single image sequence sample is then combined with the polling of the clustered class clusters. The extracted image sequence sample set is therefore balanced both in terms of the feature data and in terms of the categories, the randomness of extraction is reduced, and the balance of the feature distribution of the extracted image sequence sample set is improved. Because the diversity of the samples is stronger as the distribution of the features is more balanced, the diversity balance of the image sequence sample set is further improved.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for determining an image sequence sample set according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a feature distribution of a source dataset according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a prior art filtered data feature distribution;
FIG. 4 is a schematic diagram of a data feature distribution obtained after screening by the method according to the embodiment of the present application;
FIG. 5 is a schematic structural diagram of a device for determining a sample set of an image sequence according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "comprising" and "having" and any variations thereof, as used in the embodiments of the present application, are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, article, or apparatus.
In the field of video quality evaluation, the construction cost of subjective evaluation video sample sets is relatively high, so that a more efficient and balanced video sample set needs to be constructed to reduce the labeling of redundant samples. At present, when a video sample set for subsequent subjective quality evaluation is screened, only the characteristic space distance of the sample is considered, the actual extraction probability is random, the cluster distribution of the sample in the characteristic space is not considered, and the extraction is not performed according to the maximum probability, so that the screening result has larger randomness.
Moreover, when a large number of image sequence samples are screened, pseudo-equalization occurs, i.e. the image sequence sample set mostly consists of more concentrated image sequence samples in the feature space, and sparse data in the feature space are ignored. When the image sequence samples are extracted, only the feature space distance of the image sequence samples is considered, the randomness of the actual image sequence sample extraction probability is high, so that the characteristic distribution balance degree of the extracted image sequence sample set is low, the more unbalanced the characteristic distribution is, the worse the diversity of the samples is, and the worse the diversity balance degree of the screened image sequence sample set is further caused.
Based on the above, the embodiment of the application provides a method for determining an image sequence sample set, by which the technical problem that the diversity balance degree of the screened image sequence sample set is poor can be relieved, so that the screened image sequence sample set achieves diversity balance.
Embodiments of the present application are further described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for determining an image sequence sample set according to an embodiment of the present application. As shown in fig. 1, the method includes:
Step S110, obtaining feature data of a plurality of image sequence samples and extraction probability corresponding to each image sequence sample.
The extraction probability is proportional to the sparseness of the feature data, wherein the sparseness is the sparseness of the position of the feature data in the image feature dimension space. The extraction probability can be calculated according to the data density of the position of the characteristic data of the image sequence sample in the image characteristic dimension space, so that the extraction probability of the image sequence sample is in direct proportion to the data sparseness of the position of the characteristic data.
It should be noted that, the image sequence sample may be a picture sample of a still image or a video sample of a moving image, and the embodiment of the present application is described taking the image sequence sample as a video sample as an example.
In this step, after obtaining the feature data of the plurality of image sequence samples, the computer device may calculate the extraction probabilities of the image sequence samples corresponding to the feature data according to the data densities of the positions of the feature data in the image feature dimension space.
Step S120, clustering the plurality of image sequence samples according to the characteristic data to obtain a plurality of class clusters.
It should be noted that, the clustering in this step may be implemented by a clustering algorithm. For example, the computer device employs a clustering algorithm to perform data clustering on a plurality of image sequence samples according to the feature data to form clusters of classes as different types of distributions.
Step S130, polling and extracting class clusters according to a class cluster extraction sequence among a plurality of class clusters, extracting image sequence samples according to a sample extraction sequence among a plurality of image sequence samples in the class clusters for each class cluster until the number of the extracted image sequence samples reaches a preset number, and determining an image sequence sample set based on the extracted image sequence samples.
The class cluster extraction sequence is the average value size sequence of the extraction probabilities of the class clusters, and the sample extraction sequence is the sample extraction probability size sequence. For example, the average value of the extraction probabilities of the class clusters may be sorted from large to small according to the average value of the extraction probabilities corresponding to each of the plurality of class clusters, and the extraction probability of the samples may be sorted from large to small according to the extraction probabilities of the plurality of image sequence samples in each class cluster.
The sparseness of the feature data of each image sequence sample is used as its screening probability, and the image sequence samples are further clustered according to the feature data, so that the source image sequence samples are split evenly across the feature space. As a result, the final screening achieves feature balance while also covering the categories evenly, and the randomness of extraction is greatly reduced. The feature distribution of the screened image sequence sample set is therefore more balanced; the more balanced the feature distribution, the stronger the diversity of the samples, which further improves the diversity balance of the image sequence sample set.
The above steps are described in detail below.
In some embodiments, the class cluster extraction order sorts the class clusters from large to small by the average value of the extraction probabilities corresponding to each class cluster; the sample extraction order sorts the image sequence samples within each class cluster from large to small by extraction probability.
The average of the extraction probabilities of the image sequence samples in each class cluster is calculated and used as the extraction probability of that class cluster, and the class clusters are sorted from large to small by this average. The class clusters are then polled in that order, so that the required number of image sequence samples can be screened as needed. When a class cluster is polled, the image sequence samples in it are extracted in descending order of extraction probability, that is, the samples are sorted within the class cluster before extraction. In practical applications, these steps can be repeated according to the target total, and the extraction flow stops once the target number of samples has been extracted, which completes the screening.
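The polling extraction described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name poll_extract, the choice to take one sample per class cluster on each visit, and the toy labels and probabilities are all assumptions made for the example.

```python
from collections import defaultdict


def poll_extract(labels, probs, target_count):
    """Round-robin extraction over class clusters, highest mean probability first."""
    # Group sample indices by cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    # Within each cluster, order samples by extraction probability, large to small.
    for members in clusters.values():
        members.sort(key=lambda i: probs[i], reverse=True)

    # Order clusters by the mean extraction probability of their samples, large to small.
    order = sorted(
        clusters,
        key=lambda c: sum(probs[i] for i in clusters[c]) / len(clusters[c]),
        reverse=True,
    )

    cursor = {c: 0 for c in order}  # next unextracted position in each cluster
    selected = []
    target = min(target_count, len(labels))
    while len(selected) < target:
        for c in order:
            if len(selected) >= target:
                break
            # Assumption: one sample is taken per cluster on each polling visit.
            if cursor[c] < len(clusters[c]):
                selected.append(clusters[c][cursor[c]])
                cursor[c] += 1
    return selected


# Toy data (hypothetical): six samples in three class clusters.
labels = [0, 0, 0, 1, 1, 2]
probs = [0.9, 0.2, 0.1, 0.8, 0.6, 0.3]
picked = poll_extract(labels, probs, 4)
print(picked)
```

In this toy data, class cluster 1 has the highest mean extraction probability and is polled first, followed by class cluster 0 and class cluster 2; the second round returns to class cluster 1.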
In some embodiments, the process of clustering the image sequence samples may be implemented by a number of different clustering algorithms. As an example, the step S120 may include the steps of:
Step a), clustering a plurality of image sequence samples according to the characteristic data by using any one of the following clustering algorithms to obtain a plurality of class clusters: K-means algorithm, Mean-Shift algorithm, K-medoids algorithm, CLARANS algorithm, CLARA algorithm, DBSCAN algorithm.
For step a) above, the computer device may adopt any one of the clustering algorithms listed in that step and cluster the plurality of image sequence samples according to their characteristic data to form class clusters, and the plurality of class clusters may be treated as different types of distributions. In practical applications, the embodiment of the application can perform the clustering with the K-means implementation in the sklearn toolkit to obtain the cluster label of each image sequence sample.
By arbitrarily selecting one algorithm from a plurality of different clustering algorithms to cluster the image sequence samples, a more flexible and efficient clustering process can be realized so as to adapt to specific situations of a plurality of different image sequence samples.
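As a rough illustration of the clustering step with the sklearn toolkit, the sketch below obtains cluster labels with KMeans. The synthetic feature matrix, the number of clusters (3) and the random seed are assumptions chosen for the example; the source does not state how many class clusters are used.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical feature matrix: 300 samples x 5 feature dimensions,
# drawn around three well-separated centers.
centers = np.array([[0.0] * 5, [5.0] * 5, [10.0] * 5])
features = np.vstack([c + rng.normal(scale=0.5, size=(100, 5)) for c in centers])

# n_clusters is an assumed value; it would be tuned for the real data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(features)  # one class cluster label per sample

print(np.bincount(cluster_labels))
```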
In some embodiments, during the extraction of the image sequence samples, the image sequence samples may be prevented from being repeatedly extracted by marking the image sequence samples. As an example, after the process of extracting the image sequence samples in the sample extraction order among the plurality of image sequence samples in the class cluster in the above step S130, the method may further include the following steps:
Step b), marking the extracted image sequence samples so that the extracted image sequence samples are not repeatedly extracted.
For step b) above, the computer device may, for example, mark each image sequence sample that has already been extracted so that it no longer takes part in later screening. Marking the extracted image sequence samples prevents them from being extracted again and thus prevents the error of duplicate samples appearing in the extracted image sequence sample set.
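One simple way to realize such marking is a flag per sample, as in the sketch below. The helper name extract_next and the example ordering are hypothetical; the source does not specify how the mark is stored.

```python
def extract_next(ordered_indices, extracted):
    """Return the first index not yet marked as extracted and mark it; None if exhausted."""
    for idx in ordered_indices:
        if not extracted[idx]:
            extracted[idx] = True  # mark so that later rounds skip this sample
            return idx
    return None


extracted = [False] * 4          # one flag per image sequence sample
ordered = [2, 0, 3, 1]           # hypothetical: indices sorted by extraction probability
first = extract_next(ordered, extracted)
second = extract_next(ordered, extracted)
print(first, second, extracted)
```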
In some embodiments, the feature data in embodiments of the present application may be obtained from cropped video samples. As one example, the image sequence samples are video samples; the step S110 may include the steps of:
Step c), obtaining a plurality of video samples, and cutting the video samples to obtain a plurality of video sequences with the same duration;
Step d), extracting image features from the plurality of video sequences to obtain feature data of a plurality of video samples.
For step c) above, a large number of raw video samples may be collected through various channels. For example, nearly 300,000 video samples may be collected via the internet and an internal database, each cut to a segment of about 5 s in duration.
For step d) above, the computer device may, for example, extract a series of video features from the video segments of the same duration after cropping, thereby constructing feature data for the video sample. By cutting the video samples for the same time length and extracting the image features, the obtained feature data are the features with balanced distribution in the video samples, and the feature data are prevented from being too concentrated in the video samples.
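The cutting into equal-duration sequences can be illustrated by computing segment boundaries, as sketched below. The function name segment_bounds and the handling of a trailing remainder (it is dropped) are assumptions; the source only states that the resulting sequences have the same duration of about 5 s.

```python
def segment_bounds(duration_s, segment_s=5.0):
    """Return (start, end) times of consecutive equal-length segments.

    A trailing remainder shorter than segment_s is dropped so that every
    segment has the same duration (assumed behavior, not stated in the source).
    """
    count = int(duration_s // segment_s)
    return [(i * segment_s, (i + 1) * segment_s) for i in range(count)]


bounds = segment_bounds(17.0)  # hypothetical 17-second video
print(bounds)
```

The actual cutting and feature extraction would be done with a video processing tool; only the time arithmetic is shown here.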
Based on the above steps c) and d), the extracted image features may comprise features of a variety of different aspects. As one example, the feature data includes any one or more of the following: definition, chromaticity, contrast, brightness, spatial domain information, temporal domain information, code rate and video quality index.
Of course, the extracted video features may not be limited to the above eight image features, and other image features in further aspects may be extracted according to image properties. Through the image features in various aspects such as definition, chromaticity, contrast and the like, the feature content of the extracted feature data can be more comprehensive and rich, so that the processes such as clustering and extraction of image sequence samples are more reasonable.
In some embodiments, the extraction probability corresponding to the image sequence sample may be determined based on the nearest neighbor feature space distance of the feature data in the image feature dimension space. As an example, the process of obtaining the corresponding extraction probability of each image sequence sample in the step S110 may include the following steps:
Step e), determining the nearest neighbor feature space distance of each feature data in the image feature dimension space based on the feature data of a plurality of image sequence samples;
Step f), carrying out normalization processing on the nearest neighbor feature space distances corresponding to the plurality of feature data to obtain normalization processing results, and determining the normalization processing result corresponding to each feature data as the extraction probability of the image sequence sample corresponding to the feature data.
For step e) above, the computer device may first use the KNN nearest neighbor algorithm to calculate the feature space nearest neighbor distance for the feature data of each image sequence sample, finding a preset number of nearest neighbor feature data for each feature data. The preset number may be determined by a neighbor parameter expressed relative to the total number of feature data. For example, if the neighbor parameter is set to 0.004 of the total number of feature data and the total number of feature data is 300,000, the KNN neighbor count is 300,000 × 0.004 = 1,200. By finding the 1,200 nearest neighbors of each feature data, the maximum feature space distance Dis-knn among those 1,200 nearest neighbors can be obtained.
For step f) above, the computer device may perform max-min normalization on the maximum neighbor distances of all the feature data, that is, subtract the minimum value from the maximum neighbor distance Dis-knn of each feature data and divide the result by the difference between the maximum value and the minimum value, thereby obtaining the max-min normalization result. The normalized value, which lies between 0 and 1, is then taken as the extraction probability of the image sequence sample corresponding to the feature data. It should be noted that the larger the maximum neighbor distance in the feature space, the sparser the feature data in that region and the smaller the share of such feature data in the total feature data; accordingly, the extraction probability calculated from the maximum neighbor distance is larger.
By calculating the feature data density of each image sequence sample, namely its maximum neighbor feature distance, and deriving the extraction probability from that density, the feature space distance of the feature data of each image sequence sample serves as its screening probability. The extraction probability of each image sequence sample thus more accurately follows the proportional relation with the data sparseness at the position of its feature data in the image feature dimension space.
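Steps e) and f) can be sketched as below: the distance to the k-th nearest neighbor (Dis-knn) is computed for each sample and then max-min normalized into an extraction probability. This is a minimal sketch under stated assumptions: the function name extraction_probabilities, the brute-force pairwise distance computation (practical only for small sample counts) and the fallback when all distances are equal are not from the source.

```python
import numpy as np


def extraction_probabilities(features, neighbor_ratio=0.004):
    """Max-min normalized distance to the k-th nearest neighbor (Dis-knn)."""
    n = features.shape[0]
    # Neighbor count as a fraction of the total, clamped to a valid range.
    k = max(1, min(n - 1, int(round(n * neighbor_ratio))))

    # Pairwise Euclidean distances (brute force, for illustration only).
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # After sorting each row, column 0 is the distance to the sample itself (0),
    # so column k is the largest distance among the k nearest neighbors.
    dis_knn = np.sort(dist, axis=1)[:, k]

    span = dis_knn.max() - dis_knn.min()
    if span == 0:
        return np.zeros(n)  # assumed fallback: all samples are equally dense
    return (dis_knn - dis_knn.min()) / span


# Toy data (hypothetical): three close samples and one isolated sample.
features = np.array([[0.0], [0.1], [0.2], [5.0]])
probs = extraction_probabilities(features, neighbor_ratio=0.5)
print(probs)
```

The isolated sample lies in the sparsest region and therefore receives the largest extraction probability. For a data set on the order of 300,000 samples, a dedicated nearest neighbor search structure would be needed instead of the full distance matrix.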
Based on the above steps e) and f), the above feature space distance can be represented by a distance in a number of different ways. As one example, the feature space distance is any one of the following: Euclidean distance, Manhattan distance, and Mahalanobis distance.
In practical applications, the density calculation may use other distance measures for the feature space distance; the embodiment of the application takes the Euclidean distance as an example. Supporting several different distance measures makes the calculation of the feature space distance more flexible.
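For reference, the distance measures can be written out as in the sketch below. The third measure is read here as the Mahalanobis distance, which requires the covariance matrix of the feature data; the function names and the example vectors are illustrative assumptions.

```python
import numpy as np


def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    # Sum of absolute differences along each feature dimension.
    return float(np.sum(np.abs(a - b)))


def mahalanobis(a, b, cov):
    # cov is the covariance matrix of the feature data and must be invertible.
    delta = a - b
    return float(np.sqrt(delta @ np.linalg.inv(cov) @ delta))


a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(euclidean(a, b), manhattan(a, b), mahalanobis(a, b, np.eye(2)))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, which is why the first and third values agree in this example.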
Based on the above steps e) and f), the several image feature dimensions in the image feature dimension space may be dimensions after dimension reduction. As an example, before step e) above, the method may further comprise the steps of:
Step g), performing dimension reduction processing on a plurality of image feature dimensions of the image feature dimension space to remove redundant features in the plurality of image feature dimensions.
For step g) above, as an example, after feature data of various dimensions are extracted from the original video samples, a PCA feature dimension reduction algorithm can be applied to the extracted feature data to reduce the feature space dimension, which removes feature redundancy and facilitates the subsequent distance calculation in the feature dimension space.
For example, the PCA algorithm interface function in the sklearn toolkit is used to perform the dimension reduction processing on the feature data of the original eight-dimensional features (such as the definition, chromaticity, contrast, brightness, spatial information, temporal information, code rate and video quality index), and the main component dimension may be set to any number less than eight, for example, to five dimensions. After that, step e) and step f) can be performed, i.e. the calculation process of the feature data density and the extraction probability is performed.
Of course, other dimension reduction methods can be adopted for feature data dimension reduction processing, such as manual selection, variance screening, PCA, LDA and the like. Furthermore, the step of feature dimension reduction processing and the calculation of the feature data density in the feature space may be combined in one step. In the embodiment of the application, the screening probability of the image sequence sample can be calculated by adopting PCA feature dimension reduction and KNN neighbor algorithm.
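The PCA step described above might look roughly like the following sketch, which uses the `PCA` class from `sklearn.decomposition`. The helper name `reduce_features` is hypothetical, and standardizing the features before PCA is an added assumption (the eight features have very different units); the text itself only specifies PCA with fewer than eight components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_features(features, n_components=5):
    """Project (n_samples, 8) feature data onto `n_components` principal components.

    Standardization is an assumption added here so that features with large
    numeric ranges (e.g. code rate) do not dominate the components.
    """
    x = np.asarray(features, dtype=float)
    scaled = StandardScaler().fit_transform(x)
    return PCA(n_components=n_components).fit_transform(scaled)
```

The reduced array can then be passed to the density and extraction probability calculation of steps e) and f) in place of the original eight-dimensional data.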
As shown in fig. 2, fig. 3 and fig. 4, each curve represents the probability distribution of one video feature; the flatter the curve, the more uniformly that feature is distributed in the image sequence sample set. Fig. 2 shows that the feature distribution of the original image sequence samples is very unbalanced in every dimension and is largely concentrated in narrow value ranges. Comparing the distribution curves of fig. 3 and fig. 4, the curves of fig. 4, obtained with the method provided by the embodiments of the present application, are flatter than those of fig. 3, obtained with the existing scheme. That is, the features of the screened image sequence sample set are more balanced, so the constructed image sequence sample set is more evenly diverse.
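The flatness that the figures show visually could also be expressed as a number. The sketch below uses the normalized Shannon entropy of a histogram, which is 1 for a perfectly flat distribution and 0 when all values fall in one bin. This metric and the name `histogram_uniformity` are illustrative assumptions; the application itself only compares the curves by inspection.

```python
import numpy as np

def histogram_uniformity(values, bins=20, value_range=None):
    """Normalized Shannon entropy of the histogram of one feature, in [0, 1]."""
    counts, _ = np.histogram(values, bins=bins, range=value_range)
    p = counts / counts.sum()
    p = p[p > 0]                                   # empty bins contribute nothing
    entropy = -(p * np.log(p)).sum()
    return float(entropy / np.log(bins))
```

Computing this score per feature for the original sample set and for the screened sample set, with the same `bins` and `value_range`, gives a rough way to compare how balanced the two sets are.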
Fig. 5 provides a schematic structural diagram of a device for determining a sample set of an image sequence. As shown in fig. 5, the determining device 500 for an image sequence sample set includes:
an obtaining module 501, configured to obtain feature data of a plurality of image sequence samples and extraction probabilities corresponding to each of the image sequence samples; the extraction probability is in direct proportion to the sparseness of the characteristic data, wherein the sparseness is the sparseness of the position of the characteristic data in the image characteristic dimension space;
a clustering module 502, configured to perform clustering processing on the plurality of image sequence samples according to the feature data to obtain a plurality of class clusters;
an extraction module 503, configured to poll and extract the class clusters according to a class cluster extraction order among a plurality of the class clusters, and for each of the class clusters, extract the image sequence samples according to a sample extraction order among a plurality of the image sequence samples in the class cluster until the number of the extracted image sequence samples reaches a preset number, and determine the image sequence sample set based on the extracted image sequence samples; the cluster extraction sequence is the average value size sequence of the extraction probability of the clusters; the sample extraction order is the extraction probability size order of the samples.
In some embodiments, the class cluster extraction order ranks the class clusters from the highest to the lowest average value of their extraction probabilities; the sample extraction order ranks the plurality of image sequence samples within each class cluster from the highest to the lowest extraction probability.
In some embodiments, the apparatus further comprises:
a marking module, configured to mark the extracted image sequence samples after the image sequence samples are extracted according to the sample extraction order among the plurality of image sequence samples in the class cluster, so that the extracted image sequence samples are not extracted again.
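The cooperation of the extraction module and the marking module can be sketched as a round-robin loop. The sketch assumes that cluster labels and extraction probabilities have already been computed, that both orders run from high to low, and that exactly one sample is taken from a class cluster on each visit. The last point and the name `select_samples` are assumptions, since the text does not fix how many samples are taken per poll.

```python
from collections import defaultdict

def select_samples(labels, probabilities, target_count):
    """Poll clusters in order and return the indices of the extracted samples."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    # Class cluster extraction order: mean extraction probability, high to low.
    def mean_probability(label):
        members = clusters[label]
        return sum(probabilities[i] for i in members) / len(members)

    cluster_order = sorted(clusters, key=mean_probability, reverse=True)

    # Sample extraction order inside each cluster: extraction probability, high to low.
    ranked = {
        label: sorted(members, key=lambda i: probabilities[i], reverse=True)
        for label, members in clusters.items()
    }

    marked = [False] * len(labels)                 # marking prevents repeated extraction
    selected = []
    target = min(target_count, len(labels))
    while len(selected) < target:
        for label in cluster_order:
            if len(selected) >= target:
                break
            # Take the highest-probability sample of this cluster not yet marked.
            candidate = next((i for i in ranked[label] if not marked[i]), None)
            if candidate is not None:
                marked[candidate] = True
                selected.append(candidate)
    return selected
```

Because every poll visits each class cluster in turn, small but sparse clusters contribute samples as early as large dense ones, which is what keeps the resulting set balanced.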
In some embodiments, the obtaining module 501 is specifically configured to:
obtaining a plurality of video samples, and cutting the video samples to obtain a plurality of video sequences with the same duration;
and extracting image features from the plurality of video sequences to obtain feature data of a plurality of video samples.
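Cutting videos into sequences of equal duration can be sketched by computing the cut points first. The function name `segment_boundaries` is hypothetical, and discarding a trailing remainder shorter than one segment is an assumption; the text only requires that the resulting video sequences have the same duration.

```python
def segment_boundaries(total_seconds, segment_seconds):
    """Return (start, end) times of consecutive equal-length segments.

    A trailing remainder shorter than `segment_seconds` is dropped so that
    all returned segments have the same duration.
    """
    if segment_seconds <= 0:
        raise ValueError("segment_seconds must be positive")
    count = int(total_seconds // segment_seconds)
    return [(i * segment_seconds, (i + 1) * segment_seconds) for i in range(count)]
```

Each returned pair can then be handed to whatever video tool performs the actual cut, after which the image features are extracted per segment.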
In some embodiments, the feature data includes any one or more of the following:
definition, chromaticity, contrast, brightness, spatial domain information, temporal domain information, code rate and video quality index.
In some embodiments, the obtaining module 501 is specifically configured to:
determining the nearest neighbor feature space distance of each feature data in the image feature dimension space based on the feature data of a plurality of image sequence samples;
and carrying out normalization processing on the nearest neighbor feature space distances corresponding to the plurality of feature data to obtain normalization processing results, and determining the normalization processing results corresponding to each feature data as extraction probability of the image sequence samples corresponding to the feature data.
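One possible form of the normalization can be written out as follows. The symbols are introduced here for illustration only: $x_i$ is the feature data of sample $i$, $N$ is the number of samples, $d_i$ is its nearest neighbor feature space distance (shown with the Euclidean distance), and $p_i$ is the resulting extraction probability. Dividing by the sum of all distances is an assumption; the text requires only that the distances be normalized, and min-max scaling would be another option.

```latex
d_i = \min_{j \neq i} \lVert x_i - x_j \rVert_2 ,
\qquad
p_i = \frac{d_i}{\sum_{j=1}^{N} d_j},
\qquad
\sum_{i=1}^{N} p_i = 1 .
```

Under this form, a sample whose nearest neighbor is far away (a sparse region) receives a proportionally larger extraction probability.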
In some embodiments, the apparatus further comprises:
a dimension reduction module, configured to perform dimension reduction processing on a plurality of image feature dimensions of the image feature dimension space, before the nearest neighbor feature space distance of each feature data in the image feature dimension space is determined based on the feature data of the plurality of image sequence samples, so as to remove redundant features in the plurality of image feature dimensions.
The device for determining the image sequence sample set provided by the embodiment of the application has the same technical characteristics as the method for determining the image sequence sample set provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
As shown in fig. 6, a computer device 600 provided in an embodiment of the present application includes a processor 601, a memory 602 and a bus. The memory 602 stores machine readable instructions executable by the processor 601. When the computer device is running, the processor 601 communicates with the memory 602 through the bus, and the processor 601 executes the machine readable instructions to perform the steps of the method for determining the image sequence sample set.
Specifically, the above-mentioned memory 602 and the processor 601 can be general-purpose memories and processors, and are not particularly limited herein, and the above-mentioned method of determining the image sequence sample set can be performed when the processor 601 runs a computer program stored in the memory 602.
The processor 601 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 601 or by instructions in the form of software. The processor 601 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 602, and the processor 601 reads information in the memory 602 and performs the steps of the above method in combination with its hardware.
Corresponding to the above method for determining a set of image sequence samples, an embodiment of the present application further provides a computer-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to execute the steps of the above method for determining a set of image sequence samples.
The determining device of the image sequence sample set provided by the embodiment of the application can be specific hardware on equipment or software or firmware installed on the equipment. The device provided by the embodiment of the present application has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brevity, reference may be made to the corresponding content in the foregoing method embodiment where the device embodiment is not mentioned. It will be clear to those skilled in the art that, for convenience and brevity, the specific operation of the system, apparatus and unit described above may refer to the corresponding process in the above method embodiment, which is not described in detail herein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
As another example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method for determining an image sequence sample set according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
It should be noted that: like reference numerals and letters in the following figures denote like items, and thus once an item is defined in one figure, no further definition or explanation of it is required in the following figures, and furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art should understand that, within the technical scope disclosed herein, any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be encompassed within the protection scope of the present application. Therefore, the protection scope of the present application is subject to the protection scope of the claims.

Claims (10)

1. A method of determining a set of image sequence samples, the method comprising:
acquiring characteristic data of a plurality of image sequence samples and an extraction probability corresponding to each image sequence sample; the extraction probability is in direct proportion to the sparseness of the characteristic data, wherein the sparseness is the sparseness of the position of the characteristic data in the image characteristic dimension space;
clustering a plurality of image sequence samples according to the characteristic data to obtain a plurality of class clusters;
polling and extracting the class clusters according to a class cluster extraction sequence among a plurality of the class clusters, extracting the image sequence samples according to a sample extraction sequence among a plurality of the image sequence samples in the class clusters aiming at each class cluster until the number of the extracted image sequence samples reaches a preset number, and determining the image sequence sample set based on the extracted image sequence samples; the cluster extraction sequence is the average value size sequence of the extraction probability of the clusters; the sample extraction order is the extraction probability size order of the samples.
2. The method of claim 1, wherein the class cluster extraction order is the descending order of the average values of the extraction probabilities corresponding to each of the plurality of class clusters; the sample extraction order is the descending order of the extraction probabilities of the plurality of image sequence samples in each class cluster.
3. The method according to claim 1, further comprising, after the step of extracting the image sequence samples in a sample extraction order among the plurality of image sequence samples in the class cluster:
the extracted image sequence samples are marked so that the extracted image sequence samples are not repeatedly extracted.
4. The method of claim 1, wherein the image sequence samples are video samples; the step of acquiring feature data of a plurality of image sequence samples comprises the following steps:
obtaining a plurality of video samples, and cutting the video samples to obtain a plurality of video sequences with the same duration;
and extracting image features from the plurality of video sequences to obtain feature data of a plurality of video samples.
5. The method of claim 4, wherein the characteristic data comprises any one or more of:
definition, chromaticity, contrast, brightness, spatial domain information, temporal domain information, code rate and video quality index.
6. The method according to any one of claims 1 to 5, wherein the step of obtaining a corresponding extraction probability for each of the image sequence samples comprises:
determining the nearest neighbor feature space distance of each feature data in the image feature dimension space based on the feature data of a plurality of image sequence samples;
and carrying out normalization processing on the nearest neighbor feature space distances corresponding to the plurality of feature data to obtain normalization processing results, and determining the normalization processing results corresponding to each feature data as extraction probability of the image sequence samples corresponding to the feature data.
7. The method of claim 6, further comprising, prior to the step of determining a nearest neighbor feature space distance of each of the feature data in the image feature dimension space based on the feature data of the plurality of image sequence samples:
and performing dimension reduction processing on a plurality of image feature dimensions of the image feature dimension space to remove redundant features in the plurality of image feature dimensions.
8. A device for determining a set of image sequence samples, comprising:
the acquisition module is used for acquiring characteristic data of a plurality of image sequence samples and extraction probability corresponding to each image sequence sample; the extraction probability is in direct proportion to the sparseness of the characteristic data, wherein the sparseness is the sparseness of the position of the characteristic data in the image characteristic dimension space;
the clustering module is used for carrying out clustering processing on a plurality of image sequence samples according to the characteristic data to obtain a plurality of class clusters;
the extraction module is used for polling and extracting the class clusters according to a class cluster extraction sequence among a plurality of the class clusters, extracting the image sequence samples according to a sample extraction sequence among a plurality of the image sequence samples in the class clusters aiming at each class cluster until the number of the extracted image sequence samples reaches a preset number, and determining the image sequence sample set based on the extracted image sequence samples; the cluster extraction sequence is the average value size sequence of the extraction probability of the clusters; the sample extraction order is the extraction probability size order of the samples.
9. A computer device comprising a memory, a processor, the memory having stored therein a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any of the preceding claims 1 to 7.
10. A computer readable storage medium storing machine executable instructions which, when invoked and executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.
CN202010998142.7A 2020-09-18 2020-09-18 Method and device for determining image sequence sample set and computer equipment Active CN112070178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010998142.7A CN112070178B (en) 2020-09-18 2020-09-18 Method and device for determining image sequence sample set and computer equipment

Publications (2)

Publication Number Publication Date
CN112070178A CN112070178A (en) 2020-12-11
CN112070178B true CN112070178B (en) 2023-10-27

Family

ID=73681430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010998142.7A Active CN112070178B (en) 2020-09-18 2020-09-18 Method and device for determining image sequence sample set and computer equipment

Country Status (1)

Country Link
CN (1) CN112070178B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105894030A (en) * 2016-04-01 2016-08-24 河海大学 High-resolution remote sensing image scene classification method based on layered multi-characteristic fusion
CN109902703A (en) * 2018-09-03 2019-06-18 华为技术有限公司 A kind of time series method for detecting abnormality and device
WO2020119053A1 (en) * 2018-12-11 2020-06-18 平安科技(深圳)有限公司 Picture clustering method and apparatus, storage medium and terminal device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
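Clustering analysis method based on Fourier transform and connected graph; 巨瑜芳; 雷小锋; 戴斌; 庄伟; 宋丰泰; Application Research of Computers (No. 08); full text *
Split-merge clustering algorithm oriented to structural stability; 雷小锋; 何涛; 李奎儒; 谢昆青; 丁世飞; Computer Science (No. 11); full text *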

Also Published As

Publication number Publication date
CN112070178A (en) 2020-12-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant