CN113420733A - Efficient distributed big data acquisition implementation method and system - Google Patents


Info

Publication number
CN113420733A
Authority
CN
China
Prior art keywords
principal component
feature
data
video
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110965044.8A
Other languages
Chinese (zh)
Other versions
CN113420733B (en)
Inventor
杨昕 (Yang Xin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Heima Qifu Technology Co ltd
Original Assignee
Beijing Heima Qifu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Heima Qifu Technology Co ltd
Priority to CN202110965044.8A
Publication of CN113420733A
Application granted
Publication of CN113420733B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2135: Feature extraction based on approximation criteria, e.g. principal component analysis
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention discloses an efficient distributed big data acquisition implementation method and system, wherein the method comprises the following steps: obtaining first video information; extracting the features of the first video information to obtain text features, speech features, and visual features of the first video information; performing principal component analysis on the text features, the speech features, and the visual features respectively to obtain a first principal component feature set; obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set; and acquiring principal component feature data of the first video set according to the first principal component feature set. The method solves the technical problems that the prior art cannot perform distributed, efficient acquisition of mass data and that the accuracy of the acquired target data is low.

Description

Efficient distributed big data acquisition implementation method and system
Technical Field
The invention relates to the field of data acquisition, in particular to a method and a system for realizing high-efficiency distributed big data acquisition.
Background
Today, the internet industry is developing rapidly, data acquisition is widely applied in the internet and distributed fields, and the field of data acquisition has changed significantly. A data acquisition system is a flexible, user-defined measurement system built from measurement software and hardware products based on computers or other specialized test platforms.
However, in the process of implementing the technical solution of the invention in the embodiments of the present application, the inventors of the present application find that the above-mentioned technology has at least the following technical problems:
the prior art has the technical problems that the distributed efficient acquisition of mass data cannot be carried out, and the accuracy of the acquired target data is low.
Disclosure of Invention
To overcome the defects of the prior art, the embodiments of the present application provide an efficient distributed big data acquisition implementation method and system, solving the technical problem that the accuracy of acquired target data is low because distributed, efficient acquisition of mass data cannot be performed. By comparison against the first principal component feature set, the first video set is processed to remove redundant and extraneous data, so that text, speech, and visual features are effectively extracted and the acquired data can be applied directly. This achieves efficient distributed acquisition of the source data and ensures that the acquired target data are accurate.
On one hand, the embodiment of the application provides an implementation method for efficient distributed big data acquisition, wherein the method comprises the following steps: obtaining first video information; extracting the characteristics of the first video information to obtain character characteristics, voice characteristics and visual characteristics of the first video information; performing principal component analysis on the character features, the voice features and the visual features respectively to obtain a first principal component feature set; obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set; and acquiring principal component feature data of the first video set according to the first principal component feature set.
On the other hand, the application also provides an efficient distributed big data acquisition implementation system, wherein the system comprises: a first obtaining unit: the first obtaining unit is used for obtaining first video information; a first extraction unit: the first extraction unit is used for extracting the characteristics of the first video information to obtain the character characteristics, the voice characteristics and the visual characteristics of the first video information; a first analysis unit: the first analysis unit is used for respectively carrying out principal component analysis on the character features, the voice features and the visual features to obtain a first principal component feature set; a second obtaining unit: the second obtaining unit is used for obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set; a first acquisition unit: the first acquisition unit is used for acquiring principal component feature data of the first video set according to the first principal component feature set.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
obtaining first video information; extracting the features of the first video information to obtain text features, speech features, and visual features of the first video information; performing principal component analysis on the text features, the speech features, and the visual features respectively to obtain a first principal component feature set; obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set; and acquiring principal component feature data of the first video set according to the first principal component feature set. By comparison against the first principal component feature set, the first video set is processed to remove redundant and extraneous data, so that text, speech, and visual features are effectively extracted and the acquired data can be applied directly. This achieves efficient distributed acquisition of the source data and ensures that the acquired target data are accurate.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a schematic flow chart of an implementation method for efficient distributed big data acquisition according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an efficient distributed big data acquisition implementation system according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an exemplary electronic device according to an embodiment of the present application.
Detailed Description
The embodiments of the application provide an efficient distributed big data acquisition implementation method and system, solving the technical problem that, in the prior art, distributed, efficient acquisition of mass data cannot be performed, so the accuracy of acquired target data is low. By comparison against the first principal component feature set, the first video set is processed to remove redundant and extraneous data, so that text, speech, and visual features are effectively extracted and the acquired data can be applied directly. This achieves efficient distributed acquisition of the source data and ensures that the acquired target data are accurate.
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are merely some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited to the example embodiments described herein.
Summary of the application
Today, the internet industry is developing rapidly, data acquisition is widely applied in the internet and distributed fields, and the field of data acquisition has changed significantly. A data acquisition system is a flexible, user-defined measurement system built from measurement software and hardware products based on computers or other specialized test platforms. The prior art cannot perform distributed, efficient acquisition of mass data, and the accuracy of the acquired target data is low.
In view of the above technical problems, the technical solution provided by the present application has the following general idea:
the embodiment of the application provides a method for realizing efficient distributed big data acquisition, wherein the method comprises the following steps: obtaining first video information; extracting the characteristics of the first video information to obtain character characteristics, voice characteristics and visual characteristics of the first video information; performing principal component analysis on the character features, the voice features and the visual features respectively to obtain a first principal component feature set; obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set; and acquiring principal component feature data of the first video set according to the first principal component feature set.
For better understanding of the above technical solutions, the following detailed descriptions will be provided in conjunction with the drawings and the detailed description of the embodiments.
Example one
As shown in fig. 1, an embodiment of the present application provides an implementation method for efficient distributed big data acquisition, where the method includes:
step S100: obtaining first video information;
step S200: extracting the characteristics of the first video information to obtain character characteristics, voice characteristics and visual characteristics of the first video information;
Specifically, with the rapid development of the internet industry, data acquisition has been widely applied in the internet and distributed fields, and the field of data acquisition has changed significantly. Data acquisition refers to automatically acquiring non-electric or electric signals from analog and digital units under test, such as sensors and other devices, and sending them to an upper computer for analysis and processing. A data acquisition system is a flexible, user-defined measurement system built from measurement software and hardware products based on computers or other specialized test platforms. In the embodiment of the application, in order to perform efficient distributed acquisition of mass data, data features can be acquired in a split manner. The first video information is the source data that needs to be acquired in a distributed manner; feature extraction is then performed on it, comprising distributed extraction of text features, speech features, and visual features. Taking the surveillance video of a camera as an example, when information tracking must be performed according to the surveillance video, the text, speech, and visual features are particularly important as key factors for analysis and comparison.
Step S300: performing principal component analysis on the character features, the voice features and the visual features respectively to obtain a first principal component feature set;
Specifically, the relevant text, speech, and visual features have been extracted from the first video information, so principal component analysis can be performed on them. The principal component analysis algorithm is the most commonly used linear dimensionality-reduction method; its objective is to map high-dimensional data into a low-dimensional space through a linear projection such that the amount of information (the variance) of the data is maximal along the projected dimensions, thereby using fewer data dimensions while retaining the characteristics of more of the raw data points. In short, the data undergo dimensionality-reduction processing, and the first principal component feature set is the feature set obtained after this processing is applied to the text features, the speech features, and the visual features.
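The variance-maximization objective described above can be written compactly. As a sketch (the notation is introduced here and is not taken from the patent text), for a zero-centered data matrix X with covariance matrix Σ, the first principal direction is

```latex
w_1 \;=\; \arg\max_{\|w\|_2 = 1} \operatorname{Var}(Xw)
    \;=\; \arg\max_{\|w\|_2 = 1} w^{\top} \Sigma\, w
```

whose maximizer is the eigenvector of Σ associated with the largest eigenvalue; each subsequent component solves the same problem restricted to directions orthogonal to those already chosen.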
Step S400: obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set;
specifically, the first principal component feature set includes a principal feature set of the first video information, and in order to perform information tracking on the principal feature set in the first video information based on big data, further, a first video set may be obtained through the big data, where the first video set includes a video set matched with the first principal component feature set, that is, in the range of the first video set, the principal feature of the first video information is subjected to information tracking, so as to ensure that efficient distributed acquisition of target data is achieved.
Step S500: and acquiring principal component feature data of the first video set according to the first principal component feature set.
Specifically, generally speaking, the acquired first video set is relatively complex and cannot be directly applied, so that principal component feature data acquisition can be performed on the first video set according to the first principal component feature set, that is, the first principal component feature set is compared, and the first video set is subjected to redundancy removal processing, so that characters, voice and visual features can be effectively extracted, and further, the acquired data can be ensured to be directly applied, and efficient distributed acquisition of target data can be realized.
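One possible reading of this redundancy-removal step is to keep only those feature vectors from the first video set that are sufficiently similar to the first principal component feature set. The sketch below is an illustrative assumption rather than the patent's specified procedure: the function name `collect_principal_features`, the cosine-similarity criterion, and the 0.8 threshold are all hypothetical.

```python
import numpy as np

def collect_principal_features(video_features, principal_set, threshold=0.8):
    """Keep rows of video_features whose cosine similarity to at least one
    vector in principal_set meets the threshold (redundancy-filter sketch).

    video_features: (m, k) array of candidate feature vectors
    principal_set:  (p, k) array of principal component features
    """
    # Normalize rows so the dot product equals cosine similarity
    vf = video_features / np.linalg.norm(video_features, axis=1, keepdims=True)
    ps = principal_set / np.linalg.norm(principal_set, axis=1, keepdims=True)
    sim = vf @ ps.T                 # (m, p) pairwise cosine similarities
    best = sim.max(axis=1)          # closest principal feature per candidate
    return video_features[best >= threshold]
```

Rows far from every principal feature are dropped as redundant, which matches the "removing redundancy" language used throughout the text.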
Preferably, performing principal component analysis on the text features, the speech features, and the visual features respectively to obtain the first principal component feature set (step S300) further includes:
step S310: performing zero-centering (mean-removal) processing on the first feature data set to obtain a second feature data set;
step S320: obtaining a first covariance matrix of the second feature data set;
step S330: calculating the first covariance matrix to obtain a first eigenvalue and a first eigenvector of the first covariance matrix;
step S340: and projecting the first feature data set to the first feature vector to obtain a first dimension reduction data set, wherein the first dimension reduction data set is the first principal component feature set.
Specifically, in order to make the data sources of the text feature, the voice feature, and the visual feature more concise and reduce the redundancy of data, a first feature data set may be obtained according to the text feature, the voice feature, and the visual feature, and then the extracted feature data may be subjected to a digitization process, and a feature data set matrix may be constructed to obtain the first feature data set. And then carrying out centralization processing on each feature data in the first feature data set, firstly solving an average value of each feature in the first feature data set, then subtracting the average value of each feature from each feature for all samples, and then obtaining a new feature value, wherein the second feature data set is formed by the new feature values, and is a data matrix. By the covariance formula:
cov(x, y) = (1 / (n − 1)) · Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ),
the second feature data set is operated on to obtain its first covariance matrix, where xᵢ and yᵢ are feature data in the second feature data set, x̄ and ȳ are the mean values of the feature data, and n is the total number of samples in the second feature data set. Then, through matrix operations, the eigenvalues and eigenvectors of the first covariance matrix are solved; each eigenvalue corresponds to one eigenvector. The largest K eigenvalues and their corresponding eigenvectors are selected, and the original features in the first feature data set are projected onto the selected eigenvectors to obtain the first dimension-reduction data set. The feature data in the database are thus reduced in dimensionality by principal component analysis, and redundant data are removed while the amount of information is preserved, so that the number of feature-data samples in the database is reduced, the information lost after dimensionality reduction is minimal, and the training model operates on the data faster.
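Steps S310 through S340 correspond to textbook principal component analysis. A minimal sketch in Python with NumPy (the function name and array shapes are assumptions for illustration):

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce an (n_samples, n_features) matrix X to k dimensions."""
    # S310: zero-center each feature by subtracting its mean over all samples
    X_centered = X - X.mean(axis=0)
    # S320: first covariance matrix, cov = (1/(n-1)) * X_centered^T X_centered
    cov = np.cov(X_centered, rowvar=False)
    # S330: eigenvalues and eigenvectors (eigh suits symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Select the eigenvectors of the K largest eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # S340: project the centered features onto the selected eigenvectors
    return X_centered @ top
```

The first column of the result carries the largest variance, the second the next largest, and so on, which is exactly the "minimal information loss" property claimed above.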
Preferably, before acquiring the principal component feature data of the first video set according to the first principal component feature set, step S500 further includes:
step S510: performing decision tree classification on the first video set according to the character features, the voice features and the visual features to obtain a first classification result;
step S520: and acquiring principal component feature data of the first classification result according to the first principal component feature set.
Specifically, before the principal component feature data of the first video set are collected, the first video set may be classified so that the collection can proceed more effectively; to this end, the first video set may be classified based on a decision tree. Classification trees (decision trees) are a very common classification method and a form of supervised learning: given a set of samples, each with a set of attributes and a predetermined class, a classifier is learned that can correctly classify newly appearing objects. Through a decision tree, the first video set can be clearly classified; the first classification result is divided into the text feature, speech feature, and visual feature categories, and principal component feature data are then collected from the first classification result according to the first principal component feature set, making the collection more convenient and faster.
Preferably, performing decision tree classification on the first video set according to the text features, the speech features, and the visual features to obtain a first classification result (step S510) further includes:
step S511: performing an information-theoretic coding operation on the text features to obtain a first feature information entropy;
step S512: performing an information-theoretic coding operation on the speech features to obtain a second feature information entropy, and performing an information-theoretic coding operation on the visual features to obtain a third feature information entropy;
step S513: inputting the first feature information entropy, the second feature information entropy, and the third feature information entropy into a data-magnitude comparison model for training to obtain first root node feature information;
step S514: constructing a decision tree of the first video set based on the first root node characteristic information and a recursive algorithm of the first video set;
step S515: and obtaining a first classification result according to the decision tree.
Preferably, obtaining a first classification result according to the decision tree (step S515) further includes:
step S5151: inputting the first video set into the decision tree, and obtaining the first classification result of the first video set, wherein the first classification result comprises a visual feature category, an audio feature category and a text feature category.
Specifically, in order to construct a decision tree, information entropy calculation may be performed on the text feature, the speech feature, and the visual feature, respectively, that is, by an information entropy calculation formula in information theory encoding:
H(t) = −Σᵢ p(tᵢ) · log p(tᵢ),
where t represents a random variable whose set of all possible outputs is defined as a symbol set, and p(tᵢ) represents the output probability function; the greater the uncertainty of the variable, the greater the entropy.
The specific information entropy values are calculated in this way to obtain the corresponding first feature information entropy, second feature information entropy, and third feature information entropy. Their magnitudes are then compared based on the data-magnitude comparison model to obtain the feature with the minimum entropy, i.e., the first root node feature information. The minimum-entropy feature is classified first; the remaining features are then classified recursively in order of entropy from small to large, and the decision tree is finally constructed, so that the first video set can be classified and principal component feature data can be collected more effectively.
Furthermore, once the decision tree is constructed, the first video set is classified and learned through the prediction model, so that the video set is classified accurately. By obtaining as many classification features as practical, calculating the information entropy of each feature, selecting and preferentially classifying the feature with the minimum information entropy, and recursively classifying the remaining features in the same way, the finally constructed decision tree classifies more accurately.
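The entropy comparison of steps S511 to S513 can be sketched as follows. The three symbol sequences are hypothetical stand-ins for the coded text, speech, and visual features, and, following the text, the channel with the minimum entropy is chosen as the root:

```python
import math
from collections import Counter

def entropy(values):
    """H(t) = -sum_i p(t_i) * log2 p(t_i) over the observed symbol set."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

# Hypothetical coded outputs for the three feature channels of a video set
features = {
    "text":   ["a", "a", "a", "b"],   # mostly one symbol: low uncertainty
    "speech": ["x", "y", "x", "y"],   # two equiprobable symbols: 1 bit
    "visual": ["p", "q", "r", "p"],   # three symbols: highest uncertainty
}

# Step S513: compare entropy magnitudes; the minimum becomes the root feature
root = min(features, key=lambda name: entropy(features[name]))
```

The remaining channels would then be attached recursively in order of increasing entropy, as described above.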
Preferably, acquiring the principal component feature data of the first video set according to the first principal component feature set further includes:
step S530: and inputting a feature category set matched with the first principal component feature set in the first video set as input data into a neural network model, and using the first principal component feature set as supervision data to obtain an output result, wherein the output result comprises principal component feature data.
Specifically, in order to acquire principal component feature data from the first video set according to the first principal component feature set, the feature category set in the first video set matched with the first principal component feature set may be used as input data and fed into a neural network model for training. The neural network model here is a neural network model in machine learning: a neural network (NN) is a complex network system formed by the wide interconnection of a large number of simple processing units (called neurons); it reflects many basic features of human brain function and is a highly complex nonlinear dynamical learning system. Neural network models are described by mathematical models of neurons, and an artificial neural network (ANN) is a description of the first-order properties of the human brain system; briefly, it is a mathematical model. Through training on a large amount of training data, the feature category set in the first video set matched with the first principal component feature set is input into the neural network model, and the principal component feature data are output.
More specifically, the training process is substantially a supervised learning process, each group of supervised data includes a feature class set matched with the first principal component feature set in the first video set and identification information identifying the principal component feature data, the feature class set matched with the first principal component feature set in the first video set is input into a neural network model, the neural network model performs continuous self-correction and adjustment according to the identification information for identifying the principal component feature data until an obtained output result is consistent with the identification information, the group of supervised learning is ended, and the next group of data supervised learning is performed; and when the output information of the neural network model reaches the preset accuracy rate/reaches the convergence state, finishing the supervised learning process. Through supervised learning of the neural network model, the neural network model can process the input information more accurately, and the output principal component characteristic data is more reasonable and accurate.
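The supervised self-correction loop described here (adjust the model until its output matches the identification information or a preset accuracy is reached) can be sketched with a simple logistic model in NumPy. The data, model form, learning rate, and 0.99 accuracy threshold are illustrative assumptions, not the patent's specified network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical supervised data: rows stand in for feature-category vectors
# matched with the first principal component feature set; labels stand in
# for the identification information of the principal component feature data.
X = rng.normal(size=(200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = (X @ true_w > 0).astype(float)

w = np.zeros(5)                               # model parameters
for epoch in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # model output
    w -= 0.1 * X.T @ (p - y) / len(y)         # self-correct toward the labels
    acc = float(((p > 0.5) == y).mean())
    if acc >= 0.99:                           # preset accuracy reached
        break                                 # supervised learning ends
```

Reaching the preset accuracy ends the loop, mirroring the "supervised learning process is finished" condition in the paragraph above.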
Compared with the prior art, the invention has the following beneficial effects:
1. Obtaining first video information; extracting the features of the first video information to obtain text, speech, and visual features; performing principal component analysis on these features respectively to obtain a first principal component feature set; obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set; and acquiring principal component feature data of the first video set according to the first principal component feature set. By comparison against the first principal component feature set, the first video set is processed to remove redundant and extraneous data, so that text, speech, and visual features are effectively extracted and the acquired data can be applied directly. This achieves efficient distributed acquisition of the source data and ensures that the acquired target data are accurate.
Example two
Based on the same inventive concept as the method for realizing the high-efficiency distributed big data acquisition in the foregoing embodiment, the present invention further provides a system for realizing the high-efficiency distributed big data acquisition, as shown in fig. 2, the system includes:
the first obtaining unit 11: the first obtaining unit 11 is configured to obtain first video information;
the first extraction unit 12: the first extraction unit 12 is configured to perform feature extraction on the first video information to obtain a text feature, a voice feature, and a visual feature of the first video information;
first analysis unit 13: the first analysis unit 13 is configured to perform principal component analysis on the text feature, the voice feature, and the visual feature, respectively, to obtain a first principal component feature set;
the second obtaining unit 14: the second obtaining unit 14 is configured to obtain a first video set through big data, where the first video set includes a video set matching the first principal component feature set;
the first acquisition unit 15: the first acquiring unit 15 is configured to perform principal component feature data acquisition on the first video set according to the first principal component feature set.
Further, the system further comprises:
a first processing unit: the first processing unit is used for performing zero-centering (mean-removal) processing on the first feature data set to obtain a second feature data set;
a third obtaining unit: the third obtaining unit is configured to obtain a first covariance matrix of the second feature data set;
a first arithmetic unit: the first operation unit is used for operating the first covariance matrix to obtain a first eigenvalue and a first eigenvector of the first covariance matrix;
a first projection unit: the first projection unit is configured to project the first feature data set to the first feature vector to obtain a first dimension reduction data set, where the first dimension reduction data set is the first principal component feature set.
Further, the system further comprises:
a first classification unit: the first classification unit is used for performing decision tree classification on the first video set according to the text features, the voice features and the visual features to obtain a first classification result;
a second acquisition unit: the second acquisition unit is used for acquiring principal component characteristic data of the first classification result according to the first principal component characteristic set.
Further, the system further comprises:
a second operation unit: the second operation unit is used for performing an information-theoretic coding operation on the text features to obtain a first feature information entropy;
a third operation unit: the third operation unit is used for performing an information-theoretic coding operation on the voice features to obtain a second feature information entropy, and performing an information-theoretic coding operation on the visual features to obtain a third feature information entropy;
a first input unit: the first input unit is used for inputting the first feature information entropy, the second feature information entropy and the third feature information entropy into a data-magnitude comparison model for training, so as to obtain first root node feature information;
a first construction unit: the first construction unit is used for recursively constructing a decision tree of the first video set based on the first root node feature information;
a fourth obtaining unit: the fourth obtaining unit is configured to obtain a first classification result according to the decision tree.
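An illustrative sketch of the entropy comparison described above (the category labels and the rule "largest entropy becomes the root node" are the author's assumptions; the patent does not specify the comparison model in detail):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H = -sum(p_i * log2(p_i)) of a category sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical per-modality category labels for a small video set.
text_cats = ["news", "news", "sport", "sport"]        # two even classes
voice_cats = ["speech", "speech", "speech", "music"]  # skewed classes
visual_cats = ["indoor"] * 4                          # one class, H = 0

entropies = {"text": entropy(text_cats),
             "voice": entropy(voice_cats),
             "visual": entropy(visual_cats)}
# The modality whose entropy is largest carries the most information,
# so it is taken here as the root-node feature of the decision tree.
root_feature = max(entropies, key=entropies.get)
```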
Further, the system further comprises:
a second input unit: the second input unit is configured to input the first video set into the decision tree, and obtain the first classification result of the first video set, where the first classification result includes a visual feature category, an audio feature category, and a text feature category.
Further, the system further comprises:
a third input unit: the third input unit is used for inputting, as input data, the feature category set in the first video set that matches the first principal component feature set into a neural network model, with the first principal component feature set as supervision data, so as to obtain an output result, wherein the output result comprises the principal component feature data.
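A minimal supervised-training sketch in the spirit of this unit (illustrative only: a single linear neuron trained by stochastic gradient descent stands in for the neural network model, and all data are invented):

```python
def train(inputs, targets, epochs=500, lr=0.05):
    """Fit y = w*x + b by per-sample gradient descent on squared error.
    `inputs` play the role of encoded feature categories and `targets`
    the role of the supervising principal component feature data."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            y = w * x + b       # forward pass
            grad = y - t        # d(loss)/dy for squared error
            w -= lr * grad * x  # weight update
            b -= lr * grad      # bias update
    return w, b

# Hypothetical encoded feature categories and their supervision values
# (generated from t = 2x + 1, so the fitted model should approach w=2, b=1).
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]
w, b = train(xs, ts)
```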
Various modifications and specific examples of the efficient distributed big data acquisition implementation method in the first embodiment of fig. 1 are equally applicable to the efficient distributed big data acquisition implementation system of this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand how the system of this embodiment is implemented, so, for brevity of the specification, details are not repeated here.
EXAMPLE III
The electronic device of the embodiment of the present application is described below with reference to fig. 3.
Fig. 3 illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application.
Based on the inventive concept of the efficient distributed big data acquisition implementation method in the foregoing embodiment, the present invention further provides an electronic device on which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the foregoing efficient distributed big data acquisition implementation methods are implemented.
In fig. 3, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits, such as peripherals, voltage regulators and power management circuits, which are well known in the art and are therefore not described further herein. A bus interface 305 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e. a transceiver, providing a means for communicating with various other systems over a transmission medium. The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used to store data used by the processor 302 in performing operations.
The embodiment of the application provides an efficient distributed big data acquisition implementation method, wherein the method comprises the following steps: obtaining first video information; performing feature extraction on the first video information to obtain text features, voice features and visual features of the first video information; performing principal component analysis on the text features, the voice features and the visual features respectively to obtain a first principal component feature set; obtaining a first video set through big data, wherein the first video set comprises a video set matching the first principal component feature set; and performing principal component feature data acquisition on the first video set according to the first principal component feature set.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a system for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. An implementation method for efficient distributed big data acquisition, wherein the method comprises the following steps:
obtaining first video information;
extracting features of the first video information to obtain text features, voice features and visual features of the first video information;
performing principal component analysis on the text features, the voice features and the visual features respectively to obtain a first principal component feature set;
obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set;
and acquiring principal component feature data of the first video set according to the first principal component feature set.
2. The method of claim 1, wherein said performing principal component analysis on said text features, said voice features and said visual features respectively to obtain a first principal component feature set comprises:
performing zero-mean centering processing on the first feature data set to obtain a second feature data set;
obtaining a first covariance matrix of the second feature data set;
performing eigendecomposition on the first covariance matrix to obtain a first eigenvalue and a first eigenvector of the first covariance matrix;
and projecting the first feature data set onto the first eigenvector to obtain a first dimension-reduction data set, wherein the first dimension-reduction data set is the first principal component feature set.
3. The method of claim 1, wherein said performing principal component feature data acquisition on said first set of videos from said first set of principal component features comprises:
performing decision tree classification on the first video set according to the text features, the voice features and the visual features to obtain a first classification result;
and acquiring principal component feature data of the first classification result according to the first principal component feature set.
4. The method of claim 3, wherein said performing decision tree classification on the first video set according to the text features, the voice features and the visual features to obtain a first classification result comprises:
performing an information-theoretic coding operation on the text features to obtain a first feature information entropy;
performing an information-theoretic coding operation on the voice features to obtain a second feature information entropy, and performing an information-theoretic coding operation on the visual features to obtain a third feature information entropy;
inputting the first feature information entropy, the second feature information entropy and the third feature information entropy into a data-magnitude comparison model for training, to obtain first root node feature information;
constructing a decision tree of the first video set recursively based on the first root node feature information;
and obtaining a first classification result according to the decision tree.
5. The method of claim 4, wherein said obtaining a first classification result based on said decision tree comprises:
inputting the first video set into the decision tree, and obtaining the first classification result of the first video set, wherein the first classification result comprises a visual feature category, an audio feature category and a text feature category.
6. The method of claim 1, wherein said performing principal component feature data acquisition on the first video set comprises:
inputting, as input data, the feature category set in the first video set that matches the first principal component feature set into a neural network model, with the first principal component feature set as supervision data, to obtain an output result, wherein the output result comprises principal component feature data.
7. An efficient distributed big data acquisition implementation system, wherein the system comprises:
a first obtaining unit: the first obtaining unit is used for obtaining first video information;
a first extraction unit: the first extraction unit is used for performing feature extraction on the first video information to obtain text features, voice features and visual features of the first video information;
a first analysis unit: the first analysis unit is used for respectively performing principal component analysis on the text features, the voice features and the visual features to obtain a first principal component feature set;
a second obtaining unit: the second obtaining unit is used for obtaining a first video set through big data, wherein the first video set comprises a video set matched with the first principal component feature set;
a first acquisition unit: the first acquisition unit is used for acquiring principal component feature data of the first video set according to the first principal component feature set.
8. An efficient distributed big data collection implementation system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any of claims 1-6 when executing the program.
CN202110965044.8A 2021-08-23 2021-08-23 Efficient distributed big data acquisition implementation method and system Active CN113420733B (en)


Publications: CN113420733A, published 2021-09-21; CN113420733B, published 2021-12-31.




Legal Events: PB01 Publication; SE01 Entry into force of request for substantive examination; GR01 Patent grant.