WO2021174691A1 - Optimization method and device for data processing, storage medium, and computer equipment - Google Patents

Info

Publication number
WO2021174691A1
Authority
WO
WIPO (PCT)
Prior art keywords
face data
matrix
data matrix
rows
preset
Prior art date
Application number
PCT/CN2020/093173
Inventor
张艳
孙太武
周超勇
刘玉宇
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021174691A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding

Definitions

  • This application relates to the technical field of data processing, and in particular to an optimization method and device, storage medium, and computer equipment for data processing.
  • With the rapid development of big data processing, computers, as computing devices, must process more and more data, and the forms in which data is processed are becoming increasingly complex.
  • In fields such as process control, image processing, scientific computing, signal processing, cryptography, and computer timing analysis, the matrix has become the basic data unit of big data processing.
  • In existing face recognition, the acquired face data serves as the base data of the recognition model. In this process the face data must be multiplied in matrix form, that is, processed by ordinary matrix-by-matrix multiplication implemented in code.
  • The inventors found that, because face data consists of pixel data, the resulting matrices have a large order. During the preprocessing before recognition, this plain multiplication between matrices takes a long time when computed in code, and when the matrix order is large it also increases the computational load of the computer, occupies substantial CPU resources, and reduces data processing efficiency.
  • In view of this, this application provides a data processing optimization method and device, a storage medium, and computer equipment.
  • The main purpose is to solve the problem that plain multiplication between matrices takes a long time when computed in code and that, when the matrix order is large, it also increases the computational load of the computer, occupies substantial CPU resources, and reduces data processing efficiency.
  • a data processing optimization method including:
  • if any row count of the face data matrix exceeds the preset optimization threshold, order reduction is performed on the face data matrix to obtain an optimized matrix of the face data matrix.
  • a data processing optimization device including:
  • the judgment module is configured to extract the face data when face recognition is to be performed on acquired face data, and to judge whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold;
  • the first processing module is configured to call a preset extension instruction to perform integration processing on the face data matrix if none of the row counts of the face data matrix exceeds the preset optimization threshold;
  • the second processing module is configured to perform order reduction on the face data matrix to obtain an optimized matrix of the face data matrix if any row count of the face data matrix exceeds the preset optimization threshold.
  • a computer-readable storage medium is provided, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform the following operations:
  • if any row count of the face data matrix exceeds the preset optimization threshold, order reduction is performed on the face data matrix to obtain an optimized matrix of the face data matrix.
  • a computer device including: a processor, a memory, a communication interface, and a communication bus.
  • the processor, the memory, and the communication interface complete mutual communication through the communication bus.
  • the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
  • if any row count of the face data matrix exceeds the preset optimization threshold, order reduction is performed on the face data matrix to obtain an optimized matrix of the face data matrix.
  • This application provides a data processing optimization method and device, a storage medium, and computer equipment. Compared with the prior art, which performs ordinary matrix-by-matrix multiplication in code, the embodiments of this application perform order reduction on the face data matrix to obtain an optimized matrix when the number of rows of the face data matrix is judged to exceed the preset optimization threshold, and use the preset extension instruction to perform sum-expression operations on the face data matrix when the number of rows is judged not to exceed the preset optimization threshold. This reduces the time consumed by iterative code operations in plain matrix multiplication, lowers CPU resource usage, and reduces the amount of data processed, thereby improving data processing efficiency.
  • FIG. 1 shows a flowchart of a data processing optimization method provided by an embodiment of the present application.
  • FIG. 2 shows a flowchart of another data processing optimization method provided by an embodiment of the present application.
  • FIG. 3 shows a block diagram of a data processing optimization device provided by an embodiment of the present application.
  • FIG. 4 shows a block diagram of another data processing optimization device provided by an embodiment of the present application.
  • FIG. 5 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the embodiment of the present application provides a method for optimizing data processing. As shown in FIG. 1, the method includes:
  • the face data matrix includes the two matrices that are to be multiplied, and the number of rows and columns of each matrix can be obtained by counting its rows and columns.
  • the preset optimization threshold determines whether the face data matrix is optimized. In the embodiment of the present application, the preset optimization threshold is set to 32 so that the optimized matrix calculation is balanced; for example, it is judged whether the number of rows of the face data matrix exceeds 32. This is not specifically limited in the embodiment of the present application.
  • Since the face data matrix can be two matrices to be multiplied, when the row counts of both matrices are judged not to exceed the preset optimization threshold, the multiplication between the matrices can be carried out in the form of a sum expression using the preset extension instruction.
  • the preset extension instruction is an extension instruction based on ARM NEON, for example a matrix multiplication instruction based on ARM NEON, which is not specifically limited in the embodiment of the present application.
  • if any row count of the face data matrix exceeds the preset optimization threshold, order reduction is performed on the face data matrix to obtain an optimized matrix of the face data matrix.
  • To reduce the amount of computation and improve efficiency when multiplying a face data matrix whose row count is larger than the preset optimization threshold, the optimized matrix is obtained by reducing the order of the face data matrix whenever the row count of either matrix exceeds the threshold, for example 32 rows.
  • After the optimized matrix is obtained, it is returned to step 101 as the face data matrix for re-judgment, until none of the row counts in the face data matrix exceeds the preset optimization threshold and the preset extension instruction is called to perform integration processing on the face data matrix.
  • This application provides a data processing optimization method. Compared with the prior art, which performs ordinary matrix-by-matrix multiplication in code, the embodiment of this application performs order reduction on the face data matrix to obtain an optimized matrix when the number of rows of the face data matrix is judged to exceed the preset optimization threshold, and uses the preset extension instruction to perform sum-expression operations on the face data matrix when the number of rows is judged not to exceed the preset optimization threshold. This reduces the time consumed by iterative code operations in plain matrix multiplication, lowers CPU resource usage, and reduces the amount of data processed, thereby improving data processing efficiency.
  • the embodiment of the present application provides another data processing optimization method. As shown in FIG. 2, the method includes:
  • This step is the same as the method of step 101 shown in FIG. 1, and will not be repeated here.
  • It should be noted that if the face data matrix to be judged is an optimized matrix, the number of blocks in each row of the optimized matrix is taken as its column count and the number of blocks in each column as its row count, and these counts are judged against the preset optimization threshold.
  • This step is the same as step 102a shown in FIG. 1 and is not repeated here.
  • the step of calling the preset extension instruction to perform integration processing on the face data matrix may specifically be: using the preset extension instruction to perform recursive sum-expression operations on all block matrices contained in the face data matrix to obtain the matrix operation result of the face data matrix, where the sum expression is an algorithm that multiplies multiple block matrices and sums the products.
  • When none of the row counts in the face data matrix exceeds the preset optimization threshold, that is, when the row counts of both matrices to be multiplied do not exceed the threshold, the preset extension instruction is used to perform recursive sum-expression operations on all block matrices contained in the face data matrix, and the matrix operation result is obtained.
  • It should be noted that step 202a is parallel to step 202b. After step 202b, each of the two optimized matrices that are again taken as the face data matrix contains multiple block matrices. Therefore, when the preset extension instruction is used for the recursive sum-expression operation, the block matrices in each optimized face data matrix must be multiplied individually.
  • Because a face data matrix whose row count does not exceed the preset optimization threshold may have undergone order reduction several times, the face data matrix contains multiple block matrices, and a block matrix may in turn contain sub-block matrices.
  • Therefore, the sub-block matrices belonging to a block matrix are multiplied first, and then the block matrices are multiplied; each block matrix is processed recursively in turn, finally yielding the operation result of the face data matrix.
  • The preset extension instruction calculates the product for each block of the matrix c. If a block matrix contains sub-block matrices, the sub-block matrices are calculated first, and then all block matrices are calculated recursively one by one, finally yielding the product of the face data matrix.
  • the preset extension instruction can be a multiplication instruction based on ARM NEON. For example: four float values are taken from the data source address src to form a vector of four 32-bit floats, that is, a float32*4_t, which is returned; one float is taken as input and copied four times to form a float32*4_t, which is returned; and a float32*4_t is written to the address dst, which is equivalent to writing four floats at once. The corresponding elements of src and dst are added and multiplied, respectively, to obtain the result.
  • Every element of v1 in src is multiplied by s1, and the result is added to the corresponding elements of v2 in dst and written to v0, so that one 4*4 block is computed at a time. This does not reduce the number of multiplications and additions, but it cuts the number of reads from matrix b to 1/4 of the original. The corresponding elements of v1 and v2 are multiplied to obtain v3, which is then added element by element and written to v0.
  • In this way the compiler aligns variable addresses on 4-byte boundaries and the data is relatively concentrated; the cache fetches the surrounding data along with the requested data, so that multiple matrices can be multiplied together.
  • Because a row count above the preset optimization threshold means that the order of the face data matrix is too large, the block matrices in the face data matrix are determined first in order to achieve order reduction.
  • Before the step of partitioning the face data matrix into blocks and using the face data matrix containing the block matrices as the optimized matrix, the method further includes: filling data for the odd row and column counts in the face data matrix.
  • The parity of the row and column counts in the face data matrix is judged first, and then the odd rows and columns are filled.
  • the specific process includes: judging whether the column count of the first matrix and the row count of the second matrix in the face data matrix are even, and if they are even, judging whether the column count of the second matrix is even.
  • the filling process can directly extend an odd row or column count to an even one using a preset value. In the embodiment of the present application, one row or one column is added, and the fill value can be any preset non-zero natural number, which the embodiment of the present application does not specifically limit.
  • step 202b is specifically: partitioning the face data matrix in units of the first order, and using all the determined block matrices as the optimized matrix of the face data matrix.
  • To improve the efficiency of partitioning and the speed of the subsequent operations, the partitioning can be performed in units of the first order, and the first order can be the order determined by dividing the number of rows and the number of columns by 2.
  • For example, if the face data matrix is a matrix of 100 rows × 100 columns, dividing the number of rows and columns by 2 gives 50; 2 is then the partition order, and the optimized matrix is a matrix of 50 rows × 50 columns.
  • To avoid partitioning all the way down to the multiplication of two 1×1 matrices, which would slow the calculation, the method may further include, after step 202b: judging whether the number of rows and the number of columns of a block matrix are less than a preset block threshold; if they are, partitioning the face data matrix in units of the second order, and using all the determined block matrices as the optimized matrix of the face data matrix.
  • When the row and column counts of a block matrix are less than the preset block threshold, the face data matrix is partitioned in units of the second order.
  • the second order can be the order determined by dividing the number of rows and columns by 2^i. After the face data matrix is partitioned by this order, all the determined block matrices are used as the optimized matrix of the face data matrix.
  • So that the resulting optimized matrix can be multiplied, the step after step 202b determines the optimized matrix as the face data matrix and re-executes the step of judging whether the number of rows of the face data matrix constructed from the face data exceeds the preset optimization threshold.
  • This application provides another data processing optimization method. Compared with the prior art, which performs ordinary matrix-by-matrix multiplication in code, the embodiment of this application performs order reduction on the face data matrix to obtain an optimized matrix when the number of rows of the face data matrix is judged to exceed the preset optimization threshold, and uses the preset extension instruction to perform sum-expression operations on the face data matrix when the number of rows is judged not to exceed the preset optimization threshold. This reduces the time consumed by iterative code operations in plain matrix multiplication, lowers CPU resource usage, and reduces the amount of data processed, thereby improving data processing efficiency.
  • As an implementation of the method shown in FIG. 1, an embodiment of the present application provides an optimization device for data processing.
  • As shown in FIG. 3, the device includes: a judgment module 31, a first processing module 32, and a second processing module 33.
  • the judgment module 31 is configured to extract the face data when face recognition is to be performed on acquired face data, and to judge whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold;
  • the first processing module 32 is configured to call a preset extension instruction to perform integration processing on the face data matrix if none of the row counts of the face data matrix exceeds the preset optimization threshold;
  • the second processing module 33 is configured to perform order reduction on the face data matrix to obtain an optimized matrix of the face data matrix if any row count of the face data matrix exceeds the preset optimization threshold.
  • This application provides a data processing optimization device. Compared with the prior art, which performs ordinary matrix-by-matrix multiplication in code, the embodiment of this application performs order reduction on the face data matrix to obtain an optimized matrix when the number of rows of the face data matrix is judged to exceed the preset optimization threshold, and uses the preset extension instruction to perform sum-expression operations on the face data matrix when the number of rows is judged not to exceed the preset optimization threshold. This reduces the time consumed by iterative code operations in plain matrix multiplication, lowers CPU resource usage, and reduces the amount of data processed, thereby improving data processing efficiency.
  • As an implementation of the method shown in FIG. 2, an embodiment of the present application provides another data processing optimization device.
  • As shown in FIG. 4, the device includes: a judgment module 41, a first processing module 42, a second processing module 43, a determining module 44, a filling module 45, a judgment module 46, and a blocking module 47.
  • the judgment module 41 is configured to extract the face data when face recognition is to be performed on acquired face data, and to judge whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold;
  • the first processing module 42 is configured to call a preset extension instruction to perform integration processing on the face data matrix if none of the row counts of the face data matrix exceeds the preset optimization threshold;
  • the second processing module 43 is configured to perform order reduction on the face data matrix to obtain an optimized matrix of the face data matrix if any row count of the face data matrix exceeds the preset optimization threshold.
  • the device further includes:
  • the determining module 44 is configured to determine the optimized matrix as the face data matrix, and to re-execute the step of judging whether the number of rows of the face data matrix constructed from the face data exceeds the preset optimization threshold.
  • the second processing module 43 is specifically configured to partition the face data matrix into blocks, and to use the face data matrix containing the block matrices as the optimized matrix.
  • the device further includes:
  • the filling module 45 is configured to fill data for the odd row and column counts in the face data matrix.
  • the second processing module 43 is specifically configured to partition the face data matrix in units of the first order, and to use all the determined block matrices as the optimized matrix of the face data matrix.
  • the device further includes:
  • the judgment module 46 is configured to judge whether the number of rows and the number of columns of a block matrix are less than a preset block threshold;
  • the blocking module 47 is configured to partition the face data matrix in units of the second order if they are less than the preset block threshold, and to use all the determined block matrices as the optimized matrix of the face data matrix.
  • the first processing module 42 is specifically configured to use the preset extension instruction to perform recursive sum-expression operations on all block matrices contained in the face data matrix to obtain the matrix operation result of the face data matrix, where the sum expression is an algorithm that multiplies multiple block matrices and sums the products.
  • This application provides another data processing optimization device. Compared with the prior art, which performs ordinary matrix-by-matrix multiplication in code, the embodiment of this application performs order reduction on the face data matrix to obtain an optimized matrix when the number of rows of the face data matrix is judged to exceed the preset optimization threshold, and uses the preset extension instruction to perform sum-expression operations on the face data matrix when the number of rows is judged not to exceed the preset optimization threshold. This reduces the time consumed by iterative code operations in plain matrix multiplication, lowers CPU resource usage, and reduces the amount of data processed, thereby improving data processing efficiency.
  • a computer-readable storage medium is provided, wherein the computer-readable storage medium may be non-volatile or volatile, and stores at least one executable instruction, the executable instruction causing a processor to perform the following operations:
  • if any row count of the face data matrix exceeds the preset optimization threshold, order reduction is performed on the face data matrix to obtain an optimized matrix of the face data matrix.
  • FIG. 5 shows a schematic structural diagram of a computer device according to an embodiment of the present application, and the specific embodiment of the present application does not limit the specific implementation of the computer device.
  • the computer device may include: a processor 502, a communication interface 504, a memory 506, and a communication bus 508.
  • the processor 502, the communication interface 504, and the memory 506 communicate with each other through the communication bus 508.
  • the communication interface 504 is used to communicate with other devices, such as network elements such as clients or other servers.
  • the processor 502 is configured to execute the program 510, and specifically can execute the relevant steps in the embodiment of the above-mentioned data processing optimization method.
  • the program 510 may include program code, and the program code includes a computer operation instruction.
  • the processor 502 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the one or more processors included in the computer device may be the same type of processor, such as one or more CPUs, or different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 506 is used to store the program 510.
  • the memory 506 may include a high-speed RAM memory, and may also include a non-volatile memory, for example, at least one disk memory.
  • the program 510 may be specifically used to cause the processor 502 to perform the following operations:
  • if any row count of the face data matrix exceeds the preset optimization threshold, order reduction is performed on the face data matrix to obtain an optimized matrix of the face data matrix.
  • modules or steps of this application can be implemented by a general computing device, and they can be concentrated on a single computing device or distributed in a network composed of multiple computing devices.
  • they can be implemented with program codes executable by a computing device, so that they can be stored in a storage device for execution by the computing device, and in some cases, can be executed in a different order than here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

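This application discloses an optimization method and device for data processing, a storage medium, and computer equipment, and relates to the technical field of data processing. Its main purpose is to solve the problem that plain multiplication between matrices takes a long time when computed in code and that, when the matrix order is large, it also increases the computational load of the computer, occupies substantial CPU resources, and reduces data processing efficiency. The method includes: judging whether the number of rows of a face data matrix constructed from the face data exceeds a preset optimization threshold; if none of the row counts of the face data matrix exceeds the preset optimization threshold, calling a preset extension instruction to perform integration processing on the face data matrix; and if any row count of the face data matrix exceeds the preset optimization threshold, performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.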

Description

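Optimization method and device for data processing, storage medium, and computer equipment
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010138933.2, filed on March 3, 2020 and entitled "Optimization method and device for data processing, storage medium, and computer equipment" (数据处理的优化方法及装置、存储介质、计算机设备), the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the technical field of data processing, and in particular to an optimization method and device for data processing, a storage medium, and computer equipment.
Background
With the rapid development of big data processing, computers, as computing devices, must process more and more data, and the forms in which data is processed are becoming increasingly complex. In fields such as process control, image processing, scientific computing, signal processing, cryptography, and computer timing analysis, the matrix has become the basic data unit of big data processing.
At present, in existing face recognition, the acquired face data serves as the base data of the recognition model. In this process the face data must be multiplied in matrix form, that is, processed by ordinary matrix-by-matrix multiplication implemented in code. The inventors found that, because the face data used for recognition consists of pixel data, the resulting matrices have a large order. During the preprocessing before recognition, this plain multiplication between matrices takes a long time when computed in code, and when the matrix order is large it also increases the computational load of the computer, occupies substantial CPU resources, and reduces data processing efficiency.
Summary
In view of this, this application provides an optimization method and device for data processing, a storage medium, and computer equipment. The main purpose is to solve the problem that plain multiplication between matrices takes a long time when computed in code and that, when the matrix order is large, it also increases the computational load of the computer, occupies substantial CPU resources, and reduces data processing efficiency.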
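According to one aspect of this application, an optimization method for data processing is provided, including:
when face recognition is to be performed on acquired face data, extracting the face data, and judging whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold;
if none of the row counts of the face data matrix exceeds the preset optimization threshold, calling a preset extension instruction to perform integration processing on the face data matrix;
if any row count of the face data matrix exceeds the preset optimization threshold, performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.
According to another aspect of this application, an optimization device for data processing is provided, including:
a judgment module, configured to extract the face data when face recognition is to be performed on acquired face data, and to judge whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold;
a first processing module, configured to call a preset extension instruction to perform integration processing on the face data matrix if none of the row counts of the face data matrix exceeds the preset optimization threshold;
a second processing module, configured to perform order reduction on the face data matrix to obtain an optimized matrix of the face data matrix if any row count of the face data matrix exceeds the preset optimization threshold.
According to yet another aspect of this application, a computer-readable storage medium is provided, in which at least one executable instruction is stored, the executable instruction causing a processor to perform the following operations:
when face recognition is to be performed on acquired face data, extracting the face data, and judging whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold;
if none of the row counts of the face data matrix exceeds the preset optimization threshold, calling a preset extension instruction to perform integration processing on the face data matrix;
if any row count of the face data matrix exceeds the preset optimization threshold, performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.
According to a further aspect of this application, a computer device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
when face recognition is to be performed on acquired face data, extracting the face data, and judging whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold;
if none of the row counts of the face data matrix exceeds the preset optimization threshold, calling a preset extension instruction to perform integration processing on the face data matrix;
if any row count of the face data matrix exceeds the preset optimization threshold, performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.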
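With the above technical solutions, the technical solutions provided by the embodiments of this application have at least the following advantages:
This application provides an optimization method and device for data processing, a storage medium, and computer equipment. Compared with the prior art, which performs ordinary matrix-by-matrix multiplication in code, the embodiments of this application perform order reduction on the face data matrix to obtain an optimized matrix when the number of rows of the face data matrix is judged to exceed the preset optimization threshold, and use the preset extension instruction to perform sum-expression operations on the face data matrix when the number of rows is judged not to exceed the preset optimization threshold. This reduces the time consumed by iterative code operations in plain matrix multiplication, lowers CPU resource usage, and reduces the amount of data processed, thereby improving data processing efficiency.
The above is only an overview of the technical solutions of this application. So that the technical means of this application can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of this application are more apparent, specific embodiments of this application are set out below.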
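Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting this application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a flowchart of an optimization method for data processing provided by an embodiment of this application;
FIG. 2 shows a flowchart of another optimization method for data processing provided by an embodiment of this application;
FIG. 3 shows a block diagram of an optimization device for data processing provided by an embodiment of this application;
FIG. 4 shows a block diagram of another optimization device for data processing provided by an embodiment of this application;
FIG. 5 shows a schematic structural diagram of a computer device provided by an embodiment of this application.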
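Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
An embodiment of this application provides an optimization method for data processing. As shown in FIG. 1, the method includes: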
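101. When face recognition is to be performed on acquired face data, extract the face data, and judge whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold.
The face data matrix includes the two matrices that are to be multiplied, and the number of rows and columns of each matrix can be obtained by counting its rows and columns. The preset optimization threshold determines whether the face data matrix is optimized. In this embodiment, the preset optimization threshold is set to 32 so that the optimized matrix calculation is balanced; for example, it is judged whether the number of rows of the face data matrix exceeds 32. This is not specifically limited in this embodiment.
102a. If none of the row counts of the face data matrix exceeds the preset optimization threshold, call a preset extension instruction to perform integration processing on the face data matrix.
In this embodiment, since the face data matrix can be two matrices to be multiplied, when the row counts of both matrices are judged not to exceed the preset optimization threshold, the multiplication between the matrices can be carried out in the form of a sum expression using the preset extension instruction. The preset extension instruction is an extension instruction based on ARM NEON, for example a matrix multiplication instruction based on ARM NEON, which is not specifically limited in this embodiment.
102b. If any row count of the face data matrix exceeds the preset optimization threshold, perform order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.
In this embodiment, to reduce the amount of computation and improve efficiency when multiplying a face data matrix whose row count is larger than the preset optimization threshold, order reduction is performed on the face data matrix to obtain the optimized matrix whenever the row count of either matrix in the face data matrix exceeds the preset optimization threshold, for example 32 rows.
It should be noted that, after the optimized matrix is obtained, in order to continue the calculation and judge whether further optimization is needed, the optimized matrix is returned to step 101 as the face data matrix for re-judgment, until none of the row counts in the face data matrix exceeds the preset optimization threshold and the preset extension instruction is called to perform integration processing on the face data matrix.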
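The judge-and-reduce loop of steps 101, 102a, and 102b can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, and it assumes that one round of order reduction halves every row count, rounding odd counts up as if one padding row had been added.

```python
def needs_reduction(row_counts, threshold=32):
    """Step 101: true if any matrix has more rows than the threshold."""
    return any(rows > threshold for rows in row_counts)


def plan_reduction(row_counts, threshold=32):
    """Return the row counts seen at each pass through step 101.

    Assumption (not from the source): one round of order reduction
    (step 102b) halves every row count, rounding odd counts up.
    """
    history = [tuple(row_counts)]
    current = list(row_counts)
    while needs_reduction(current, threshold):
        current = [(rows + 1) // 2 for rows in current]
        history.append(tuple(current))
    return history


if __name__ == "__main__":
    # Row counts of the two matrices at each re-judgment.
    print(plan_reduction([100, 100]))
```

Once the loop ends, step 102a would apply to the remaining blocks; the sketch only tracks sizes and performs no multiplication.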
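This application provides an optimization method for data processing. Compared with the prior art, which performs ordinary matrix-by-matrix multiplication in code, the embodiment of this application performs order reduction on the face data matrix to obtain an optimized matrix when the number of rows of the face data matrix is judged to exceed the preset optimization threshold, and uses the preset extension instruction to perform sum-expression operations on the face data matrix when the number of rows is judged not to exceed the preset optimization threshold. This reduces the time consumed by iterative code operations in plain matrix multiplication, lowers CPU resource usage, and reduces the amount of data processed, thereby improving data processing efficiency.
An embodiment of this application provides another optimization method for data processing. As shown in FIG. 2, the method includes:
201. When face recognition is to be performed on acquired face data, extract the face data, and judge whether the number of rows of the face data matrix constructed from the face data exceeds a preset optimization threshold.
This step is the same as step 101 shown in FIG. 1 and is not repeated here.
It should be noted that if the face data matrix to be judged is an optimized matrix, the number of blocks in each row of the optimized matrix is taken as its column count and the number of blocks in each column as its row count, and these counts are judged against the preset optimization threshold.
202a. If none of the row counts of the face data matrix exceeds the preset optimization threshold, call a preset extension instruction to perform integration processing on the face data matrix.
This step is the same as step 102a shown in FIG. 1 and is not repeated here.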
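Further, if the number of rows of the face data matrix was judged to exceed the preset optimization threshold, order reduction was performed on the face data matrix to obtain its optimized matrix, and the optimized matrix, taken as the face data matrix, is then judged to have row counts that do not exceed the preset optimization threshold, the step of calling the preset extension instruction to perform integration processing on the face data matrix may specifically be: using the preset extension instruction to perform recursive sum-expression operations on all block matrices contained in the face data matrix to obtain the matrix operation result of the face data matrix, where the sum expression is an algorithm that multiplies multiple block matrices and sums the products.
If the number of rows of the face data matrix does not exceed the preset optimization threshold, the order of the face data matrix is suitable for calculation by the computer's computing instructions; the calculation does not add to the time consumed and occupies few CPU resources. Therefore, when none of the row counts in the face data matrix exceeds the preset optimization threshold, that is, when the row counts of both matrices to be multiplied do not exceed the threshold, the preset extension instruction is used to perform recursive sum-expression operations on all block matrices contained in the face data matrix to obtain the matrix operation result.
It should be noted that step 202a is parallel to step 202b. After step 202b, each of the two optimized matrices that are again taken as the face data matrix contains multiple block matrices, so when the preset extension instruction is used for the recursive sum-expression operation, the block matrices in each optimized face data matrix must be multiplied individually. In this embodiment, a face data matrix whose row count does not exceed the preset optimization threshold may have undergone order reduction several times, so the face data matrix contains multiple block matrices, and a block matrix may in turn contain sub-block matrices. Therefore, when the preset extension instruction is used for the recursive sum-expression operation, the sub-block matrices belonging to a block matrix are multiplied first, and then the block matrices are multiplied; each block matrix is processed recursively in turn, finally yielding the operation result of the face data matrix.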
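For example, the face data matrix is
Figure PCTCN2020093173-appb-000001
where a11, a12, a21, a22, b11, b12, b21, and b22 are block matrices and the sum expression is a_i1·b_1j + a_i2·b_2j = c_ij. The preset extension instruction is used to calculate the product for each block of the matrix c. If a block matrix contains sub-block matrices, the sub-block matrices are calculated first, and then all block matrices are calculated recursively one by one, finally yielding the product of the face data matrix.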
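The recursive sum expression c_ij = a_i1·b_1j + a_i2·b_2j can be illustrated with a short pure-Python sketch. All function names are hypothetical, `naive_matmul` merely stands in for the preset extension instruction, and falling back to the plain product when a dimension is odd is a simplification made here (the source pads odd dimensions instead).

```python
def naive_matmul(a, b):
    """Plain product; a stand-in for the preset extension instruction."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split(m):
    """Split a matrix with even dimensions into a 2x2 grid of blocks."""
    r, c = len(m) // 2, len(m[0]) // 2
    return [[[row[:c] for row in m[:r]], [row[c:] for row in m[:r]]],
            [[row[:c] for row in m[r:]], [row[c:] for row in m[r:]]]]


def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def block_matmul(a, b, threshold=32):
    """Compute a*b via c_ij = a_i1*b_1j + a_i2*b_2j, recursing into sub-blocks."""
    odd = any(n % 2 for n in (len(a), len(a[0]), len(b), len(b[0])))
    if max(len(a), len(b)) <= threshold or odd:
        return naive_matmul(a, b)
    A, B = split(a), split(b)
    c = [[add(block_matmul(A[i][0], B[0][j], threshold),
              block_matmul(A[i][1], B[1][j], threshold))
          for j in range(2)] for i in range(2)]
    # Stitch the four result blocks back into one matrix.
    top = [left + right for left, right in zip(c[0][0], c[0][1])]
    bottom = [left + right for left, right in zip(c[1][0], c[1][1])]
    return top + bottom
```

Each recursion level performs the same eight block products as the plain algorithm, so the sketch reorganizes the work into small blocks rather than reducing the operation count.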
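In this embodiment, the preset extension instruction may be a multiplication instruction based on ARM NEON. Specifically: four float values are taken from the data source address src to form a vector of four 32-bit floats, that is, a float32*4_t, which is returned; one float is taken as input and copied four times to form a float32*4_t, which is returned; and a float32*4_t is written to the address dst, which is equivalent to writing four floats at once. The corresponding elements of src and dst are added and multiplied, respectively, to obtain the result. Every element of v1 in src is multiplied by s1, and the result is added to the corresponding elements of v2 in dst and written to v0, so that one 4*4 block is computed at a time. This does not reduce the number of multiplications and additions, but it cuts the number of reads from matrix b to 1/4 of the original. The corresponding elements of v1 and v2 are multiplied to obtain v3, which is then added element by element and written to v0. In this way the compiler aligns variable addresses on 4-byte boundaries and the data is relatively concentrated; the cache fetches the surrounding data along with the requested data, so that multiple matrices can be multiplied together.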
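The four operations described above (load four floats, broadcast one float to four lanes, multiply-accumulate, store four floats) resemble the ARM NEON intrinsics `vld1q_f32`, `vmovq_n_f32`, `vmlaq_f32`, and `vst1q_f32` operating on `float32x4_t`, although the source does not name them, so that mapping is an assumption. The sketch below only emulates the four-lane data flow in Python with hypothetical helper names; it is not NEON code.

```python
LANES = 4


def load4(src, offset=0):
    """Read four consecutive floats, like a 4-lane vector load."""
    return list(src[offset:offset + LANES])


def dup4(value):
    """Broadcast one float to all four lanes."""
    return [value] * LANES


def mla4(acc, v, s):
    """Lane-wise acc + v * s (multiply-accumulate)."""
    return [a + x * y for a, x, y in zip(acc, v, s)]


def store4(dst, offset, v):
    """Write four lanes back to memory in one step."""
    dst[offset:offset + LANES] = v


def matmul_4wide(a, b):
    """Multiply a (m x k) by b (k x 4), producing four outputs per row at once."""
    out = []
    for row in a:
        acc = dup4(0.0)
        for k, scalar in enumerate(row):
            # One load of b's row feeds four multiply-accumulates.
            acc = mla4(acc, load4(b[k]), dup4(scalar))
        dst = [0.0] * LANES
        store4(dst, 0, acc)
        out.append(dst)
    return out
```

The number of multiplications and additions is unchanged; what changes is that each row of b is read once per four outputs, which matches the 1/4 figure given in the text.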
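202b. If any row count of the face data matrix exceeds the preset optimization threshold, partition the face data matrix into blocks, and use the face data matrix containing the block matrices as the optimized matrix.
In this embodiment, a row count above the preset optimization threshold means that the order of the face data matrix is too large; multiplying it with existing methods would consume substantial CPU resources and take too long. The block matrices in the face data matrix are therefore determined first in order to achieve order reduction.
In this embodiment, to avoid block matrices that lack a corresponding row or column and therefore cannot be partitioned completely, the method further includes, before the step of partitioning the face data matrix into blocks and using the face data matrix containing the block matrices as the optimized matrix: filling data for the odd row and column counts in the face data matrix.
First, the parity of the row and column counts in the face data matrix is judged, and then the odd rows and columns are filled. The specific process includes: judging whether the column count of the first matrix and the row count of the second matrix in the face data matrix are even, and if they are even, judging whether the column count of the second matrix is even. The filling process may directly extend an odd row or column count to an even one using a preset value; in this embodiment one row or one column is added, and the fill value may be any preset non-zero natural number, which is not specifically limited in this embodiment.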
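The padding step can be sketched as below. The helper names are hypothetical. Note one deliberate deviation: the source allows any preset non-zero natural number as the fill value, whereas this sketch defaults to zero, because zero padding leaves the original part of the product unchanged once the result is cropped. That default is an assumption made here, not something the source states.

```python
def pad_to_even(m, fill=0):
    """Append one row and/or one column so that both dimensions are even."""
    rows, cols = len(m), len(m[0])
    # cols % 2 is 1 for an odd column count, so exactly one value is appended.
    padded = [list(row) + [fill] * (cols % 2) for row in m]
    if rows % 2:
        padded.append([fill] * (cols + cols % 2))
    return padded


def crop(m, rows, cols):
    """Cut a padded result back to its original size."""
    return [row[:cols] for row in m[:rows]]
```

A 3 × 3 matrix becomes 4 × 4, after which it can be split into four 2 × 2 blocks; `crop` restores the original shape after multiplication.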
对于本申请实施例,为了进一步地的说明及限定,步骤202b具体为:按照第一阶数为单位对所述人脸数据矩阵的进行分块,并将确定出的全部分块矩阵作为所述人脸数据矩阵的优化矩阵。
在对人脸数据矩阵进行分块时,为了提高分块的效率及运算过程中的运算速度,可以按照第一阶数为单位进行分块,所述第一阶数可以为行数与列数除以2确定的阶数,例如,人脸数据矩阵为100行×100列的矩阵,行数、列数除以2为50,则以2为分块阶数,得到的优化矩阵为50行×50列的矩阵。
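In this embodiment, to avoid partitioning all the way down to the multiplication of two 1×1 matrices, which would hurt computation speed, after step 202b the method may further include: determining whether the row count and column count of the block matrices are smaller than a preset block threshold; if they are smaller than the preset block threshold, partitioning the face data matrix in units of a second order, and taking all of the resulting block matrices as the optimized matrix of the face data matrix.
The preset block threshold may be the row count and column count corresponding to a block order of $2^i \times 2^i$, where $0<i<k$, $k=1,2,\ldots,7$. When the row count and column count of the block matrices are smaller than the preset block threshold, the face data matrix is partitioned in units of the second order, which may be the order obtained by dividing the row count and column count by $2^i$. After the face data matrix is partitioned at this order, all of the resulting block matrices are taken as the optimized matrix of the face data matrix.
In this embodiment, so that the resulting optimized matrix can be multiplied, the step following step 202b determines the optimized matrix as the face data matrix and re-executes the step of determining whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold.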
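The loop formed by the threshold check and the order reduction can be sketched by tracking only the row count. This is a simplified illustration: the function name `plan_reduction` is hypothetical, each reduction is assumed to pad an odd count to the next even number and then halve it, and the preset block threshold of the preceding paragraphs is left out.

```python
def plan_reduction(rows, threshold):
    """Return the row counts visited until the count no longer exceeds threshold."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    steps = [rows]
    while rows > threshold:
        rows += rows % 2  # pad an odd row count to the next even number
        rows //= 2        # partition: each block has half the rows
        steps.append(rows)
    return steps


print(plan_reduction(100, 30))
```

The last entry of the returned list is the row count at which the description switches from order reduction to the preset extension instruction.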
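This application provides another data processing optimization method. Compared with the prior art, which converts ordinary matrix-by-matrix multiplication into code and evaluates it directly, the embodiments of this application reduce the order of the face data matrix to obtain an optimized matrix when its row count is determined to exceed the preset optimization threshold, and use the preset extension instruction to evaluate the sum expression on the face data matrix when its row count is determined not to exceed the preset optimization threshold. This reduces the time spent on iterative code execution in plain matrix multiplication, lowers CPU usage, and reduces the amount of data processed, thereby improving data processing efficiency.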
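Further, as an implementation of the method shown in Figure 1 above, an embodiment of this application provides a data processing optimization apparatus. As shown in Figure 3, the apparatus includes: a judging module 31, a first processing module 32, and a second processing module 33.
The judging module 31 is configured to, when face recognition is to be performed on acquired face data, extract the face data and determine whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold;
The first processing module 32 is configured to, if none of the row counts of the face data matrix exceeds the preset optimization threshold, call the preset extension instruction to perform integration processing on the face data matrix;
The second processing module 33 is configured to, if any row count of the face data matrix exceeds the preset optimization threshold, perform order reduction on the face data matrix to obtain the optimized matrix of the face data matrix.
This application provides a data processing optimization apparatus. Compared with the prior art, which converts ordinary matrix-by-matrix multiplication into code and evaluates it directly, the embodiments of this application reduce the order of the face data matrix to obtain an optimized matrix when its row count is determined to exceed the preset optimization threshold, and use the preset extension instruction to evaluate the sum expression on the face data matrix when its row count is determined not to exceed the preset optimization threshold. This reduces the time spent on iterative code execution in plain matrix multiplication, lowers CPU usage, and reduces the amount of data processed, thereby improving data processing efficiency.
Further, as an implementation of the method shown in Figure 2 above, an embodiment of this application provides another data processing optimization apparatus. As shown in Figure 4, the apparatus includes: a judging module 41, a first processing module 42, a second processing module 43, a determining module 44, a padding module 45, a judging module 46, and a partitioning module 47.
The judging module 41 is configured to, when face recognition is to be performed on acquired face data, extract the face data and determine whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold;
The first processing module 42 is configured to, if none of the row counts of the face data matrix exceeds the preset optimization threshold, call the preset extension instruction to perform integration processing on the face data matrix;
The second processing module 43 is configured to, if any row count of the face data matrix exceeds the preset optimization threshold, perform order reduction on the face data matrix to obtain the optimized matrix of the face data matrix.
Further, the apparatus also includes:
The determining module 44, configured to determine the optimized matrix as the face data matrix and re-execute the step of determining whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold.
Further, the second processing module 43 is specifically configured to partition the face data matrix into blocks and take the face data matrix containing the block matrices as the optimized matrix.
Further, the apparatus also includes:
The padding module 45, configured to pad the odd row counts and column counts of the face data matrix with data.
Further, the second processing module 43 is specifically configured to partition the face data matrix in units of the first order and take all of the resulting block matrices as the optimized matrix of the face data matrix.
Further, the apparatus also includes:
The judging module 46, configured to determine whether the row count and column count of the block matrices are smaller than the preset block threshold;
The partitioning module 47, configured to, if they are smaller than the preset block threshold, partition the face data matrix in units of the second order and take all of the resulting block matrices as the optimized matrix of the face data matrix.
Further, the first processing module 42 is specifically configured to use the preset extension instruction to recursively evaluate the sum expression over all block matrices contained in the face data matrix, obtaining the matrix operation result of the face data matrix, where the sum expression is an algorithm that sums the products of multiple block matrices.
This application provides another data processing optimization apparatus. Compared with the prior art, which converts ordinary matrix-by-matrix multiplication into code and evaluates it directly, the embodiments of this application reduce the order of the face data matrix to obtain an optimized matrix when its row count is determined to exceed the preset optimization threshold, and use the preset extension instruction to evaluate the sum expression on the face data matrix when its row count is determined not to exceed the preset optimization threshold. This reduces the time spent on iterative code execution in plain matrix multiplication, lowers CPU usage, and reduces the amount of data processed, thereby improving data processing efficiency.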
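An embodiment of this application provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores at least one executable instruction, and the executable instruction causes a processor to perform the following operations:
When face recognition is to be performed on acquired face data, extract the face data and determine whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold;
If none of the row counts of the face data matrix exceeds the preset optimization threshold, call the preset extension instruction to perform integration processing on the face data matrix;
If any row count of the face data matrix exceeds the preset optimization threshold, perform order reduction on the face data matrix to obtain the optimized matrix of the face data matrix.
Figure 5 shows a schematic structural diagram of a computer device according to an embodiment of this application. The specific embodiments of this application do not limit the specific implementation of the computer device.
As shown in Figure 5, the computer device may include: a processor 502, a communications interface 504, a memory 506, and a communication bus 508.
The processor 502, the communications interface 504, and the memory 506 communicate with one another through the communication bus 508.
The communications interface 504 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 502 is used to execute a program 510, and may specifically perform the relevant steps of the data processing optimization method embodiments described above.
Specifically, the program 510 may include program code, and the program code includes computer operation instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 506 is used to store the program 510. The memory 506 may include high-speed RAM and may also include non-volatile memory, for example at least one disk storage device.
The program 510 may specifically be used to cause the processor 502 to perform the following operations:
When face recognition is to be performed on acquired face data, extract the face data and determine whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold;
If none of the row counts of the face data matrix exceeds the preset optimization threshold, call the preset extension instruction to perform integration processing on the face data matrix;
If any row count of the face data matrix exceeds the preset optimization threshold, perform order reduction on the face data matrix to obtain the optimized matrix of the face data matrix.
Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented by general-purpose computing devices. They can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be executed in an order different from the one given here, or they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. This application is therefore not limited to any specific combination of hardware and software.
The above are only preferred embodiments of this application and are not intended to limit it. For those skilled in the art, this application may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.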

Claims (20)

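  1. A data processing optimization method, comprising:
    when face recognition is to be performed on acquired face data, extracting the face data, and determining whether the row count of a face data matrix constructed from the face data exceeds a preset optimization threshold;
    if none of the row counts of the face data matrix exceeds the preset optimization threshold, calling a preset extension instruction to perform integration processing on the face data matrix;
    if any row count of the face data matrix exceeds the preset optimization threshold, performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.
  2. The method according to claim 1, wherein after the performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix, the method further comprises:
    determining the optimized matrix as the face data matrix, and re-executing the step of determining whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold.
  3. The method according to claim 2, wherein the performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix comprises:
    partitioning the face data matrix into blocks, and taking the face data matrix containing the block matrices as the optimized matrix.
  4. The method according to claim 3, wherein before the partitioning the face data matrix into blocks and taking the face data matrix containing the block matrices as the optimized matrix, the method further comprises:
    padding the odd row counts and column counts of the face data matrix with data.
  5. The method according to claim 4, wherein the partitioning the face data matrix into blocks and taking the face data matrix containing the block matrices as the optimized matrix comprises:
    partitioning the face data matrix in units of a first order, and taking all of the resulting block matrices as the optimized matrix of the face data matrix.
  6. The method according to claim 5, wherein after the partitioning the face data matrix into blocks and taking the face data matrix containing the block matrices as the optimized matrix, the method further comprises:
    determining whether the row count and column count of the block matrices are smaller than a preset block threshold;
    if they are smaller than the preset block threshold, partitioning the face data matrix in units of a second order, and taking all of the resulting block matrices as the optimized matrix of the face data matrix.
  7. The method according to any one of claims 1-6, wherein the calling a preset extension instruction to perform integration processing on the face data matrix comprises:
    using the preset extension instruction to recursively evaluate a sum expression over all block matrices contained in the face data matrix to obtain the matrix operation result of the face data matrix, the sum expression being an algorithm that sums the products of multiple block matrices.
  8. A data processing optimization apparatus, comprising:
    a judging module, configured to determine whether the row count of a face data matrix constructed from the face data exceeds a preset optimization threshold;
    a first processing module, configured to, if none of the row counts of the face data matrix exceeds the preset optimization threshold, call a preset extension instruction to perform integration processing on the face data matrix;
    a second processing module, configured to, if any row count of the face data matrix exceeds the preset optimization threshold, perform order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.
  9. A computer-readable storage medium storing at least one executable instruction, wherein the executable instruction causes a processor to perform the following operations:
    when face recognition is to be performed on acquired face data, extracting the face data, and determining whether the row count of a face data matrix constructed from the face data exceeds a preset optimization threshold;
    if none of the row counts of the face data matrix exceeds the preset optimization threshold, calling a preset extension instruction to perform integration processing on the face data matrix;
    if any row count of the face data matrix exceeds the preset optimization threshold, performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.
  10. The storage medium according to claim 9, wherein after the performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix, the executable instruction further causes the processor to perform the following operation:
    determining the optimized matrix as the face data matrix, and re-executing the step of determining whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold.
  11. The storage medium according to claim 10, wherein performing the operation of order reduction on the face data matrix to obtain an optimized matrix of the face data matrix specifically comprises:
    partitioning the face data matrix into blocks, and taking the face data matrix containing the block matrices as the optimized matrix.
  12. The storage medium according to claim 11, wherein before the partitioning the face data matrix into blocks and taking the face data matrix containing the block matrices as the optimized matrix, the executable instruction further causes the processor to perform the following operation:
    padding the odd row counts and column counts of the face data matrix with data.
  13. The storage medium according to claim 12, wherein performing the operation of partitioning the face data matrix into blocks and taking the face data matrix containing the block matrices as the optimized matrix specifically comprises:
    partitioning the face data matrix in units of a first order, and taking all of the resulting block matrices as the optimized matrix of the face data matrix.
  14. The storage medium according to claim 13, wherein after the partitioning the face data matrix into blocks and taking the face data matrix containing the block matrices as the optimized matrix, the executable instruction further causes the processor to perform the following operations:
    determining whether the row count and column count of the block matrices are smaller than a preset block threshold;
    if they are smaller than the preset block threshold, partitioning the face data matrix in units of a second order, and taking all of the resulting block matrices as the optimized matrix of the face data matrix.
  15. The storage medium according to any one of claims 9-14, wherein performing the operation of calling a preset extension instruction to perform integration processing on the face data matrix specifically comprises:
    using the preset extension instruction to recursively evaluate a sum expression over all block matrices contained in the face data matrix to obtain the matrix operation result of the face data matrix, the sum expression being an algorithm that sums the products of multiple block matrices.
  16. A computer device, comprising: a processor, a memory, a communications interface, and a communication bus, wherein the processor, the memory, and the communications interface communicate with one another through the communication bus;
    wherein the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
    when face recognition is to be performed on acquired face data, extracting the face data, and determining whether the row count of a face data matrix constructed from the face data exceeds a preset optimization threshold;
    if none of the row counts of the face data matrix exceeds the preset optimization threshold, calling a preset extension instruction to perform integration processing on the face data matrix;
    if any row count of the face data matrix exceeds the preset optimization threshold, performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix.
  17. The computer device according to claim 16, wherein after the performing order reduction on the face data matrix to obtain an optimized matrix of the face data matrix, the executable instruction further causes the processor to perform the following operation:
    determining the optimized matrix as the face data matrix, and re-executing the step of determining whether the row count of the face data matrix constructed from the face data exceeds the preset optimization threshold.
  18. The computer device according to claim 17, wherein performing the operation of order reduction on the face data matrix to obtain an optimized matrix of the face data matrix specifically comprises:
    partitioning the face data matrix into blocks, and taking the face data matrix containing the block matrices as the optimized matrix.
  19. The computer device according to claim 18, wherein before the partitioning the face data matrix into blocks and taking the face data matrix containing the block matrices as the optimized matrix, the executable instruction further causes the processor to perform the following operation:
    padding the odd row counts and column counts of the face data matrix with data.
  20. The computer device according to claim 19, wherein performing the operation of partitioning the face data matrix into blocks and taking the face data matrix containing the block matrices as the optimized matrix specifically comprises:
    partitioning the face data matrix in units of a first order, and taking all of the resulting block matrices as the optimized matrix of the face data matrix.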
PCT/CN2020/093173 2020-03-03 2020-05-29 Data processing optimization method and apparatus, storage medium, and computer device WO2021174691A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010138933.2A CN111507178B (zh) 2020-03-03 2020-03-03 Data processing optimization method and apparatus, storage medium, and computer device
CN202010138933.2 2020-03-03

Publications (1)

Publication Number Publication Date
WO2021174691A1 true WO2021174691A1 (zh) 2021-09-10

Family

ID=71868982

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093173 WO2021174691A1 (zh) 2020-03-03 2020-05-29 Data processing optimization method and apparatus, storage medium, and computer device

Country Status (2)

Country Link
CN (1) CN111507178B (zh)
WO (1) WO2021174691A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110135167A1 (en) * 2008-07-10 2011-06-09 Nec Corporation Personal authentication system and personal authentication method
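CN104572587A (zh) * 2014-12-23 2015-04-29 中国电子科技集团公司第三十八研究所 Accelerated operation method and device for data matrix multiplication
CN108171327A (zh) * 2017-12-25 2018-06-15 郑州云海信息技术有限公司 Matrix conversion method, device and medium based on convolution operation
CN109542512A (zh) * 2018-11-06 2019-03-29 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN110766133A (zh) * 2019-09-18 2020-02-07 开放智能机器(上海)有限公司 Data processing method, apparatus, device and storage medium in an embedded device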

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
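JPH07249124A (ja) * 1994-03-11 1995-09-26 Dainippon Printing Co Ltd ID card and ID card system
CN102254193A (zh) * 2011-07-16 2011-11-23 西安电子科技大学 Multi-class data classification method based on relevance vector machine
CN105894047B (zh) * 2016-06-28 2019-08-27 深圳市唯特视科技有限公司 Face classification system based on three-dimensional data
CN107742150B (zh) * 2016-10-31 2020-05-12 腾讯科技(深圳)有限公司 Data processing method and device for convolutional neural network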

Also Published As

Publication number Publication date
CN111507178A (zh) 2020-08-07
CN111507178B (zh) 2024-05-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20923231

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20923231

Country of ref document: EP

Kind code of ref document: A1