CN108764037B - Face detection implementation method based on ARM Cortex-A series platform

Info

Publication number: CN108764037B
Authority: CN (China)
Legal status: Active
Application number: CN201810372936.5A
Other languages: Chinese (zh)
Other versions: CN108764037A (en)
Inventors: 洪朝群, 王善炮
Current Assignee: Shishi Senke Intelligent Technology Co ltd
Original Assignee: Shishi Senke Intelligent Technology Co ltd
Priority date: 2018-04-24
Filing date: 2018-04-24
Publication date: 2021-12-24

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining

Abstract

The invention discloses a face detection implementation method based on an ARM Cortex-A series platform, comprising the following steps: S1, in the hardware environment of an ARM Cortex-A series processor, modifying the source code of the FaceDetection module of SeetaFace and changing the compiler type to a cross compiler; S2, adding the NEON compile options to the compiler settings; S3, replacing the header file required by the original SSE instructions in FaceDetection with the header files required by NEON; S4, changing the parts of the original FaceDetection code that use SSE instructions to NEON instructions, and changing the functions that use SSE instructions to functions that use NEON; S5, recompiling the program with the NEON compile options added in step S2 to obtain the required dynamic link library file, thereby producing a NEON-enabled FaceDetection program for the ARM Cortex-A series processor platform. The invention effectively improves the efficiency of face detection with SeetaFace on hardware that uses a Cortex-A series processor.

Description

Face detection implementation method based on ARM Cortex-A series platform
Technical Field
The invention relates to the field of computer image processing, and in particular to a face detection implementation method based on an ARM Cortex-A series platform.
Background
Face detection places certain demands on the image processing hardware, so the face detection module of SeetaFace is currently used mostly on x86 platforms; the detection process is shown in Fig. 1. ARM processors are used mostly in mobile and embedded devices. Such hardware balances energy consumption against performance, so its performance is limited and face detection runs inefficiently in these environments.
SIMD (Single Instruction, Multiple Data) is a technique in which one controller drives multiple processing elements that perform the same operation simultaneously on each element of a set of data (also called a "data vector"), thereby achieving spatial parallelism. In a microprocessor, SIMD means that a single controller controls multiple parallel processing elements.
Different processors implement SIMD in different ways: Intel processors can use SSE (Streaming SIMD Extensions), while ARM platforms use the NEON extension. NEON is a 128-bit SIMD (single instruction, multiple data) architecture extension for ARM Cortex-A series processors, intended to provide flexible, powerful acceleration for consumer multimedia applications and thereby significantly improve the user experience. It has 32 registers that are 64 bits wide (in the dual view, 16 registers that are 128 bits wide).
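The two register views can be pictured with a small model in plain C. This is only an illustration of how one 128-bit register aliases two 64-bit registers; it is not NEON code, and the names `q_reg_model` and `q_fill32` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit Q register. On ARMv7-A NEON the same
   storage is also visible as two 64-bit D registers. */
typedef union {
    uint64_t d[2];       /* view as two 64-bit D registers */
    uint32_t lane32[4];  /* view as four 32-bit lanes      */
    uint8_t  lane8[16];  /* view as sixteen 8-bit lanes    */
} q_reg_model;

/* Fill every 32-bit lane with the same value and return the register. */
static q_reg_model q_fill32(uint32_t value)
{
    q_reg_model q;
    memset(&q, 0, sizeof q);
    for (int i = 0; i < 4; ++i) {
        q.lane32[i] = value;
    }
    return q;
}
```

Sixteen such 128-bit registers occupy the same 256 bytes as thirty-two 64-bit registers, which is why both counts describe one register file.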
The main problem with using SeetaFace today is the following: when SeetaFace is used for face detection on an x86 platform, SSE instructions can accelerate it, but on an ARM platform the instruction set does not support them, so the original acceleration method cannot be used and detection efficiency is low.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a face detection implementation method based on a Cortex-A7 platform that effectively improves the efficiency of face detection with SeetaFace on hardware using a Cortex-A series processor.
To achieve this purpose, the invention adopts the following technical scheme:
A face detection implementation method based on an ARM Cortex-A series platform comprises the following steps:
S1, in the hardware environment of an ARM Cortex-A series processor, modifying the source code of the FaceDetection module of SeetaFace and changing the compiler type to a cross compiler;
S2, adding the NEON compile options to the compiler settings;
S3, replacing the header file required by the original SSE instructions in FaceDetection with the header files required by NEON;
S4, changing the parts of the original FaceDetection code that use SSE instructions to NEON instructions, and changing the functions that use SSE instructions to functions that use NEON;
S5, recompiling the program with the NEON compile options added in step S2 to obtain the required dynamic link library file, thereby producing a NEON-enabled FaceDetection program for the ARM Cortex-A series processor platform.
Specifically, step S1 is performed as follows:
Modify the SET commands:
1) Set the system type, selecting Linux:
SET(CMAKE_SYSTEM_NAME Linux)
2) Set the cross compiler path, which enables the cross compiler and supplies its location:
SET(CMAKE_CXX_COMPILER "/opt/hisi-linux/x86-arm/arm-hisiv400-linux/bin/arm-hisiv400-linux-gnueabi-g++")
Specifically, step S2 is performed as follows:
In CMakeLists.txt, in the instruction-set related settings, enable NEON by adding the NEON compile options to the compiler option settings in the SET command:
-mfloat-abi=softfp -mfpu=neon
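One way to confirm that these options reached the compiler is to inspect its predefined macros from C. The sketch below is hedged: `__ARM_NEON` is the ACLE feature macro (older GCC releases define `__ARM_NEON__`), while the float-ABI detection assumes GCC-style macros. The function names are hypothetical.

```c
#include <string.h>

/* Returns 1 when this translation unit was compiled with NEON enabled. */
static int built_with_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Reports the floating-point calling convention seen by the compiler.
   Assumption: GCC defines __SOFTFP__ for -mfloat-abi=soft and
   __ARM_PCS_VFP for -mfloat-abi=hard; softfp defines neither. */
static const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not-arm32";
#elif defined(__SOFTFP__)
    return "soft";
#elif defined(__ARM_PCS_VFP)
    return "hard";
#else
    return "softfp";
#endif
}
```

With the options above on a 32-bit ARM GCC toolchain, one would expect `built_with_neon()` to return 1 and `float_abi_name()` to report `softfp`; on other hosts the sketch still compiles and reports what it finds.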
Specifically, step S3 is performed as follows:
Replace the header file immintrin.h required by the original SSE instructions with the function implementations and header files required by the NEON instructions, namely SseToNeon.h and the NEON instruction header file arm_neon.h, which together contain the NEON function implementations the project needs.
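When a single source tree has to build for both x86 and ARM, the header swap can be made conditional. This is a minimal sketch, not the patent's method: the macro `SIMD_BACKEND_NAME` and the function `simd_backend_name` are hypothetical, and a project-specific translation header would be included alongside `arm_neon.h` in the NEON branch.

```c
/* Select the SIMD header by target instead of editing the include by hand. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#  include <arm_neon.h>   /* NEON intrinsics such as vaddq_s32() */
#  define SIMD_BACKEND_NAME "neon"
#elif defined(__SSE2__)
#  include <immintrin.h>  /* SSE intrinsics such as _mm_add_epi32() */
#  define SIMD_BACKEND_NAME "sse"
#else
#  define SIMD_BACKEND_NAME "scalar"
#endif

/* Lets calling code or a log line report which path was compiled in. */
static const char *simd_backend_name(void)
{
    return SIMD_BACKEND_NAME;
}
```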
Specifically, step S4 is performed as follows:
Convert the original SSE instructions into NEON instructions of the ARM instruction set.
First, replace the code that originally used SSE. The functions in the code that use SSE instructions are as follows:
① _mm_add_epi32(__m128i a, __m128i b);
② _mm_sub_epi32(__m128i a, __m128i b);
③ _mm_mullo_epi32(__m128i a, __m128i b);
④ _mm_mul_ps(__m128 a, __m128 b);
⑤ _mm_cmpgt_ps(__m128 a, __m128 b);
⑥ _mm_set_epi32(int i3, int i2, int i1, int i0);
where:
_mm_add_epi32() adds four 32-bit integers at once and returns the result. The replacement for function ① is vaddq_s32(a, b), whose prototype is int32x4_t vaddq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_add_epi32();
_mm_sub_epi32() subtracts four 32-bit integers at once and returns the result. The replacement for function ② is vsubq_s32(a, b), whose prototype is int32x4_t vsubq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_sub_epi32();
_mm_mullo_epi32() multiplies four 32-bit integers at once and returns the result, keeping the low 32 bits of each product. The replacement for function ③ is vmulq_s32(a, b), whose prototype is int32x4_t vmulq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_mullo_epi32();
_mm_mul_ps() multiplies four single-precision floating-point numbers at once and returns the result. Function ④ returns its result in an __m128 register and is implemented as follows:
INLINE __m128 _mm_mul_ps(__m128 a, __m128 b)
{
__m128 ret;
ret[0] = a[0] * b[0];
ret[1] = a[1] * b[1];
ret[2] = a[2] * b[2];
ret[3] = a[3] * b[3];
return ret;
}
_mm_cmpgt_ps() performs a greater-than comparison. The replacement for function ⑤ is (__m128)vcreq_f32(a, b), whose prototype is float32x4_t vcreq_f32(float32x4_t __a, float32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_cmple_ps();
_mm_set_epi32() sets four signed 32-bit integer values. The replacement for function ⑥ is vreinterpretq_m128i_s32(vld1q_s32(data));
where the type of the return value is given by the following macro definitions:
#define _MM_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))
#define vreinterpretq_m128i_s32(x) \
(x)
#define vreinterpretq_m128i_u32(x) \
vreinterpretq_s32_u32(x)
#define vreinterpretq_s32_m128i(x) \
(x)
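The three integer replacements can be exercised with small wrappers. This is a sketch under stated assumptions: the wrapper names (`add4_i32`, `sub4_i32`, `mullo4_i32`) are hypothetical, and a scalar fallback with the same lane-wise meaning is included so the code also builds on a machine without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Load four int32 lanes, apply one NEON intrinsic, store the result. */
static void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    vst1q_s32(out, vaddq_s32(vld1q_s32(a), vld1q_s32(b)));
}
static void sub4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    vst1q_s32(out, vsubq_s32(vld1q_s32(a), vld1q_s32(b)));
}
static void mullo4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    vst1q_s32(out, vmulq_s32(vld1q_s32(a), vld1q_s32(b)));
}
#else
/* Scalar fallback: unsigned arithmetic gives the same wrap-around results. */
static void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
}
static void sub4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)((uint32_t)a[i] - (uint32_t)b[i]);
}
static void mullo4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
}
#endif
```

Because `vmulq_s32` keeps only the low 32 bits of each product, it matches `_mm_mullo_epi32` even when a product overflows.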
The beneficial effects of the invention are as follows: by modifying the FaceDetection module of SeetaFace, the method uses the NEON instructions supported by the ARM platform to accelerate the vector calculations in the code, which speeds up program execution and improves the efficiency of face detection with SeetaFace.
Drawings
Fig. 1 is a schematic diagram of the face detection process using SeetaFace;
Fig. 2 is a schematic flow chart of an embodiment of the present invention;
Fig. 3 is a flow chart of adding NEON according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, and it should be noted that the following examples are provided to illustrate the detailed embodiments and specific operations based on the technical solutions of the present invention, but the scope of the present invention is not limited to the examples.
As shown in Fig. 2, a face detection implementation method based on an ARM Cortex-A series platform includes the following steps:
S1, in the hardware environment of an ARM Cortex-A series processor, modifying the source code of the FaceDetection module of SeetaFace and changing the compiler type to a cross compiler;
S2, adding the NEON compile options to the compiler settings;
S3, replacing the header file required by the original SSE instructions in FaceDetection with the header files required by NEON;
S4, changing the parts of the original FaceDetection code that use SSE instructions to NEON instructions, and changing the functions that use SSE instructions to functions that use NEON;
S5, recompiling the program with the NEON compile options added in step S2 to obtain the required dynamic link library file, thereby producing a NEON-enabled FaceDetection program for the ARM Cortex-A series processor platform.
Example 1
Step S1: in the ARM Cortex-A series processor hardware environment, modify the source code of the FaceDetection module of SeetaFace and change the compiler type to a cross compiler.
Modify the SET commands as follows:
① SET(CMAKE_SYSTEM_NAME Linux)
② SET(CMAKE_CXX_COMPILER "/opt/hisi-linux/x86-arm/arm-hisiv400-linux/bin/arm-hisiv400-linux-gnueabi-g++")
①: Set the system type. Linux is selected; this setting is required when a cross compiler is used.
②: Set the cross compiler path. This enables the cross compiler and supplies its location.
Step S2: add the NEON compile options to the compiler settings.
In CMakeLists.txt, modify the instruction-set related settings to enable NEON by adding the NEON compile options to the compiler option settings in the SET command:
-mfloat-abi=softfp -mfpu=neon
Step S3: replace the header file required by the original SSE instructions in FaceDetection with the header files required by NEON.
Replace the header file immintrin.h required by the original SSE instructions with the function implementations and header files required by the NEON instructions, namely SseToNeon.h and the NEON instruction header file arm_neon.h, which together contain the NEON function implementations the project needs.
Step S4: change the parts of the FaceDetection source code that use SSE instructions to NEON instructions, and change the functions that use SSE instructions to functions that use NEON.
Convert the original SSE instructions into NEON instructions of the ARM instruction set.
Replace the code that originally used SSE. The functions in the code that use SSE instructions are as follows:
① _mm_add_epi32(__m128i a, __m128i b)
② _mm_sub_epi32(__m128i a, __m128i b)
③ _mm_mullo_epi32(__m128i a, __m128i b)
④ _mm_mul_ps(__m128 a, __m128 b)
⑤ _mm_cmpgt_ps(__m128 a, __m128 b)
⑥ _mm_set_epi32(int i3, int i2, int i1, int i0)
①: _mm_add_epi32() adds four 32-bit integers at once and returns the result. The replacement function is vaddq_s32(a, b),
whose prototype is int32x4_t vaddq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_add_epi32().
②: _mm_sub_epi32() subtracts four 32-bit integers at once and returns the result. The replacement function is vsubq_s32(a, b),
whose prototype is int32x4_t vsubq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_sub_epi32().
③: _mm_mullo_epi32() multiplies four 32-bit integers at once and returns the result, keeping the low 32 bits of each product.
The replacement function is vmulq_s32(a, b),
whose prototype is int32x4_t vmulq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_mullo_epi32().
④: _mm_mul_ps() multiplies four single-precision floating-point numbers at once and returns the result. It returns its result in an __m128 register and is implemented as follows:
INLINE __m128 _mm_mul_ps(__m128 a, __m128 b)
{
__m128 ret;
ret[0] = a[0] * b[0];
ret[1] = a[1] * b[1];
ret[2] = a[2] * b[2];
ret[3] = a[3] * b[3];
return ret;
}
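The lane-by-lane multiplication described for ④ can also be written with a single NEON intrinsic, `vmulq_f32`, which multiplies four floats at once. The sketch below is an alternative formulation rather than the listing from the patent; the wrapper name `mul4_f32` is hypothetical and a scalar fallback keeps it buildable without NEON.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Multiply four float lanes with one NEON instruction. */
static void mul4_f32(const float a[4], const float b[4], float out[4])
{
    vst1q_f32(out, vmulq_f32(vld1q_f32(a), vld1q_f32(b)));
}
#else
/* Scalar fallback with the same lane-wise meaning. */
static void mul4_f32(const float a[4], const float b[4], float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] * b[i];
}
#endif
```

Letting the intrinsic handle all four lanes leaves the choice of instruction to the compiler's NEON backend instead of relying on it to recognize four scalar multiplies.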
⑤: _mm_cmpgt_ps() performs a greater-than comparison.
The replacement function is (__m128)vcreq_f32(a, b),
whose prototype is float32x4_t vcreq_f32(float32x4_t __a, float32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_cmple_ps().
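For reference, a four-lane greater-than comparison can be sketched with the NEON intrinsic `vcgtq_f32`, which returns a `uint32x4_t` mask whose lanes are all ones where `a > b` and zero elsewhere, the same convention `_mm_cmpgt_ps` uses. Using `vcgtq_f32` here is an assumption about the intended replacement; the wrapper name `cmpgt4_f32` is hypothetical.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Lane-wise a > b; each result lane is 0xFFFFFFFF (true) or 0 (false). */
static void cmpgt4_f32(const float a[4], const float b[4], uint32_t mask[4])
{
    vst1q_u32(mask, vcgtq_f32(vld1q_f32(a), vld1q_f32(b)));
}
#else
/* Scalar fallback producing the same all-ones / all-zeros masks. */
static void cmpgt4_f32(const float a[4], const float b[4], uint32_t mask[4])
{
    for (int i = 0; i < 4; ++i)
        mask[i] = (a[i] > b[i]) ? 0xFFFFFFFFu : 0u;
}
#endif
```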
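⑥: _mm_set_epi32() sets four signed 32-bit integer values.
The replacement function is vreinterpretq_m128i_s32(vld1q_s32(data)),
where the type of the return value is given by the following macro definitions:
#define _MM_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))
#define vreinterpretq_m128i_s32(x) \
(x)
#define vreinterpretq_m128i_u32(x) \
vreinterpretq_s32_u32(x)
#define vreinterpretq_s32_m128i(x) \
(x)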
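One detail worth checking when porting ⑥ is lane order: `_mm_set_epi32(i3, i2, i1, i0)` places its last argument in the lowest lane, so the array passed to `vld1q_s32` has to be filled in the reverse of the argument order. A minimal sketch, with the hypothetical wrapper name `set4_i32` and a scalar fallback:

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Build a vector whose lane 0 is i0 and lane 3 is i3, then store it. */
static void set4_i32(int32_t i3, int32_t i2, int32_t i1, int32_t i0,
                     int32_t out[4])
{
    const int32_t data[4] = { i0, i1, i2, i3 };  /* memory order = lane order */
    vst1q_s32(out, vld1q_s32(data));
}
#else
/* Scalar fallback with the same lane layout. */
static void set4_i32(int32_t i3, int32_t i2, int32_t i1, int32_t i0,
                     int32_t out[4])
{
    out[0] = i0;
    out[1] = i1;
    out[2] = i2;
    out[3] = i3;
}
#endif
```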
Performance test:
The test platform is an ARM Cortex-A7 series processor, and the test inputs are images of 120×120 and 1280×720 pixels. A comparison of the results is shown in Table 1:
TABLE 1
[Table 1: comparison of results for the 120×120 and 1280×720 images; provided as an image in the original publication]
It can be seen that, after the processing steps of the above face detection implementation method for the Cortex-A series platform, the FaceDetection module of SeetaFace can be used on the ARM Cortex-A series platform while maintaining high efficiency.
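A comparison of this kind can be reproduced with a small timing harness. The sketch below is illustrative only: the callback type `detect_fn`, the stand-in `noop_detect`, and `average_ms` are hypothetical names, and `clock()` measures CPU time rather than wall-clock time.

```c
#include <time.h>

/* Hypothetical signature for the routine being timed. */
typedef void (*detect_fn)(const unsigned char *image, int width, int height);

/* Stand-in workload so the sketch links without a real detector. */
static void noop_detect(const unsigned char *image, int width, int height)
{
    (void)image;
    (void)width;
    (void)height;
}

/* Average CPU time per call, in milliseconds, over `runs` calls. */
static double average_ms(detect_fn fn, const unsigned char *image,
                         int width, int height, int runs)
{
    if (runs <= 0) {
        return 0.0;
    }
    clock_t start = clock();
    for (int i = 0; i < runs; ++i) {
        fn(image, width, height);
    }
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC / runs;
}
```

Running the same harness once against a scalar build and once against a NEON build, with the same image and run count, gives the two figures to compare.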
Example 2
As shown in Fig. 3, when the method of the invention is used, feature point processing is performed on the input image and it is determined whether vector operations are required; if so, the code is modified according to the method of Example 1.
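A typical place where vector operations pay off is a loop over an array. The sketch below shows one common pattern, assumed here for illustration rather than taken from the patent: process four elements per iteration with NEON and finish the remainder with scalar code. The function name `add_arrays_i32` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i] for i in [0, n). */
static void add_arrays_i32(const int32_t *a, const int32_t *b,
                           int32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Vector body: four lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        vst1q_s32(out + i, vaddq_s32(vld1q_s32(a + i), vld1q_s32(b + i)));
    }
#endif
    /* Scalar tail (or the whole loop when NEON is unavailable). */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Arrays shorter than four elements never enter the vector body, so converting such code brings no benefit.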
Various corresponding changes and modifications can be made by those skilled in the art based on the above technical solutions and concepts, and all such changes and modifications should be included in the protection scope of the present invention.

Claims (3)

1. A face detection implementation method based on an ARM Cortex-A series platform, characterized by comprising the following steps:
S1, in the hardware environment of an ARM Cortex-A series processor, modifying the source code of the FaceDetection module of SeetaFace and changing the compiler type to a cross compiler;
S2, adding the NEON compile options to the compiler settings;
S3, replacing the header file required by the original SSE instructions in FaceDetection with the header files required by NEON;
S4, changing the parts of the original FaceDetection code that use SSE instructions to NEON instructions, and changing the functions that use SSE instructions to functions that use NEON;
S5, recompiling the program with the NEON compile options added in step S2 to obtain the required dynamic link library file, thereby producing a NEON-enabled FaceDetection program for the ARM Cortex-A series processor platform;
wherein step S1 is specifically performed as follows:
modify the SET commands:
1) set the system type, selecting Linux:
SET(CMAKE_SYSTEM_NAME Linux)
2) set the cross compiler path, which enables the cross compiler and supplies its location:
SET(CMAKE_CXX_COMPILER "/opt/hisi-linux/x86-arm/arm-hisiv400-linux/bin/arm-hisiv400-linux-gnueabi-g++");
and step S4 is specifically performed as follows:
convert the original SSE instructions into NEON instructions of the ARM instruction set;
first, replace the code that originally used SSE; the functions in the code that use SSE instructions are as follows:
① _mm_add_epi32(__m128i a, __m128i b);
② _mm_sub_epi32(__m128i a, __m128i b);
③ _mm_mullo_epi32(__m128i a, __m128i b);
④ _mm_mul_ps(__m128 a, __m128 b);
⑤ _mm_cmpgt_ps(__m128 a, __m128 b);
⑥ _mm_set_epi32(int i3, int i2, int i1, int i0);
wherein:
_mm_add_epi32() adds four 32-bit integers at once and returns the result; the replacement for function ① is vaddq_s32(a, b), whose prototype is int32x4_t vaddq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_add_epi32();
_mm_sub_epi32() subtracts four 32-bit integers at once and returns the result; the replacement for function ② is vsubq_s32(a, b), whose prototype is int32x4_t vsubq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_sub_epi32();
_mm_mullo_epi32() multiplies four 32-bit integers at once and returns the result, keeping the low 32 bits of each product; the replacement for function ③ is vmulq_s32(a, b), whose prototype is int32x4_t vmulq_s32(int32x4_t __a, int32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_mullo_epi32();
_mm_mul_ps() multiplies four single-precision floating-point numbers at once and returns the result; function ④ returns its result in an __m128 register and is implemented as follows:
INLINE __m128 _mm_mul_ps(__m128 a, __m128 b)
{
__m128 ret;
ret[0] = a[0] * b[0];
ret[1] = a[1] * b[1];
ret[2] = a[2] * b[2];
ret[3] = a[3] * b[3];
return ret;
}
_mm_cmpgt_ps() performs a greater-than comparison; the replacement for function ⑤ is (__m128)vcreq_f32(a, b), whose prototype is float32x4_t vcreq_f32(float32x4_t __a, float32x4_t __b); it is used for vector calculation under the ARM instruction set and performs the same function as _mm_cmple_ps();
_mm_set_epi32() sets four signed 32-bit integer values; the replacement for function ⑥ is vreinterpretq_m128i_s32(vld1q_s32(data));
wherein the type of the return value is given by the following macro definitions:
#define _MM_SHUFFLE(z, y, x, w) (((z) << 6) | ((y) << 4) | ((x) << 2) | (w))
#define vreinterpretq_m128i_s32(x) \
(x)
#define vreinterpretq_m128i_u32(x) \
vreinterpretq_s32_u32(x)
#define vreinterpretq_s32_m128i(x) \
(x)
2. The face detection implementation method based on the ARM Cortex-A series platform according to claim 1, wherein step S2 is specifically performed as follows:
in CMakeLists.txt, in the instruction-set related settings, enable NEON by adding the NEON compile options to the compiler option settings in the SET command:
-mfloat-abi=softfp -mfpu=neon
3. The face detection implementation method based on the ARM Cortex-A series platform according to claim 1, wherein step S3 is specifically performed as follows:
replace the header file immintrin.h required by the original SSE instructions with the function implementations and header files required by the NEON instructions, namely SseToNeon.h and the NEON instruction header file arm_neon.h, which together contain the NEON function implementations the project needs.
CN201810372936.5A 2018-04-24 2018-04-24 Face detection implementation method based on ARM Cortex-A series platform Active CN108764037B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810372936.5A CN108764037B (en) 2018-04-24 2018-04-24 Face detection implementation method based on ARM Cortex-A series platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810372936.5A CN108764037B (en) 2018-04-24 2018-04-24 Face detection implementation method based on ARM Cortex-A series platform

Publications (2)

Publication Number Publication Date
CN108764037A CN108764037A (en) 2018-11-06
CN108764037B true CN108764037B (en) 2021-12-24

Family

ID=64011584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810372936.5A Active CN108764037B (en) 2018-04-24 2018-04-24 Face detection implementation method based on ARM Cortex-A series platform

Country Status (1)

Country Link
CN (1) CN108764037B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428872B (en) * 2019-09-26 2020-03-10 深圳华大基因科技服务有限公司 Method and device for converting gene comparison instruction set
CN113157321B (en) * 2021-02-05 2022-02-08 湖南国科亿存信息科技有限公司 Erasure encoding and decoding method and device based on NEON instruction acceleration under ARM platform
CN112671618B (en) * 2021-03-15 2021-06-15 北京安帝科技有限公司 Deep packet inspection method and device
CN113254065B (en) * 2021-07-14 2021-11-02 广州易方信息科技股份有限公司 Application software compatibility method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957909A (en) * 2009-07-15 2011-01-26 青岛科技大学 Digital signal processor (DSP)-based face detection method
CN102364433A (en) * 2011-06-24 2012-02-29 浙大网新科技股份有限公司 Method for realizing Wine construction tool transplanting on ARM (Advanced RISC Machines) processor
CN104463125A (en) * 2014-12-11 2015-03-25 哈尔滨工程大学 DSP-based automatic face detecting and tracking device and method
CN107016341A (en) * 2017-03-03 2017-08-04 西安交通大学 A kind of embedded real-time face recognition methods

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218251B (en) * 2013-04-16 2016-05-18 青岛中星微电子有限公司 Verification method and the device of multiple nucleus system level chip
CN105787910B (en) * 2015-12-24 2019-01-11 武汉鸿瑞达信息技术有限公司 A kind of calculation optimization method of the human face region filtering method based on heterogeneous platform
CN106887059A (en) * 2017-01-18 2017-06-23 华南农业大学 A kind of intelligent electronic lock system based on face recognition
CN107784289A (en) * 2017-11-02 2018-03-09 深圳市共进电子股份有限公司 A kind of security-protecting and monitoring method, apparatus and system
CN107909348A (en) * 2017-11-26 2018-04-13 常熟安智生物识别技术有限公司 A kind of personnel system scheme using recognition of face
CN107934704A (en) * 2017-11-30 2018-04-20 常熟安智生物识别技术有限公司 A kind of terraced control system using recognition of face

Also Published As

Publication number Publication date
CN108764037A (en) 2018-11-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant