CN108764037B - Face detection implementation method based on ARM Cortex-A series platform
- Publication number: CN108764037B
- Application number: CN201810372936.5A
- Authority: CN (China)
- Prior art keywords: function, neon, arm, instruction, epi32
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
Abstract
The invention discloses a face detection implementation method based on the ARM Cortex-A series platform, comprising the following steps: S1, modifying the source code of FaceDetection in SeetaFace for the ARM Cortex-A series processor hardware environment, and changing the compiler type to a cross compiler; S2, adding the NEON compile options to the compiler settings; S3, replacing the header file required by the SSE instructions in FaceDetection with the header files required by NEON; S4, rewriting the parts of the FaceDetection source code that use SSE instructions as NEON instructions, and changing the functions that use SSE instructions into functions that use NEON; S5, recompiling the program with the NEON compile options added in step S2 to obtain the required dynamic link library file, thereby building a NEON-enabled FaceDetection program for the ARM Cortex-A series processor platform. The invention can effectively improve the efficiency of face detection with SeetaFace on hardware that uses a Cortex-A series processor.
Description
Technical Field
The invention relates to the field of computer image processing, and in particular to a face detection implementation method based on the ARM Cortex-A series platform.
Background
Existing face detection places certain demands on the image processing hardware, so the face detection module of SeetaFace is currently used mostly on the x86 platform; the detection process is shown in Fig. 1. ARM processors are used mostly in mobile and embedded devices. Such hardware trades performance for lower energy consumption, so its performance is modest, and face detection is therefore inefficient in these hardware environments.
Single Instruction, Multiple Data (SIMD) is a technique in which one controller drives multiple processing elements that simultaneously perform the same operation on each element of a set of data (also called a "data vector"), achieving spatial parallelism. In a microprocessor, SIMD means that a single controller controls multiple parallel processing elements.
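As a rough illustration of the idea, the sketch below adds two integer arrays in groups of four, where each group stands in for one four-lane vector operation. It is plain C with no SIMD hardware involved, and the function name add_i32_by_lanes is a hypothetical helper introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two int32 arrays in groups of four "lanes",
   mirroring how one SIMD instruction applies the same operation to
   every element of a data vector. Plain C, no SIMD hardware required. */
void add_i32_by_lanes(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;

    /* Main loop: one iteration stands in for one 4-lane vector add. */
    for (; i + 4 <= n; i += 4) {
        for (size_t lane = 0; lane < 4; ++lane) {
            out[i + lane] = a[i + lane] + b[i + lane];
        }
    }

    /* Tail: leftover elements that do not fill a whole vector. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

On a real SIMD unit the inner four-lane loop would collapse into a single instruction, which is where the speed-up comes from.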
Different processors implement SIMD in different ways: for example, Intel processors can use SSE (Streaming SIMD Extensions), while the ARM platform uses the NEON extension. NEON technology is a 128-bit SIMD (single instruction, multiple data) architecture extension for the ARM Cortex™-A series processors, intended to provide flexible, powerful acceleration for consumer multimedia applications and thereby significantly improve the user experience. It has 32 registers that are 64 bits wide (in the dual view, 16 registers that are 128 bits wide).
The main problem with using SeetaFace today is this: when SeetaFace is used for face detection, SSE instructions can accelerate it on the x86 platform, but on the ARM platform the instruction set differs, the original acceleration method cannot be used, and detection efficiency is therefore low.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a face detection implementation method based on the Cortex-A7 platform that effectively improves the efficiency of face detection with SeetaFace on hardware using a Cortex-A series processor.
To achieve this purpose, the invention adopts the following technical solution:
A face detection implementation method based on the ARM Cortex-A series platform comprises the following steps:
S1, modifying the source code of FaceDetection in SeetaFace for the ARM Cortex-A series processor hardware environment, and changing the compiler type to a cross compiler;
S2, adding the NEON compile options to the compiler settings;
S3, replacing the header file required by the SSE instructions in FaceDetection with the header files required by NEON;
S4, rewriting the parts of the FaceDetection source code that use SSE instructions as NEON instructions, and changing the functions that use SSE instructions into functions that use NEON;
S5, recompiling the program with the NEON compile options added in step S2 to obtain the required dynamic link library file, thereby building a NEON-enabled FaceDetection program for the ARM Cortex-A series processor platform.
The specific operations in step S1 are as follows:
Modify the SET commands:
1) Set the system type, selecting Linux:
SET(CMAKE_SYSTEM_NAME Linux)
2) Set the cross compiler path, which enables the cross compiler and adds its path:
SET(CMAKE_CXX_COMPILER "/opt/hisi-linux/x86-arm/arm-hisiv400-linux/bin/arm-hisiv400-linux-gnueabi-g++")
The specific operations in step S2 are as follows:
In CMakeLists.txt, modify the instruction-set-related settings to enable NEON by adding the NEON compile options to the compiler option settings in the SET command:
-mfloat-abi=softfp -mfpu=neon
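One way to confirm that these options actually reached the compiler is to test the NEON feature macro from C. The following is a minimal sketch, not part of the patent: the function name neon_enabled_at_compile_time is hypothetical, while __ARM_NEON (and the older spelling __ARM_NEON__) are the predefined macros that ARM compilers set when NEON code generation is enabled.

```c
/* Returns 1 when this translation unit was compiled with NEON enabled
   (for example with -mfpu=neon on 32-bit ARM), otherwise 0.
   __ARM_NEON is the ACLE feature macro; __ARM_NEON__ is an older spelling. */
int neon_enabled_at_compile_time(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

If this returns 0 in a build that was meant to use NEON, the compile options were most likely not applied to that source file.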
The specific operations in step S3 are as follows:
Replace the header file immintrin.h required by the original SSE instructions with the function implementations and header files required by the NEON instructions; these comprise SseToNeon.h and the NEON instruction header file arm_neon.h, and include the NEON function implementations required by the project.
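The header swap can be pictured as a small compatibility layer that maps the SSE vector types onto NEON vector types. The sketch below is an assumption about what such a layer might look like and is not the content of SseToNeon.h: the names sketch_m128i, sketch_m128, SKETCH_SIMD_HEADER and sketch_simd_header are hypothetical, and the non-NEON branch exists only so the file also compiles on other platforms.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>               /* NEON intrinsics and vector types */
typedef int32x4_t   sketch_m128i;   /* stands in for the SSE type __m128i */
typedef float32x4_t sketch_m128;    /* stands in for the SSE type __m128  */
#define SKETCH_SIMD_HEADER "arm_neon.h"
#else
/* Not a NEON build: plain structs keep the sketch compilable everywhere. */
typedef struct { int32_t lane[4]; } sketch_m128i;
typedef struct { float   lane[4]; } sketch_m128;
#define SKETCH_SIMD_HEADER "none (scalar fallback)"
#endif

/* Reports which header backs the vector types in this build. */
const char *sketch_simd_header(void)
{
    return SKETCH_SIMD_HEADER;
}
```

Keeping the SSE-style type names behind one header means the detection code itself needs few changes beyond the include line.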
The specific process of step S4 is as follows:
Convert the original SSE instructions into NEON instructions of the ARM instruction set.
First, replace the code that originally used SSE. The functions in the code that use SSE instructions are as follows:
① _mm_add_epi32(__m128i a, __m128i b);
② _mm_sub_epi32(__m128i a, __m128i b);
③ _mm_mullo_epi32(__m128i a, __m128i b);
④ _mm_mul_ps(__m128 a, __m128 b);
⑤ _mm_cmpgt_ps(__m128 a, __m128 b);
⑥ _mm_set_epi32(int i3, int i2, int i1, int i0);
wherein:
① The function _mm_add_epi32() adds four 32-bit integers at a time and returns the sums. Its replacement is vaddq_s32(a, b), whose prototype is int32x4_t vaddq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_add_epi32();
② The function _mm_sub_epi32() subtracts four 32-bit integers at a time and returns the differences. Its replacement is vsubq_s32(a, b), whose prototype is int32x4_t vsubq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_sub_epi32();
③ The function _mm_mullo_epi32() multiplies four pairs of 32-bit integers at a time and returns the low 32 bits of each product. Its replacement is vmulq_s32(a, b), whose prototype is int32x4_t vmulq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_mullo_epi32();
④ The function _mm_mul_ps() multiplies four single-precision floating-point values at a time and returns the products in an __m128 register; the function is implemented as follows:
INLINE __m128 _mm_mul_ps(__m128 a, __m128 b)
{
    __m128 ret;
    ret[0] = a[0] * b[0];
    ret[1] = a[1] * b[1];
    ret[2] = a[2] * b[2];
    ret[3] = a[3] * b[3];
    return ret;
}
⑤ The function _mm_cmpgt_ps() performs a greater-than comparison. Its replacement is (__m128)vcreq_f32(a, b), whose prototype is float32x4_t vcreq_f32(float32x4_t __a, float32x4_t __b); for vector computation under the ARM instruction set, it has the same function as _mm_cmple_ps();
⑥ The function _mm_set_epi32() sets four signed 32-bit integer values. Its replacement is vreinterpretq_m128i_s32(vld1q_s32(data));
wherein the type of the return value is defined by the following macro definitions:
#define _MM_SHUFFLE(z, y, x, w) ((z << 6) | (y << 4) | (x << 2) | w)
#define vreinterpretq_m128i_s32(x) \
    (x)
#define vreinterpretq_m128i_u32(x) \
    vreinterpretq_s32_u32(x)
#define vreinterpretq_s32_m128i(x) \
    (x)
The beneficial effects of the invention are as follows: the method modifies the FaceDetection module of SeetaFace so that the NEON instructions supported by the ARM platform provide instruction-level acceleration of the vector computations in the code, which speeds up program execution and improves the efficiency of face detection with SeetaFace.
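The integer replacements described for step S4 (functions ①, ②, ③ and ⑥) can be sketched as thin wrappers around the NEON intrinsics. This is an illustration under stated assumptions rather than the patent's own code: the names vec4i, vec4i_add, vec4i_sub, vec4i_mullo, vec4i_set and vec4i_store are hypothetical, and the scalar branch is added only so the sketch builds and behaves the same on a machine without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

typedef int32x4_t vec4i; /* hypothetical alias for a 4 x int32 vector */

static inline vec4i vec4i_add(vec4i a, vec4i b)   { return vaddq_s32(a, b); }
static inline vec4i vec4i_sub(vec4i a, vec4i b)   { return vsubq_s32(a, b); }
static inline vec4i vec4i_mullo(vec4i a, vec4i b) { return vmulq_s32(a, b); }

/* Same argument order as _mm_set_epi32: i0 lands in the lowest lane. */
static inline vec4i vec4i_set(int32_t i3, int32_t i2, int32_t i1, int32_t i0)
{
    const int32_t data[4] = { i0, i1, i2, i3 };
    return vld1q_s32(data);
}

static inline void vec4i_store(int32_t out[4], vec4i v) { vst1q_s32(out, v); }

#else /* scalar fallback so the sketch also builds without NEON */

typedef struct { int32_t lane[4]; } vec4i;

static inline vec4i vec4i_add(vec4i a, vec4i b)
{
    vec4i r;
    for (int k = 0; k < 4; ++k) /* unsigned math wraps around like the vector op */
        r.lane[k] = (int32_t)((uint32_t)a.lane[k] + (uint32_t)b.lane[k]);
    return r;
}

static inline vec4i vec4i_sub(vec4i a, vec4i b)
{
    vec4i r;
    for (int k = 0; k < 4; ++k)
        r.lane[k] = (int32_t)((uint32_t)a.lane[k] - (uint32_t)b.lane[k]);
    return r;
}

static inline vec4i vec4i_mullo(vec4i a, vec4i b)
{
    vec4i r;
    for (int k = 0; k < 4; ++k) /* keeps only the low 32 bits of each product */
        r.lane[k] = (int32_t)((uint32_t)a.lane[k] * (uint32_t)b.lane[k]);
    return r;
}

static inline vec4i vec4i_set(int32_t i3, int32_t i2, int32_t i1, int32_t i0)
{
    vec4i r = { { i0, i1, i2, i3 } };
    return r;
}

static inline void vec4i_store(int32_t out[4], vec4i v)
{
    for (int k = 0; k < 4; ++k)
        out[k] = v.lane[k];
}

#endif
```

Note the reversed order in vec4i_set: the SSE function takes its arguments from the highest lane to the lowest, whereas vld1q_s32 reads memory from the lowest lane upward, so the temporary array has to be filled as i0, i1, i2, i3.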
Drawings
Fig. 1 is a schematic diagram of the face detection process using SeetaFace;
Fig. 2 is a schematic flow chart of the method according to an embodiment of the invention;
Fig. 3 is a flow chart of adding NEON according to an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. It should be noted that the following examples give detailed implementations and specific operations based on the technical solution of the invention, but the scope of protection of the invention is not limited to these examples.
As shown in Fig. 2, a face detection implementation method based on the ARM Cortex-A series platform comprises the following steps:
S1, modifying the source code of FaceDetection in SeetaFace for the ARM Cortex-A series processor hardware environment, and changing the compiler type to a cross compiler;
S2, adding the NEON compile options to the compiler settings;
S3, replacing the header file required by the SSE instructions in FaceDetection with the header files required by NEON;
S4, rewriting the parts of the FaceDetection source code that use SSE instructions as NEON instructions, and changing the functions that use SSE instructions into functions that use NEON;
S5, recompiling the program with the NEON compile options added in step S2 to obtain the required dynamic link library file, thereby building a NEON-enabled FaceDetection program for the ARM Cortex-A series processor platform.
Example 1
Step S1: modify the source code of FaceDetection in SeetaFace for the ARM Cortex-A series processor hardware environment, changing the compiler type to a cross compiler.
Modify the SET commands:
SET(CMAKE_SYSTEM_NAME Linux)  # ①
SET(CMAKE_CXX_COMPILER "/opt/hisi-linux/x86-arm/arm-hisiv400-linux/bin/arm-hisiv400-linux-gnueabi-g++")  # ②
①: Set the system type. Linux is selected; this setting is required when a cross compiler is used.
②: Set the cross compiler path. This enables the cross compiler and adds its path.
Step S2: add the NEON compile options to the compiler settings.
In CMakeLists.txt, modify the instruction-set-related settings to enable NEON by adding the NEON compile options to the compiler option settings in the SET command:
-mfloat-abi=softfp -mfpu=neon
Step S3: replace the header file required by the original SSE instructions in FaceDetection with the header files required by NEON.
Replace the header file immintrin.h required by the original SSE instructions with the function implementations and header files required by the NEON instructions; these comprise SseToNeon.h and the NEON instruction header file arm_neon.h, and include the NEON function implementations required by the project.
Step S4: rewrite the parts of the FaceDetection source code that use SSE instructions as NEON instructions, and change the functions that use SSE instructions into functions that use NEON.
Convert the original SSE instructions into NEON instructions of the ARM instruction set.
Replace the code that originally used SSE. The functions in the code that use SSE instructions are as follows:
① _mm_add_epi32(__m128i a, __m128i b)
② _mm_sub_epi32(__m128i a, __m128i b)
③ _mm_mullo_epi32(__m128i a, __m128i b)
④ _mm_mul_ps(__m128 a, __m128 b)
⑤ _mm_cmpgt_ps(__m128 a, __m128 b)
⑥ _mm_set_epi32(int i3, int i2, int i1, int i0)
①: The function _mm_add_epi32() adds four 32-bit integers at a time and returns the sums. The replacement function is vaddq_s32(a, b);
its prototype is int32x4_t vaddq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_add_epi32().
②: The function _mm_sub_epi32() subtracts four 32-bit integers at a time and returns the differences. The replacement function is vsubq_s32(a, b);
its prototype is int32x4_t vsubq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_sub_epi32().
③: The function _mm_mullo_epi32() multiplies four pairs of 32-bit integers at a time and returns the low 32 bits of each product.
The replacement function is vmulq_s32(a, b);
its prototype is int32x4_t vmulq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_mullo_epi32().
④: The function _mm_mul_ps() multiplies four single-precision floating-point values at a time and returns the products in an __m128 register. The function is implemented as follows:
INLINE __m128 _mm_mul_ps(__m128 a, __m128 b)
{
    __m128 ret;
    ret[0] = a[0] * b[0];
    ret[1] = a[1] * b[1];
    ret[2] = a[2] * b[2];
    ret[3] = a[3] * b[3];
    return ret;
}
⑤: The function _mm_cmpgt_ps() performs a greater-than comparison.
The replacement function is (__m128)vcreq_f32(a, b);
its prototype is float32x4_t vcreq_f32(float32x4_t __a, float32x4_t __b); for vector computation under the ARM instruction set, it has the same function as _mm_cmple_ps().
⑥: The function _mm_set_epi32() sets four signed 32-bit integer values.
The replacement function is vreinterpretq_m128i_s32(vld1q_s32(data));
wherein the type of the return value is defined by the following macro definitions:
#define _MM_SHUFFLE(z, y, x, w) ((z << 6) | (y << 4) | (x << 2) | w)
#define vreinterpretq_m128i_s32(x) \
    (x)
#define vreinterpretq_m128i_u32(x) \
    vreinterpretq_s32_u32(x)
#define vreinterpretq_s32_m128i(x) \
    (x)
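Separately from the patent's own listings, the floating-point operations ④ and ⑤ can also be sketched with standard NEON intrinsics. Two points here are assumptions rather than statements about the patent's code: the sketch uses vmulq_f32 for the four-lane multiply instead of the lane-by-lane form, and it uses vcgtq_f32 for the greater-than comparison, since vcgtq_f32 is the standard NEON lane-wise "greater than" intrinsic and returns a uint32x4_t mask. The names vec4f, vec4f_load, vec4f_store, vec4f_mul and vec4f_cmpgt_mask are hypothetical, and the scalar branch exists only so the sketch builds without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

typedef float32x4_t vec4f; /* hypothetical alias for a 4 x float vector */

static inline vec4f vec4f_load(const float in[4])     { return vld1q_f32(in); }
static inline void  vec4f_store(float out[4], vec4f v) { vst1q_f32(out, v); }
static inline vec4f vec4f_mul(vec4f a, vec4f b)       { return vmulq_f32(a, b); }

/* vcgtq_f32 yields a mask per lane: all ones where a > b, otherwise zero. */
static inline void vec4f_cmpgt_mask(uint32_t mask[4], vec4f a, vec4f b)
{
    vst1q_u32(mask, vcgtq_f32(a, b));
}

#else /* scalar fallback so the sketch also builds without NEON */

typedef struct { float lane[4]; } vec4f;

static inline vec4f vec4f_load(const float in[4])
{
    vec4f r;
    for (int k = 0; k < 4; ++k)
        r.lane[k] = in[k];
    return r;
}

static inline void vec4f_store(float out[4], vec4f v)
{
    for (int k = 0; k < 4; ++k)
        out[k] = v.lane[k];
}

static inline vec4f vec4f_mul(vec4f a, vec4f b)
{
    vec4f r;
    for (int k = 0; k < 4; ++k)
        r.lane[k] = a.lane[k] * b.lane[k];
    return r;
}

static inline void vec4f_cmpgt_mask(uint32_t mask[4], vec4f a, vec4f b)
{
    for (int k = 0; k < 4; ++k) /* same mask convention as SSE and NEON */
        mask[k] = (a.lane[k] > b.lane[k]) ? 0xFFFFFFFFu : 0u;
}

#endif
```

Because the comparison result is an integer mask rather than a float vector, code that expects an __m128 return value, as _mm_cmpgt_ps provides on x86, needs a reinterpreting cast around it.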
Performance testing:
The test platform is an ARM Cortex-A7 series processor, the test images are 120×120 and 1280×720 pixels, and the comparison of the running results is shown in Table 1:
Table 1
It can be seen that, after the operations of the above face detection implementation method based on the Cortex-A series platform, the FaceDetection module of SeetaFace can be used on the ARM Cortex-A series platform while maintaining high efficiency.
Example 2
As shown in Fig. 3, when the method of the invention is used, feature point processing is performed on the input image and it is determined whether vector operations are required; if so, the code is modified according to the method of Example 1.
Those skilled in the art can make various corresponding changes and modifications based on the above technical solutions and concepts, and all such changes and modifications shall fall within the scope of protection of the invention.
Claims (3)
1. A face detection implementation method based on the ARM Cortex-A series platform, characterized by comprising the following steps:
S1, modifying the source code of FaceDetection in SeetaFace for the ARM Cortex-A series processor hardware environment, and changing the compiler type to a cross compiler;
S2, adding the NEON compile options to the compiler settings;
S3, replacing the header file required by the SSE instructions in FaceDetection with the header files required by NEON;
S4, rewriting the parts of the FaceDetection source code that use SSE instructions as NEON instructions, and changing the functions that use SSE instructions into functions that use NEON;
S5, recompiling the program with the NEON compile options added in step S2 to obtain the required dynamic link library file, thereby building a NEON-enabled FaceDetection program for the ARM Cortex-A series processor platform;
the specific operations of step S1 are:
modify the SET commands:
1) set the system type, selecting Linux:
SET(CMAKE_SYSTEM_NAME Linux)
2) set the cross compiler path, which enables the cross compiler and adds its path:
SET(CMAKE_CXX_COMPILER "/opt/hisi-linux/x86-arm/arm-hisiv400-linux/bin/arm-hisiv400-linux-gnueabi-g++");
the specific operations of step S4 are:
convert the original SSE instructions into NEON instructions of the ARM instruction set;
first, replace the code that originally used SSE; the functions in the code that use SSE instructions are as follows:
① _mm_add_epi32(__m128i a, __m128i b);
② _mm_sub_epi32(__m128i a, __m128i b);
③ _mm_mullo_epi32(__m128i a, __m128i b);
④ _mm_mul_ps(__m128 a, __m128 b);
⑤ _mm_cmpgt_ps(__m128 a, __m128 b);
⑥ _mm_set_epi32(int i3, int i2, int i1, int i0);
wherein:
① the function _mm_add_epi32() adds four 32-bit integers at a time and returns the sums; its replacement is vaddq_s32(a, b), whose prototype is int32x4_t vaddq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_add_epi32();
② the function _mm_sub_epi32() subtracts four 32-bit integers at a time and returns the differences; its replacement is vsubq_s32(a, b), whose prototype is int32x4_t vsubq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_sub_epi32();
③ the function _mm_mullo_epi32() multiplies four pairs of 32-bit integers at a time and returns the low 32 bits of each product; its replacement is vmulq_s32(a, b), whose prototype is int32x4_t vmulq_s32(int32x4_t __a, int32x4_t __b); it performs vector computation under the ARM instruction set and has the same function as _mm_mullo_epi32();
④ the function _mm_mul_ps() multiplies four single-precision floating-point values at a time and returns the products in an __m128 register; the function is implemented as follows:
INLINE __m128 _mm_mul_ps(__m128 a, __m128 b)
{
    __m128 ret;
    ret[0] = a[0] * b[0];
    ret[1] = a[1] * b[1];
    ret[2] = a[2] * b[2];
    ret[3] = a[3] * b[3];
    return ret;
}
⑤ the function _mm_cmpgt_ps() performs a greater-than comparison; its replacement is (__m128)vcreq_f32(a, b), whose prototype is float32x4_t vcreq_f32(float32x4_t __a, float32x4_t __b); for vector computation under the ARM instruction set, it has the same function as _mm_cmple_ps();
⑥ the function _mm_set_epi32() sets four signed 32-bit integer values; its replacement is vreinterpretq_m128i_s32(vld1q_s32(data));
wherein the type of the return value is defined by the following macro definitions:
#define _MM_SHUFFLE(z, y, x, w) ((z << 6) | (y << 4) | (x << 2) | w)
#define vreinterpretq_m128i_s32(x) \
    (x)
#define vreinterpretq_m128i_u32(x) \
    vreinterpretq_s32_u32(x)
#define vreinterpretq_s32_m128i(x) \
    (x)
2. The face detection implementation method based on the ARM Cortex-A series platform according to claim 1, wherein the specific operations of step S2 are:
in CMakeLists.txt, modify the instruction-set-related settings to enable NEON by adding the NEON compile options to the compiler option settings in the SET command:
-mfloat-abi=softfp -mfpu=neon
3. The face detection implementation method based on the ARM Cortex-A series platform according to claim 1, wherein the specific operations of step S3 are:
replace the header file immintrin.h required by the original SSE instructions with the function implementations and header files required by the NEON instructions; these comprise SseToNeon.h and the NEON instruction header file arm_neon.h, and include the NEON function implementations required by the project.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810372936.5A CN108764037B (en) | 2018-04-24 | 2018-04-24 | Face detection implementation method based on ARM Cortex-A series platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810372936.5A CN108764037B (en) | 2018-04-24 | 2018-04-24 | Face detection implementation method based on ARM Cortex-A series platform |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108764037A CN108764037A (en) | 2018-11-06 |
CN108764037B true CN108764037B (en) | 2021-12-24 |
Family
ID=64011584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810372936.5A Active CN108764037B (en) | 2018-04-24 | 2018-04-24 | Face detection implementation method based on ARM Cortex-A series platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108764037B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428872B (en) * | 2019-09-26 | 2020-03-10 | 深圳华大基因科技服务有限公司 | Method and device for converting gene comparison instruction set |
CN113157321B (en) * | 2021-02-05 | 2022-02-08 | 湖南国科亿存信息科技有限公司 | Erasure encoding and decoding method and device based on NEON instruction acceleration under ARM platform |
CN112671618B (en) * | 2021-03-15 | 2021-06-15 | 北京安帝科技有限公司 | Deep packet inspection method and device |
CN113254065B (en) * | 2021-07-14 | 2021-11-02 | 广州易方信息科技股份有限公司 | Application software compatibility method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101957909A (en) * | 2009-07-15 | 2011-01-26 | 青岛科技大学 | Digital signal processor (DSP)-based face detection method |
CN102364433A (en) * | 2011-06-24 | 2012-02-29 | 浙大网新科技股份有限公司 | Method for realizing Wine construction tool transplanting on ARM (Advanced RISC Machines) processor |
CN104463125A (en) * | 2014-12-11 | 2015-03-25 | 哈尔滨工程大学 | DSP-based automatic face detecting and tracking device and method |
CN107016341A (en) * | 2017-03-03 | 2017-08-04 | 西安交通大学 | A kind of embedded real-time face recognition methods |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218251B (en) * | 2013-04-16 | 2016-05-18 | 青岛中星微电子有限公司 | Verification method and the device of multiple nucleus system level chip |
CN105787910B (en) * | 2015-12-24 | 2019-01-11 | 武汉鸿瑞达信息技术有限公司 | A kind of calculation optimization method of the human face region filtering method based on heterogeneous platform |
CN106887059A (en) * | 2017-01-18 | 2017-06-23 | 华南农业大学 | A kind of intelligent electronic lock system based on face recognition |
CN107784289A (en) * | 2017-11-02 | 2018-03-09 | 深圳市共进电子股份有限公司 | A kind of security-protecting and monitoring method, apparatus and system |
CN107909348A (en) * | 2017-11-26 | 2018-04-13 | 常熟安智生物识别技术有限公司 | A kind of personnel system scheme using recognition of face |
CN107934704A (en) * | 2017-11-30 | 2018-04-20 | 常熟安智生物识别技术有限公司 | A kind of terraced control system using recognition of face |
Also Published As
Publication number | Publication date |
---|---|
CN108764037A (en) | 2018-11-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||