CN117331604A - Front-end data stream processing method, device, terminal and storage medium - Google Patents

Front-end data stream processing method, device, terminal and storage medium

Info

Publication number
CN117331604A
CN117331604A
Authority
CN
China
Prior art keywords
data stream
instruction
table lookup
address
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311465598.7A
Other languages
Chinese (zh)
Inventor
阿西木·约麦尔
梁龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai New Helium Brain Intelligence Technology Co ltd
Original Assignee
Shanghai New Helium Brain Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai New Helium Brain Intelligence Technology Co ltd filed Critical Shanghai New Helium Brain Intelligence Technology Co ltd
Priority to CN202311465598.7A priority Critical patent/CN117331604A/en
Publication of CN117331604A publication Critical patent/CN117331604A/en
Pending legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3802: Instruction prefetching
    • G06F 9/3814: Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3818: Decoding for concurrent execution

Abstract

A front-end data stream processing method, device, terminal and storage medium. A stream data processor is arranged at the front end, and the received data stream and instruction stream are stored separately in nonvolatile memory, which improves the random access capability of table lookup configuration updates and table lookup calculation and provides large capacity and high precision. The input and output of the stream data processor adopt a streaming point-to-point bus structure to improve transmission efficiency. Using table lookup instructions as calculation instructions enables highly energy-efficient computation. An extensible instruction set improves the adaptability of the processor: operation instructions can be freely updated as needed through the encoding of the data stream, improving the reconfigurability of the operation operators. In addition, the efficient and compact extensible custom instruction set reduces the number of instructions the microprocessor requires, further simplifying the hardware circuit design and reducing area and power consumption.

Description

Front-end data stream processing method, device, terminal and storage medium
Technical Field
The present invention relates to the field of data stream processing, and in particular to a front-end data stream processing method, device, terminal, and storage medium.
Background
With the development and popularization of artificial intelligence and 5G technology, large amounts of perception data are generated as data streams and fed into the Internet of Things. Within an IoT system, these data streams require consistency conversion, signal conversion, front-end calculation and similar processing. However, owing to the diversity of front-end applications, the complexity and variability of the IoT system environment, and the variety of application devices and user requirements, front-end data processing faces higher demands and, in particular, must be reconfigurable.
The resources of front-end devices are usually very limited, particularly in terms of computing power, storage capacity and battery life, so data processing on a front-end device calls for a low-cost, miniaturized, low-power solution: the device must process data efficiently with the limited resources available while keeping energy consumption low, which is at odds with the nature of data stream processing. Data stream processing typically involves a large volume of real-time data that must be processed and analyzed immediately, requiring the processing system to complete many complex computing tasks in a short time, a capability that prior-art front-end devices lack. In addition, data stream processing demands a certain computational complexity and flexibility of the processing system. Computational complexity here means that the processing system must be able to execute complex algorithms and models to extract useful information and insight from the data stream, while flexibility means that it must accommodate different types of and variations in data streams and dynamically adjust and optimize as needed. Meeting these requirements, however, tends to increase the cost and power consumption of the processing system, which conflicts with the low-cost, miniaturization and low-power requirements of front-end equipment.
As new nonvolatile memory (NVM) technologies mature, they will bring substantial increases in density and capacity and provide new storage schemes. This leads to higher performance and efficiency in data storage and access, meeting ever-growing storage demands. At the same time, the development of NVM technology will drive innovation and progress in memory technology, opening more possibilities and opportunities for front-end data processing.
In the prior art, a zero-power sensor is placed at the sensor end to filter valid signals and wake up the downstream front-end system. However, a zero-power sensor only filters signals at the physical level, its application scenarios are very limited, and it cannot relieve the front-end controller of the pressure of processing the data stream. Likewise, prior-art strategies that accelerate calculation with neural network algorithms still fail to give front-end devices reconfigurability and low power consumption.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, an object of the present application is to provide a front-end data stream processing method, apparatus, terminal and storage medium that solve the problems of high power consumption, high latency, lack of support for intensive operations and lack of reconfigurability that existing front-end devices exhibit when processing a data stream.
To achieve the above and other related objects, a first aspect of the present application provides a front-end data stream processing method applied to a front-end processor, the method including: receiving a data stream and an instruction stream for performing a table lookup configuration update operation, and storing the data stream and the instruction stream into a cache area respectively; performing an instruction fetch operation on a first memory based on the data stream to obtain instruction information, and performing a decoding operation on the instruction information; storing the decoding result of the instruction information into an instruction register, and storing the data stream into a general register; performing a decoding operation on the decoding result so that an address generator generates a corresponding table lookup address according to the decoding result and the data stream; and performing a table lookup calculation operation in a second memory based on the table lookup address to obtain a corresponding table lookup calculation result.
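The five claimed steps can be sketched as a minimal Python model. The class name, field widths, and table contents below are illustrative assumptions for exposition, not details taken from the patent.

```python
# Illustrative sketch of the claimed five-step flow; all names and the
# table layouts are assumptions for exposition, not from the patent.

class FrontEndProcessor:
    def __init__(self, instr_mem, lut_mem):
        self.instr_mem = instr_mem      # "first memory": instruction store (NVM)
        self.lut_mem = lut_mem          # "second memory": table lookup store (NVM)
        self.data_buf, self.instr_buf = [], []   # cache areas
        self.instr_reg = None           # instruction register
        self.gen_reg = None             # general register

    def receive(self, data_word, instr_word=None):
        # Step 1: store the data stream and instruction stream separately
        self.data_buf.append(data_word)
        if instr_word is not None:
            self.instr_buf.append(instr_word)

    def process(self):
        data = self.data_buf.pop(0)
        # Step 2: instruction fetch from the first memory, keyed by the
        # data word's (assumed) opcode field
        instr_info = self.instr_mem[data >> 8]
        # Step 3: latch the decode result and the operand
        self.instr_reg, self.gen_reg = instr_info, data & 0xFF
        # Step 4: address generator combines decode result and data
        addr = (self.instr_reg["base"] + self.gen_reg) % len(self.lut_mem)
        # Step 5: table lookup calculation in the second memory
        return self.lut_mem[addr]

instr_mem = {0x01: {"base": 0}}            # one hypothetical entry
lut_mem = [i * i for i in range(256)]      # LUT holding precomputed squares
cpu = FrontEndProcessor(instr_mem, lut_mem)
cpu.receive((0x01 << 8) | 7)               # opcode 0x01, operand 7
print(cpu.process())                       # -> 49
```

A real implementation would pipeline these steps in hardware; the sketch only shows the data dependencies between them.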
In some embodiments of the first aspect of the present application, performing the table lookup calculation operation in the second memory based on the table lookup address to obtain a corresponding table lookup calculation result includes: analyzing the first encoding in the data stream, and judging from the analysis result whether the table lookup calculation operation can be performed on the data stream; if it can, acquiring a corresponding table lookup base address based on the data stream; and generating the table lookup address through the address generator based on the table lookup base address, and performing the table lookup calculation operation in the second memory to obtain the table lookup calculation result.
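The first-encoding check and base-address selection described above can be sketched as follows; the field widths and the opcode-to-base mapping are invented for illustration.

```python
# Sketch of the first-encoding check; the 4-bit encoding field and the
# mapping of encodings to table base addresses are assumptions.

LOOKUP_BASES = {0x1: 0, 0x2: 256}   # hypothetical base address per operator type
LUT = [i * i for i in range(256)] + [i * 3 for i in range(256)]

def lookup_calculate(word):
    first_encoding = (word >> 8) & 0xF          # first encoding selects the operation
    if first_encoding not in LOOKUP_BASES:      # judge whether lookup applies
        return None                             # not a table lookup operation
    base = LOOKUP_BASES[first_encoding]         # table lookup base address
    operand = word & 0xFF
    addr = base + operand                       # address generator: base + offset
    return LUT[addr]                            # table lookup calculation

print(lookup_calculate((0x1 << 8) | 9))   # squares table -> 81
print(lookup_calculate((0x2 << 8) | 9))   # triples table -> 27
print(lookup_calculate((0xF << 8) | 9))   # unknown encoding -> None
```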
In some embodiments of the first aspect of the present application, the following operations are further performed after the table look-up calculation result is obtained: judging whether the table lookup calculation result is an intermediate result, if so, storing the table lookup calculation result into a cache area, otherwise, outputting the table lookup calculation result; after the table look-up calculation result is stored in the buffer area, judging whether to continue to execute table look-up calculation operation on the data stream; if not, setting the processor to an idle state; if yes, based on the table lookup base address again, generating the table lookup address through the address generator and executing table lookup calculation operation.
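The intermediate-result handling described above is essentially a loop: look up, buffer the result if it is intermediate, and re-enter address generation; otherwise output and idle. A minimal sketch, where the convention for flagging a result as intermediate is an assumption:

```python
# Minimal sketch of the intermediate-result loop: keep looking up while a
# result is marked intermediate, buffer it in the cache area, and re-enter
# address generation. The "intermediate" convention is assumed.

def run_lookup_chain(lut, start, steps):
    """Chain `steps` lookups; all but the last yield intermediate results."""
    cache_area = []                     # buffer for intermediate results
    value = start
    for i in range(steps):
        value = lut[value % len(lut)]   # table lookup calculation
        is_intermediate = i < steps - 1 # judge: more lookups pending?
        if is_intermediate:
            cache_area.append(value)    # store to cache area, continue
        else:
            return value, cache_area    # final result: output it
    # no lookups requested: the processor would go idle here
    return start, cache_area

lut = [(v + 1) % 8 for v in range(8)]   # increment-mod-8 table
final, cached = run_lookup_chain(lut, 0, 3)
print(final, cached)                    # -> 3 [1, 2]
```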
In some embodiments of the first aspect of the present application, a method of performing a table lookup configuration update operation includes: updating an operation processing instruction area code in the first memory based on the instruction stream in the buffer area; acquiring a storage state of the second memory, and acquiring a new write-in table lookup base address according to the storage state of the second memory; generating a table lookup address to be updated through the address generator based on the new writing table lookup base address; and writing table configuration information corresponding to the instruction stream into a second memory based on the table lookup address to be updated so as to execute table lookup configuration updating operation.
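The table lookup configuration update can be sketched as: read the storage state of the second memory to pick a new write base address, generate the to-be-updated addresses, and write the table configuration. The free-region bookkeeping below is an assumed, simplified policy.

```python
# Sketch of the table lookup configuration update; the next-free-cell
# allocation policy is an assumption, not the patent's scheme.

class LutMemory:
    def __init__(self, size):
        self.cells = [0] * size
        self.next_free = 0              # storage state: next writable base

    def config_update(self, table_entries):
        base = self.next_free           # new write table lookup base address
        for offset, value in enumerate(table_entries):
            addr = base + offset        # address generator: to-be-updated address
            self.cells[addr] = value    # write table configuration information
        self.next_free = base + len(table_entries)
        return base                     # base under which the new table is reachable

mem = LutMemory(16)
b0 = mem.config_update([5, 6, 7])       # first table at base 0
b1 = mem.config_update([9, 9])          # second table appended after it
print(b0, b1, mem.cells[:6])            # -> 0 3 [5, 6, 7, 9, 9, 0]
```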
In some embodiments of the first aspect of the present application, after performing the table look-up configuration update operation, the following operations are further performed: judging whether the table lookup configuration updating operation in the instruction stream is completed or not; if the execution is finished, setting the processor to be in an idle state for running; otherwise, generating a table lookup address to be updated again through the address generator, and executing table lookup configuration updating operation.
In some embodiments of the first aspect of the present application, the instruction information includes a table look-up instruction and a data transmission instruction, wherein the table look-up instruction includes: a table lookup address generation instruction, a table lookup calculation instruction and a table lookup configuration instruction; the data transmission instruction includes: data transfer instructions between the register and the output, data transfer instructions between the register and the input, transfer instructions between the register and the buffer.
In some embodiments of the first aspect of the present application, the instruction information is controlled by an open instruction set processor, wherein the open instruction set processor comprises: an arithmetic logic unit, a shifter and a table look-up storage unit.
To achieve the above and other related objects, a second aspect of the present application provides a front-end data stream processing apparatus, comprising: a data stream receiving module, for receiving the data stream and storing it in a buffer; a table lookup configuration updating module, for receiving an instruction stream for a table lookup configuration update operation and performing the table lookup configuration update operation; and a table lookup calculation module, for performing an instruction fetch operation on the first memory based on the data stream to acquire instruction information and performing a decoding operation on the instruction information; storing the decoding result of the instruction information into an instruction register, and storing the data stream into a general register; performing a decoding operation on the decoding result so that an address generator generates a corresponding table lookup address according to the decoding result and the data stream; and performing a table lookup calculation operation in the second memory based on the table lookup address to obtain a corresponding table lookup calculation result.
To achieve the above and other related objects, a third aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the front-end data stream processing method.
To achieve the above and other related objects, a fourth aspect of the present application provides an electronic terminal, including: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory so as to enable the terminal to execute the front-end data stream processing method.
As described above, the front-end data stream processing method, device, terminal and storage medium of the present application have the following beneficial effects. The method preprocesses the front-end data stream with single instructions, reducing the waste of computing resources in traditional data stream processing, improving real-time computing capability, and avoiding pipeline flushes, thereby reducing latency. Nonvolatile memory is used for storage, giving higher storage density and lower operating power consumption; the number of memory writes is reduced, easing the write-endurance limitation of nonvolatile memory. In addition, the instructions of the invention can freely update operation instructions as needed through the encoding of the data stream, improving the reconfigurability of the operation operators. Calculation is performed by table lookup, greatly reducing operation cost, and the simplicity and regularity of the customized instruction set in the microprocessor reduce the number of instructions the processor requires, further simplifying the hardware circuit design and its area and power consumption.
Drawings
Fig. 1 is a schematic flow chart of an embodiment of a front-end data stream processing method of the present application.
Fig. 2 is a schematic structural diagram of a processor in an embodiment of a front-end data stream processing method of the present application.
FIG. 3 is a schematic diagram showing a lookup unit according to an embodiment of the front-end data stream processing method of the present application.
FIG. 4 is a flow chart illustrating the operation of table look-up calculation in an embodiment of the front-end data stream processing method of the present application.
FIG. 5 is a flowchart illustrating a table lookup configuration update operation according to an embodiment of a front-end data stream processing method of the present application.
Fig. 6 is a schematic structural diagram of a mobile device according to an embodiment of the front-end data stream processing method.
Fig. 7 is a schematic structural diagram of an embodiment of a front-end data stream processing device of the present application.
Fig. 8 shows a schematic structural diagram of a front-end data stream processing electronic terminal of the present application.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the present disclosure, when the following description of the embodiments is taken in conjunction with the accompanying drawings. The present application may be embodied or carried out in other specific embodiments, and the details of the present application may be modified or changed from various points of view and applications without departing from the spirit of the present application. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict.
It is noted that in the following description, reference is made to the accompanying drawings, which describe several embodiments of the present application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. Spatially relative terms, such as "upper," "lower," "left," "right," "lower," "upper," and the like, may be used herein to facilitate a description of one element or feature as illustrated in the figures as being related to another element or feature.
In this application, unless specifically stated and limited otherwise, the terms "mounted," "connected," "secured," "held," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, operations, elements, components, items, categories, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, categories, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions or operations is in some way inherently mutually exclusive.
In order to solve the problems in the background art, the invention provides a front-end data stream processing method, a device, a terminal and a storage medium, which aim to solve the problems that the existing front-end equipment has high power consumption, high delay, does not support intensive operation and does not have reconfigurability in the process of processing data streams. Meanwhile, in order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be further described in detail by the following examples with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Before explaining the present invention in further detail, the terms and terminology involved in the embodiments of the present invention are explained below; these explanations apply throughout:
the embodiment of the invention provides a front-end data stream processing method, a system of the front-end data stream processing method and a storage medium storing an executable program for realizing the front-end data stream processing method. With respect to implementation of the front-end data stream processing method, an exemplary implementation scenario of front-end data stream processing will be described in the embodiments of the present invention.
<1> Data stream: the path and manner in which data flows within a computer system; it may be serial or parallel. A data stream may include input data, output data and intermediate data.
<2> Nonvolatile memory: a device for storing data that retains the stored data even without a power supply. Compared with volatile memory (such as RAM), nonvolatile memory (such as hard disks, solid-state drives and flash memory) provides persistent storage.
<3> LUT: a look-up table, a data structure for storing and computing logical functions. A LUT obtains the corresponding output value through an index into the table and can be used to implement logic circuits, digital signal processing and other applications.
<4> Pipeline structure: a processor design pattern that divides instruction execution into multiple steps so that different instructions can execute different steps simultaneously, improving processor efficiency. A pipeline typically comprises fetch, decode, execute, memory-access and write-back stages.
<5> Instruction fetch operation: reading an instruction from memory. In a computer, the fetch operation is the first step performed by the processor, which reads the instruction from memory and sends it into the pipeline for subsequent processing.
<6> FIFO input/output module: a buffer memory for storing and transmitting data that is read and written on a first-in-first-out basis. FIFO input/output modules are used in data transmission, communication and similar contexts to buffer and synchronize data.
<7> RISC-V instruction set: an instruction set architecture based on reduced-instruction-set computing (RISC) principles for designing and implementing processors. RISC-V is an open instruction set architecture with scalability and flexibility, widely used in embedded systems and processor designs.
<8> Instruction decode operation: the process of converting an instruction into executable operations. In a processor, the decode operation parses the instruction fetched from memory, determines its operation type and operands, and prepares it for the subsequent execution stages.
<9> Instruction decode operation (opcode matching): translating the instruction into an opcode internal to the processor. Here the processor matches the opcode of the instruction against the internal instruction set and determines the particular operation to be performed.
<10> First encoding: the first byte of an instruction, used to determine the instruction type. The first encoding generally contains information such as the operation type and addressing mode and helps the processor parse the instruction quickly.
<11> Base address: a memory addressing mode that accesses data in memory by adding an offset to a base address. The base address is a memory address; adding the offset yields the memory address actually accessed.
<12> Operator type: the kind of operation a processor performs for arithmetic and logical computation. Operator types include addition, subtraction, multiplication, division, logical AND, logical OR, and so on.
<13> Register file (Regfile): a structure for storing processor register data. The register file is an important processor component for storing and accessing data in registers.
<14> ADG (Address Generator): an address generation module in the controller that holds the information related to table addresses in table lookup operations, so the processor can quickly construct the address for a table lookup.
<15> CTRL: the control unit, which controls operation inside the processor. CTRL parses and executes the control signals in instructions and directs the operation of the processor's functional modules.
<16> PC: the program counter, which stores the address of the next instruction to be executed. The PC is a register in the processor indicating the location of the current instruction and the address of the next.
<17> IF/ID: the first and second pipeline stages, which fetch instructions and pass them to the next stage, respectively. IF/ID also denotes the registers between those stages that store and convey instruction-related information.
<18> ALU: the arithmetic logic unit, which performs arithmetic and logic operations. The ALU is an important processor component responsible for addition, subtraction, logic operations and the like.
<19> SHIFT: a shifter that performs shift operations. It can shift data left or right and is commonly used for bit-level processing of binary data.
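Glossary items <3> and <11> combine naturally: a LUT replaces runtime arithmetic with an indexed read, and base-plus-offset addressing selects the table. A small illustration, with values and layout chosen purely for demonstration:

```python
# A LUT replaces a trigonometric evaluation with a single indexed read;
# base address + offset selects the entry. Table size is illustrative.

import math

SIN_TABLE_BASE = 0
SIN_STEPS = 16
# Precompute sin over one period; at run time only an indexed read remains.
memory = [math.sin(2 * math.pi * i / SIN_STEPS) for i in range(SIN_STEPS)]

def lut_sin(step):
    addr = SIN_TABLE_BASE + (step % SIN_STEPS)   # base address + offset
    return memory[addr]

print(abs(lut_sin(4) - 1.0) < 1e-9)   # sin(pi/2) == 1 -> True
```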
Fig. 1 is a schematic flow chart of a front-end data stream processing method in an embodiment of the invention. It should be noted that the method may be deployed in a microprocessor for preprocessing the front-end perception data stream. FIG. 2 is a schematic diagram of the microprocessor in an embodiment of the invention. The microprocessor is composed of a processor core, an instruction storage unit, a table lookup storage unit and peripherals, where a peripheral is an auxiliary device connected to the microprocessor that provides input, output and memory functions. The instruction storage unit and the table lookup storage unit reside in two independent nonvolatile memories, so read and write operations can be performed on the two memories simultaneously.
It should be noted that in a conventional general-purpose processor the table lookup storage unit is placed in main memory rather than at the front end and is accessed through the data cache mechanism. The problem is that even if the cache mechanism keeps the table lookup storage unit small, table lookup calculation operations are so frequent that power consumption and cache size grow substantially. The indirect table lookup of the prior art also suffers from a low hit rate. The present method therefore performs table lookup calculation directly at the front end, improving the efficiency of table lookup calculation on the data stream.
Further, the processor core provided by the invention adopts a simplified three-stage pipeline structure comprising fetch, decode and execute. The instruction fetch stage corresponds to step S12 below: executing an instruction fetch operation on the first memory based on the data stream to obtain instruction information, and executing a decoding operation on the instruction information. The decode stage corresponds to steps S13 to S14 below: storing the decoding result of the instruction information into an instruction register and the data stream into a general register; and executing a decoding operation on the decoding result so that the address generator generates the corresponding table lookup address from the decoding result and the data stream. The execute stage corresponds to the table lookup calculation operation and the table lookup configuration update operation described hereinafter.
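The three pipeline stages can be modelled as three functions. In hardware the stages run concurrently on successive words; this software sketch only shows the stage boundaries, and the instruction encoding is an assumption.

```python
# Simplified model of the three-stage fetch/decode/execute pipeline;
# field widths and memory contents are illustrative assumptions.

def fetch(word, instr_mem):
    # fetch: look up instruction info keyed by the word's (assumed) opcode field
    return instr_mem[word >> 4], word & 0xF

def decode(instr_info, operand):
    # decode: address generator combines base address and operand
    return instr_info["base"] + operand

def execute(addr, lut_mem):
    # execute: table lookup calculation in the second memory
    return lut_mem[addr]

instr_mem = {0x1: {"base": 0}}                # hypothetical single-entry store
lut_mem = [2 * i for i in range(16)]          # doubling table
info, operand = fetch((0x1 << 4) | 5, instr_mem)
print(execute(decode(info, operand), lut_mem))   # -> 10
```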
The front-end data stream processing method in the embodiment is applied to a front-end processor and mainly comprises the following steps:
step S11: and receiving the data stream and the instruction stream for performing table lookup configuration updating operation, and storing the data stream and the instruction stream into the cache area respectively.
In one embodiment of the present invention, as shown in FIG. 2, the external inputs and outputs of the front-end processor use a streaming point-to-point bus interface. The data stream and external instructions are received through the input FIFO module and output through the output FIFO module. Specifically, there are two types of data stream inputs in the front-end processor, one being the input data stream for performing the table look-up calculation, and the other being the instruction stream data for performing the table look-up configuration update operation. The two types of data streams have different input interfaces at the input and are configured with an output data port and a processor state output port at the output.
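The two FIFO-coupled input paths (the data stream for lookup calculation and the instruction stream for configuration updates) and the single output port can be sketched as below; the queue contents, tagging scheme, and drain order are assumptions for illustration.

```python
# Sketch of the dual input FIFOs and output FIFO described above;
# the tuple format of config words and the drain policy are assumed.

from collections import deque

data_fifo = deque()          # input FIFO: data stream for lookup calculation
config_fifo = deque()        # input FIFO: instruction stream for config update
output_fifo = deque()        # output FIFO: streaming point-to-point bus out

def push_input(kind, word):
    (data_fifo if kind == "data" else config_fifo).append(word)

def drain(lut):
    while config_fifo:                       # apply config updates first
        idx, val = config_fifo.popleft()
        lut[idx] = val
    while data_fifo:                         # then run lookup calculations
        output_fifo.append(lut[data_fifo.popleft()])

lut = [0, 10, 20, 30]
push_input("config", (2, 99))   # rewrite entry 2
push_input("data", 2)
push_input("data", 3)
drain(lut)
print(list(output_fifo))        # -> [99, 30]
```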
It should be noted that, in an embodiment of the present invention, the external input and output use a streaming point-to-point bus interface, and the internal command is directly fetched from the input FIFO module or written into the output FIFO module for transmission, and the streaming data transmission path is more efficient than the conventional transmission mode from the processor to the memory.
Step S12: and executing instruction fetching operation on the first memory based on the data stream to acquire instruction information, and executing decoding operation on the instruction information.
In an embodiment of the present invention, the instruction information includes a table look-up instruction and a data transmission instruction, where the table look-up instruction includes: a table lookup address generation instruction, a table lookup calculation instruction and a table lookup configuration instruction; the data transmission instruction includes: data transfer instructions between the register and the output, data transfer instructions between the register and the input, transfer instructions between the register and the buffer.
It should be noted that the first memory is an instruction memory, and the second memory is the memory used for table lookup calculation. The processor adopts a storage mode that separates data from instructions so as to suit data stream processing. In one embodiment of the present invention, the first memory is a non-volatile memory (NVM); the advantage is that an NVM supports direct random access, the instructions stored in it are not lost after power-down, and operation can be resumed very quickly on power-up without reloading. The second memory is also a non-volatile memory, which likewise has random access capability and additionally supports three-dimensional structures and multi-value storage, significantly increasing the storage capacity available for table lookup calculation parameters. The calculation precision of a table lookup calculation task is extremely sensitive to storage capacity: high-capacity storage enables higher-precision calculation, whereas SRAM capacity is too low to meet the requirement of high precision.
Step S13: storing the decoding result of the instruction information into an instruction register, and storing the data stream into a general register.
It should be noted that the front-end processor in the present invention performs table lookup calculation operations, which differ from conventional data stream processing. To address the technical problem solved by the invention, namely improving the preprocessing efficiency of the data stream through table lookup calculation, the RISC-V (reduced instruction set) instruction set is extended and reconstructed (as shown in the table below), which improves instruction reconstruction performance without increasing hardware cost and effectively handles table lookup calculation of the data stream.
Table 1: extended RISC-V instruction set table
Specifically, the extended RISC-V instruction set includes a table lookup address generation instruction, a table lookup instruction, a LUT write instruction, and the like. The extended instruction formats include, but are not limited to, the R-type, I-type, and S-type instruction formats of RV32I. The instruction "plut.adg.x" represents the table lookup address generation instruction: when it is executed, an address is synthesized from two operand registers and stored in the destination register. In the R-type form plut.adg.x Rd, Rs1, Rs2, x is a 3-bit encoding in the funct3 field of the instruction, used to distinguish the generation of addresses of different bit widths for parallel or block lookup tables. Illustratively, when both operands are 16 bits, the required LUT size would be oversized; in that case x is set to 4, so that the bits are split into 4 blocks to generate 4 8-bit addresses, which require 4 lookups of one table or a parallel lookup of 4 tables.
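Under one plausible reading of the address generation described above (the patent only states that, for two 16-bit operands with x = 4, the operand bits are split into four 8-bit block addresses; the function name, the exact split order, and the x = 1 case are assumptions for illustration), the mechanism can be sketched as:

```python
def adg_split(rs1: int, rs2: int, x: int = 4):
    """Sketch (not the patented circuit) of plut.adg.x address
    generation: synthesize lookup addresses from two operand
    registers. For x == 4, the 32 operand bits are split into four
    8-bit block addresses for four parallel LUTs (or four lookups)."""
    word = ((rs1 & 0xFFFF) << 16) | (rs2 & 0xFFFF)
    if x == 4:
        return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    # x == 1 (assumed): both operands already fit a single 8-bit address
    return [word & 0xFF]
```

A 16-bit operand pair such as (0x1234, 0xABCD) thus yields the four block addresses 0x12, 0x34, 0xAB, 0xCD, each indexing a 256-entry table.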
Further, the extended RISC-V instruction set includes data transmission instructions, including but not limited to: data transfer instructions between the registers and the output, between the registers and the input, and between the registers and the buffer. In particular, the reason for providing a data transfer instruction between the registers and the buffer is that during a table lookup calculation operation, intermediate data of one data stream interval must be held in the buffer. Illustratively, FFT operations require buffering of intermediate data for fast data exchange.
As shown in the extended RISC-V instruction set in the table above, the pl.t instruction and the plu.t instruction read signed and unsigned data, respectively, from the cache to a register; the plr.t instruction inputs data to a specified register; the ps.t instruction transfers data from a register to the cache; and the psu.t instruction transfers data from a register to the output.
Illustratively, the plu.t Rd, Rs1, imm instruction is used to read unsigned data from the cache to a register. The read address is generated from Rs1 and the imm immediate, i.e., the register value plus the immediate; Rd is the target write register address, and the parameter t denotes the chunking of the data. Specifically, when t is 2, the 16-bit data that is read is divided into two blocks, i.e., stored as two results. By using multiples of 4 as the transmission bit width, this chunking method greatly improves the convenience of parallel transmission of low-bit-width data, facilitating transmission of the table lookup data stream.
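The plu.t semantics just described can be sketched as follows (a minimal model; the dictionary-backed cache and the chunk ordering are illustrative assumptions):

```python
def plu_t(cache: dict, rs1: int, imm: int, t: int):
    """Sketch of plu.t Rd, Rs1, imm: read an unsigned 16-bit word at
    address Rs1 + imm and split it into t chunks (t = 2 splits the
    word into two 8-bit results, most-significant chunk first)."""
    addr = (rs1 + imm) & 0xFFFFFFFF      # register value plus immediate
    word = cache[addr] & 0xFFFF          # 16-bit unsigned word
    width = 16 // t
    mask = (1 << width) - 1
    return [(word >> (width * i)) & mask for i in reversed(range(t))]
```

For example, with a 16-bit word 0xA1B2 stored at address 0x10, reading with t = 2 yields the two 8-bit results 0xA1 and 0xB2.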
Step S14: performing an execution operation on the decoding result, so that the address generator generates a corresponding table lookup address according to the decoding result and the data stream.
In one embodiment of the invention, the instruction information is controlled by an open instruction set processor, wherein the open instruction set processor comprises: an arithmetic logic unit, a shifter and a table look-up storage unit.
FIG. 3 illustrates a schematic diagram of the data path architecture of a single-cycle RISC-V processor for the instruction set, in accordance with one embodiment of the present invention. It includes a register file, an ALU capable of executing simple logic instructions, a shifter, and a lookup table storage unit. Specifically, for an 8-bit-wide input, each lookup table includes 256 entries, where each entry can be sized up to the 32-bit processor word length. Considering the differing requirements of front-end embedded use and of cost and power consumption in edge devices, the number and bit width of the lookup tables can be adjusted according to actual design requirements.
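A minimal model of the lookup unit sizing described above (256 entries addressed by an 8-bit input, entries up to the 32-bit word length; the class and the example squaring table are illustrative, not from the patent):

```python
class LookupTable:
    """One lookup unit of the described datapath: 256 entries,
    8-bit address, each entry truncated to the 32-bit word length."""
    def __init__(self, entries=None):
        self.entries = [0] * 256 if entries is None else list(entries)
        assert len(self.entries) == 256, "8-bit address space"

    def read(self, addr: int) -> int:
        assert 0 <= addr < 256, "address must fit 8 bits"
        return self.entries[addr] & 0xFFFFFFFF

# example: a table that squares its 8-bit input
square = LookupTable([i * i for i in range(256)])
```

Such a table replaces an arithmetic circuit with a single memory read: computing 12² becomes `square.read(12)`.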
Step S15: performing a table lookup calculation operation in the second memory based on the table lookup address to obtain a corresponding table lookup calculation result.
In an embodiment of the present invention, performing the table lookup calculation operation in the second memory based on the table lookup address to obtain the corresponding table lookup calculation result includes: parsing the first code in the data stream, and judging from the parsing result whether the table lookup calculation operation can be performed on the data stream; if it can, acquiring a corresponding table lookup base address based on the data stream; and generating the table lookup address through the address generator based on the table lookup base address, and performing the table lookup calculation operation in the second memory to obtain the table lookup calculation result. The base address information is maintained in a preset area of the first memory.
Further, after the table lookup calculation result is obtained, the following operations are performed: judging whether the table lookup calculation result is an intermediate result; if so, storing the intermediate result in the buffer area in a defined alignment order; otherwise, outputting the table lookup calculation result. After the result is stored in the buffer area, judging whether to continue performing table lookup calculation operations on the data stream; if not, setting the processor to the idle state; if so, generating the table lookup address again through the address generator based on the table lookup base address and performing the table lookup calculation operation. The states of the processor include: an idle state, a table lookup processing state, and an exception processing state.
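The control flow above (buffer intermediate results in order, emit final results, return to idle when the stream is exhausted) can be sketched as follows; the function, the predicate, and the list-based LUT are illustrative assumptions:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # no work pending
    LOOKUP = auto()      # table lookup processing
    EXCEPTION = auto()   # exception handling

def run_lookup(stream, lut, is_intermediate):
    """Sketch of the post-lookup decision: intermediate results go to
    the buffer in arrival order, final results are output, and the
    processor goes IDLE once the data stream is exhausted."""
    state, buffer, outputs = State.LOOKUP, [], []
    for value in stream:
        result = lut[value]
        if is_intermediate(value):
            buffer.append(result)    # kept in a defined alignment order
        else:
            outputs.append(result)
    state = State.IDLE               # no more data to process
    return state, buffer, outputs
```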
It should be noted that the table lookup operations of the present invention are performed by LUT table lookup instructions, so extremely energy-efficient computation can be performed on the data stream without complex floating-point computing circuits. Meanwhile, the instruction function can be directly reconstructed by rewriting the codes in the LUT table lookup instructions, greatly improving the adaptability of the processor. The processor provided by the invention can update and reconstruct operation instructions through the first code of the input data.
FIG. 4 is a flow chart illustrating the operation of the lookup calculation according to an embodiment of the invention. The process of the look-up table calculation operation will be described in detail below in conjunction with FIG. 4.
In one embodiment of the invention, when a new data stream is received, the first code of the data stream is parsed. The first code contains the operator type of the calculation operation to be performed on the data stream. After resolving the operator type, the processor searches the operation instruction area of the instruction memory and judges whether an instruction matching the operator type of the current data stream's calculation operation can be found. If a corresponding operator is matched, the base address information of the corresponding LUT is acquired. The address generation module then cyclically generates table lookup addresses and performs table lookup calculation operations on the input data stream. Specifically, since only one type of data stream is calculated at a time, the subsequent table lookup calculation produces one result per read cycle and outputs the results sequentially in data input order.
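The Fig. 4 flow just described can be sketched as follows; the operator dictionary, the flat LUT memory, and all names are illustrative assumptions rather than the patented data structures:

```python
def process_stream(first_code, payload, op_table, lut_mem):
    """Sketch of the lookup calculation flow: the first code names the
    operator; if the operation instruction area knows it, fetch the
    LUT base address and produce one result per input element,
    emitted in data input order."""
    if first_code not in op_table:
        # no matching operator: a table lookup configuration
        # update (reconstruction) would be required first
        raise LookupError("no matching operator for " + first_code)
    base = op_table[first_code]              # LUT base address
    return [lut_mem[base + x] for x in payload]
```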
FIG. 5 is a flow chart illustrating a table lookup configuration update operation according to an embodiment of the present invention. The process of performing the table lookup configuration update operation will be described in detail below in conjunction with fig. 5.
It should be noted that, when the front-end processor performs a table lookup calculation operation, the operator corresponding to the data stream to be calculated may not yet be initialized in the table, or the precision of the table may not meet the calculation requirement. In that case, the table lookup configuration information of the processor needs to be updated by an externally input instruction stream, i.e., a table lookup configuration update operation is performed. This is a reconstruction flow for the instruction memory and the table lookup configuration memory, in which the corresponding areas are reconstructed according to the control state transitions of the processor.
In one embodiment of the present invention, a method for performing a table lookup configuration update operation includes: updating an operation processing instruction area code in the first memory based on the instruction stream in the buffer area; acquiring a storage state of the second memory, and acquiring a new write-in table lookup base address according to the storage state of the second memory; generating a table lookup address to be updated through the address generator based on the new writing table lookup base address; and writing table configuration information corresponding to the instruction stream into a second memory based on the table lookup address to be updated so as to execute table lookup configuration updating operation.
Specifically, as shown in FIG. 5, the processor runs in the idle state after start-up. When the processor receives an instruction stream corresponding to the reconstruction flow, it updates the code of the operation processing instruction area in the instruction memory and obtains a new table base address to be stored from the table base address information storage area, i.e., the LUT memory. The subsequently received data stream is written in turn into the corresponding addresses of the LUT memory, and the newly stored operator type, base address, and other information are marked in the LUT configuration information area.
Further, after the table lookup configuration update operation is performed, the following operations are performed: judging whether the table lookup configuration update operations in the instruction stream are completed; if so, setting the processor to run in the idle state; otherwise, generating a table lookup address to be updated again through the address generator and performing the table lookup configuration update operation.
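The reconstruction flow of Fig. 5 can be sketched as a single update step; the list-backed LUT memory, the dictionaries, and the "next free address" base-address policy are all illustrative assumptions:

```python
def update_lut_config(instr_stream, instr_mem, lut_mem, config_area):
    """Sketch of the table lookup configuration update: rewrite the
    operation processing instruction area, choose a new base address
    from the LUT memory's storage state, write the new table entries,
    and mark operator type and base address in the config area."""
    instr_mem['op_area'] = instr_stream['code']    # update instruction area
    base = len(lut_mem)                            # next free base address
    lut_mem.extend(instr_stream['table'])          # write table entries
    config_area[instr_stream['operator']] = base   # mark operator -> base
    return base
```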
The principle of the front-end data stream processing method has been described in detail above; an application scenario of the front-end data stream processing method is described in detail below.
Fig. 6 shows a process of audio preprocessing performed on a data stream received by the front end, where the front-end data stream processing method is deployed in a mobile device with a voice assistant function. The stream processor in this application scenario is connected to a 12-bit analog-to-digital converter via an analog-to-digital (AD) interface for receiving the digital data stream of the mobile device. In the stream processor, the integer data are first converted into 8-bit floating-point numbers through a built-in format conversion table, and the FFT calculation is completed through a built-in floating-point multiplication table and a floating-point addition table, generating the FFT calculation result. The energy in the voice frequency band of the FFT result is then compared against a threshold; if the energy exceeds the threshold, a wake-up interrupt is sent to the CPU to wake the dormant CPU, and the FFT result is transmitted to the CPU in real time as a data stream for voice recognition processing. The stream data processor arranged at the front end thus improves the processing efficiency of the data stream and reduces the data stream processing power consumption of the whole mobile device.
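The Fig. 6 pipeline can be modeled end to end as follows. This is an illustrative sketch: the format table contents are assumed, and a direct DFT stands in for the patent's table-driven FFT (floating-point multiply and add tables), since only the pipeline shape matters here:

```python
import cmath

def preprocess_audio(samples, fmt_table, band, threshold):
    """Illustrative model of the audio front end: map 12-bit AD
    samples through a format-conversion table, transform to the
    frequency domain, and decide whether voice-band energy exceeds
    the wake-up threshold (band is a (low_bin, high_bin) pair)."""
    x = [fmt_table[s & 0xFFF] for s in samples]           # 12-bit -> float
    n = len(x)
    spectrum = [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n)
                    for k in range(n)) for i in range(n)]
    energy = sum(abs(spectrum[i]) ** 2 for i in range(*band))
    return spectrum, energy > threshold                    # (FFT, wake?)
```

Silence (mid-scale samples) leaves the voice band empty and the CPU asleep, while a strong in-band tone trips the wake-up condition.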
Further, in the above stream processor, the parts that can be reconstructed by extending the instruction set include, but are not limited to: reconfiguring the built-in program by reconstructing the built-in tables, i.e., changing the floating-point encoding mode (the range of floating-point values corresponding to the AD result), changing the calculation precision, changing the monitored audio frequency band, changing the output data format, changing the interrupt wake-up condition, changing the FFT calculation parameters, and the like. In addition, the function realized by the mobile device can be changed by reconstructing the table lookup calculation mode. Illustratively, by reconstructing the built-in tables and updating the operation processing instruction area code to change the table lookup calculation mode, the FFT speech-detection wake-up function can be changed to a voice-changer function.
Fig. 7 is a schematic structural diagram of a front-end data stream processing device according to an embodiment of the present invention. In this embodiment, the front-end data stream processing apparatus 700 includes:
data stream receiving module 701: for receiving the data stream and storing it in a buffer.
The table lookup configuration update module 702: for receiving an instruction stream for a table lookup configuration update operation and performing the table lookup configuration update operation.
The table lookup calculation module 703: for performing an instruction fetch operation on the first memory based on the data stream to acquire instruction information and performing a decoding operation on the instruction information; storing the decoding result of the instruction information into an instruction register, and storing the data stream into a general register; executing the decoding result so that an address generator generates a corresponding table lookup address according to the decoding result and the data stream; and performing a table lookup calculation operation in the second memory based on the table lookup address to obtain a corresponding table lookup calculation result.
It should be noted that: in the front-end data stream processing device provided in the above embodiment, only the division of the program modules is used for illustration, and in practical application, the processing allocation may be performed by different program modules according to needs, i.e. the internal structure of the device is divided into different program modules to complete all or part of the processing described above. In addition, the front-end data stream processing device and the front-end data stream processing method provided in the foregoing embodiments belong to the same concept, and detailed implementation processes of the front-end data stream processing device and the front-end data stream processing method are detailed in the method embodiments and are not repeated herein.
Referring to fig. 8, which shows an optional hardware structure diagram of a front-end data stream processing terminal 800 according to an embodiment of the present invention, the terminal 800 may be a mobile phone, a computer device, a tablet device, a personal digital processing device, a factory background processing device, etc. The front-end data stream processing terminal 800 includes: at least one processor 801, a memory 802, at least one network interface 804, and a user interface 806. The various components in the device are coupled together by a bus system 805. It is appreciated that the bus system 805 is used to enable communications among these components. In addition to the data bus, the bus system 805 includes a power bus, a control bus, and a status signal bus; but for clarity of illustration the various buses are labeled as the bus system in fig. 8.
The user interface 806 may include, among other things, a display, keyboard, mouse, trackball, joystick, keys, buttons, touch pad, or touch screen, etc.
It is to be appreciated that the memory 802 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM) or a programmable read-only memory (PROM, Programmable Read-Only Memory). The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory) and synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory). The memory described in embodiments of the present invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory 802 in the embodiment of the present invention is used to store various kinds of data to support the operation of the front-end data stream processing terminal 800. Examples of such data include: any executable programs for operating on the front end data stream processing terminal 800, such as an operating system 8021 and application programs 8022; the operating system 8021 contains various system programs, such as framework layers, core library layers, driver layers, etc., for implementing various basic services and handling hardware-based tasks. The application 8022 may contain various application programs, such as a Media Player (Media Player), a Browser (Browser), and the like, for implementing various application services. The front-end data stream processing method provided by the embodiment of the invention can be contained in the application 8022.
The method disclosed in the above embodiments of the present invention may be applied to the processor 801 or implemented by the processor 801. The processor 801 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits in hardware in the processor 801 or by instructions in the form of software. The processor 801 may be a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 801 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the front-end data stream processing method provided by the embodiments of the present invention may be directly embodied as being completed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium; the processor reads the information from the memory and completes the steps of the method in combination with its hardware.
In an exemplary embodiment, the front-end data stream processing terminal 800 may be implemented by one or more application specific integrated circuits (ASIC, application Specific Integrated Circuit), DSPs, programmable logic devices (PLDs, programmable Logic Device), complex programmable logic devices (CPLDs, complex Programmable Logic Device) for performing the aforementioned front-end data stream processing methods.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by computer program related hardware. The aforementioned computer program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
In the embodiments provided herein, the computer-readable storage medium may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, U-disk, removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable and data storage media do not include connections, carrier waves, signals, or other transitory media, but are intended to be directed to non-transitory, tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In summary, the present application provides a front-end data stream processing method, apparatus, terminal, and medium, which improve the processing efficiency of a front-end processor for streaming data. The method uses nonvolatile memory for storage, which offers greater storage density and lower operating power consumption, reduces the number of memory writes, and alleviates the write-endurance limitation of nonvolatile memory. In addition, the operation instructions of the invention can be freely updated as needed through the encoding of the data stream, improving the reconfigurability of the operation operators. Performing calculation by table lookup greatly reduces operation cost, and the simplicity and regularity of the customized instruction set in the microprocessor reduce the number of instructions required by the processor, further simplifying the hardware circuit design and its area and power consumption. The method thus effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and effectiveness of the present application, and are not intended to limit the application. Modifications and variations may be made to the above-described embodiments by those of ordinary skill in the art without departing from the spirit and scope of the present application. Accordingly, all equivalent modifications and variations accomplished by persons skilled in the art without departing from the spirit and technical ideas of the disclosure are intended to be covered by the claims of this application.

Claims (10)

1. A front-end data stream processing method, characterized in that it is applied to a processor at a front end, the method comprising:
receiving a data stream and an instruction stream for performing table lookup configuration updating operation, and storing the data stream and the instruction stream into a cache area respectively;
performing a fetching operation on a first memory based on the data stream to obtain instruction information, and performing a decoding operation on the instruction information;
storing the decoding result of the instruction information into an instruction register, and storing the data stream into a general register;
performing decoding operation on the decoding result to enable an address generator to generate a corresponding table lookup address according to the decoding result and the data stream;
and performing table lookup calculation operation in the second memory based on the table lookup address to obtain a corresponding table lookup calculation result.
2. The method of claim 1, wherein performing a look-up calculation operation in the second memory based on the look-up address to obtain a corresponding look-up calculation result comprises:
analyzing the first code in the data stream, and judging whether the table look-up calculation operation can be performed on the data stream according to the analysis result;
if the table lookup calculation operation can be performed, acquiring a corresponding table lookup base address based on the data stream; and generating the table lookup address through the address generator based on the table lookup base address, and performing the table lookup calculation operation in the second memory to obtain the table lookup calculation result.
3. The method for processing a front-end data stream according to claim 2, further comprising the following steps after obtaining the table look-up calculation result:
judging whether the table lookup calculation result is an intermediate result, if so, storing the table lookup calculation result into a cache area, otherwise, outputting the table lookup calculation result;
after the table look-up calculation result is stored in the buffer area, judging whether to continue to execute table look-up calculation operation on the data stream; if not, setting the processor to an idle state; if yes, based on the table lookup base address again, generating the table lookup address through the address generator and executing table lookup calculation operation.
4. The method of front-end data stream processing according to claim 1, wherein the method of performing a table look-up configuration update operation comprises:
updating an operation processing instruction area code in the first memory based on the instruction stream in the buffer area;
acquiring a storage state of the second memory, and acquiring a new write-in table lookup base address according to the storage state of the second memory;
generating a table lookup address to be updated through the address generator based on the new writing table lookup base address;
and writing table configuration information corresponding to the instruction stream into a second memory based on the table lookup address to be updated so as to execute table lookup configuration updating operation.
5. The method of front-end data stream processing according to claim 4, further comprising, after performing the table look-up configuration update operation, performing the following operations:
judging whether the table lookup configuration updating operation in the instruction stream is completed or not;
if the execution is finished, setting the processor to be in an idle state for running;
otherwise, generating a table lookup address to be updated again through the address generator, and executing table lookup configuration updating operation.
6. The method of claim 1, wherein the instruction information includes a look-up table instruction and a data transmission instruction, wherein,
The table look-up instruction includes: a table lookup address generation instruction, a table lookup calculation instruction and a table lookup configuration instruction;
the data transmission instruction includes: data transfer instructions between the register and the output, data transfer instructions between the register and the input, transfer instructions between the register and the buffer.
7. The front-end data stream processing method of claim 6, wherein the instruction information is controlled by an open instruction set processor, wherein the open instruction set processor comprises: an arithmetic logic unit, a shifter and a table look-up storage unit.
8. A front-end data stream processing apparatus, comprising:
a data stream receiving module: for receiving the data stream and storing it in a buffer;
and a table lookup configuration updating module: for receiving an instruction stream for a table lookup configuration update operation and performing the table lookup configuration update operation;
and the table look-up calculation module: the data stream is used for executing instruction fetching operation on the first memory to acquire instruction information based on the data stream and executing decoding operation on the instruction information; storing the decoding result of the instruction information after the decoding operation into an instruction register, and storing the data stream into a general register; performing decoding operation on the instruction information after decoding operation, and enabling an address generator to generate a corresponding table lookup address according to the decoding result and the data stream; and performing table lookup calculation operation in the second memory based on the table lookup address to obtain a corresponding table lookup calculation result.
9. A computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the front-end data stream processing method of any of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory, so as to cause the terminal to execute the front-end data stream processing method according to any one of claims 1 to 7.
CN202311465598.7A 2023-11-06 2023-11-06 Front-end data stream processing method, device, terminal and storage medium Pending CN117331604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311465598.7A CN117331604A (en) 2023-11-06 2023-11-06 Front-end data stream processing method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN117331604A true CN117331604A (en) 2024-01-02

Family

ID=89295531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311465598.7A Pending CN117331604A (en) 2023-11-06 2023-11-06 Front-end data stream processing method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN117331604A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination