WO2019242423A1 - Method, apparatus, and system for implementing multi-core parallelism on the TEE side - Google Patents

Method, apparatus, and system for implementing multi-core parallelism on the TEE side

Info

Publication number
WO2019242423A1
WO2019242423A1 · PCT/CN2019/086133 · CN2019086133W
Authority
WO
WIPO (PCT)
Prior art keywords
thread
tee
shadow
core
child
Prior art date
Application number
PCT/CN2019/086133
Other languages
English (en)
French (fr)
Inventor
姚冬冬
李�雨
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to KR1020207037763A (patent KR102509384B1)
Priority to EP19823478.3A (patent EP3812903A4)
Priority to AU2019291421A (patent AU2019291421A1)
Priority to CA3103584A (patent CA3103584C)
Publication of WO2019242423A1
Priority to US17/126,873 (patent US11461146B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
    • G06F21/53Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • G06F21/74Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information operating in dual or compartmented mode, i.e. at least one secure mode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30076Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/3009Thread control instructions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4812Task transfer initiation or dispatching by interrupt, e.g. masked
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5011Pool
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5015Service provider selection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Definitions

  • This application relates to operating system technology, and in particular, to a method, an apparatus, and a system for implementing multi-core parallelism in a multi-domain operating system.
  • ARM: advanced RISC machine
  • SoC: system on chip
  • REE: rich execution environment
  • TEE: trusted execution environment
  • The REE and TEE run on the same physical device, each running a separate operating system.
  • The REE runs client applications (CAs) with low security requirements; the TEE runs trusted applications (TAs) whose security must be ensured, and provides a secure execution environment for authorized TAs.
  • CA: client application
  • TA: trusted application
  • The CA and TA communicate through a provided mechanism, much like a client and a server.
  • Biometric technologies such as fingerprint recognition and face recognition can both be used in scenarios such as terminal unlocking or payment.
  • Biometrics has brought great convenience to end users.
  • Because biometric technology stores certain biometric characteristics of a person, which are sensitive personal data, schemes applying this technology require high terminal security.
  • The architecture guarantees the security of the biometric scheme.
  • The main business logic of biometrics, including feature extraction, feature comparison, liveness detection, and feature storage, runs as a TA on the TEE.
  • The biometric data is also saved on the TEE.
  • The secure environment provided by the TEE guarantees the security of the entire scheme.
  • In the early days, the TEE was designed to run on only one core (usually called core 0). When application scenarios were relatively simple, this design greatly simplified the system and met the needs of the time. However, the application scenarios above have high performance requirements, and the processing logic of biometric technology is relatively complex, which places high demands on the computing power of the TEE. The original single-core TEE implementation therefore can hardly meet the performance requirements of these application scenarios. From a user-experience perspective, such a single-core solution may lead to unacceptable face-unlocking or face-payment speed.
  • Each CA can access the TEE driver by calling the TEE client library on the REE side, after which the TEE driver sends a Secure Monitor Call (SMC) instruction; each core can enter monitor mode, and the state of each core is independent. Each core then enters the secure mode, that is, the TEE, and the TEE looks up a thread corresponding to the CA in a thread pool to complete the task in the TEE.
  • TEE client library
  • SMC: Secure Monitor Call
  • The number of cores available to the TEE is directly limited by the number of threads that call the TA on the REE side.
  • The TEE cannot actively create threads.
  • Parallelism among multiple TAs in this solution is implemented only through a simple thread pool, without a unified scheduling and load-balancing mechanism, so the parallel operation of multiple TAs affects the performance and power consumption of the entire system.
  • This application provides a multi-core parallel method, apparatus, computer system, and the like, which can run on ARM-based terminal devices or other types of computer systems.
  • multiple business logic in services with high performance requirements can run in parallel on the TEE, and the TEE side can actively add cores, which improves the flexibility of TEE side parallelism.
  • the present application provides a computer system on which a rich execution environment REE and a trusted execution environment TEE are deployed, a CA is deployed on the REE side, and a TA is deployed on the TEE side.
  • the CA is configured to send a call request to the TA to call a function of the TA, and the function of the TA includes multiple sub-functions.
  • The TEE side also has a thread creation module, a notification module, and a TEE scheduling module, wherein the thread creation module is used to create sub-threads under the call of the TA, and the sub-threads are used to implement the multiple sub-functions.
  • The notification module is configured to trigger the REE to generate a shadow thread, and the running of the shadow thread causes the core running the shadow thread to enter the TEE.
  • The TEE scheduling module is configured to schedule the child threads to run on that core.
  • TA is, for example, a TA that implements a face recognition function (abbreviated as face recognition TA), a TA that implements a fingerprint recognition function (abbreviated as fingerprint recognition TA), and the like.
  • The TA on the TEE side actively creates one or more child threads (usually multiple), and each time a child thread is created, the REE side is triggered to generate a shadow thread by sending a notification.
  • The purpose of the shadow thread is to bring the core running it to the TEE side, so that the TEE scheduling module can schedule the child threads created by the TA to run on this core.
  • the TA on the TEE side can create a sub-thread and actively "pull" the core to run the sub-thread according to demand.
  • One or more sub-threads run together with the TA main thread, thereby achieving multi-core parallelism on the TEE side.
  • the method of actively "pulling" the core is more flexible and more effective than the prior art.
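The "pull a core" flow described above can be sketched as a small simulation. This is an illustrative model only, not the patented implementation; all names (TeeScheduler, notify_ree, and so on) are invented for this sketch, and Python threads stand in for cores.

```python
import threading
import queue

class TeeScheduler:
    """Minimal model of the TEE scheduling module."""
    def __init__(self):
        self.pending_children = queue.Queue()  # child threads awaiting a core

    def create_child(self, sub_function):
        # The TA actively creates a child thread for one of its sub-functions.
        self.pending_children.put(sub_function)

    def on_core_entered_tee(self, core_id, results):
        # A shadow thread pulled this core into the TEE: run a pending child on it.
        sub_function = self.pending_children.get()
        results[core_id] = sub_function()

def run_ta(scheduler, notify_ree):
    # The TA splits its function into sub-functions and creates a child
    # thread for each; every creation sends a notification to the REE side.
    for sub in (lambda: "feature_extraction", lambda: "liveness_detection"):
        scheduler.create_child(sub)
        notify_ree()

def demo():
    scheduler = TeeScheduler()
    results = {}
    shadows = []
    next_core = [1]  # core 0 runs the TA main thread

    def notify_ree():
        # The REE side reacts to the notification by creating a shadow
        # thread; running it brings its core into the TEE.
        core_id = next_core[0]
        next_core[0] += 1
        t = threading.Thread(
            target=scheduler.on_core_entered_tee, args=(core_id, results))
        shadows.append(t)
        t.start()

    run_ta(scheduler, notify_ree)
    for t in shadows:
        t.join()
    return results
```

Running `demo()` yields one sub-function result per pulled core: each child thread the TA created ends up on a core that a shadow thread brought into the TEE, alongside the TA main thread on core 0.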
  • a notification processing module is further deployed on the REE side, and the notification module is specifically configured to generate a notification after the child thread is created, and send the notification to the notification processing module; And the notification processing module is configured to create a shadow thread according to the notification, and the shadow thread, when being executed, causes a core running the shadow thread to enter the TEE side.
  • the notification is, for example, a soft interrupt.
  • After the shadow thread is created and run, it causes the core running it to enter the TEE side; at this point the shadow thread enters the TEE side for the first time. After some time, the shadow thread may return to the REE side and may later enter the TEE side again.
  • a shadow thread enters the REE or TEE side
  • the core running the shadow thread enters the REE or TEE
  • the core running the shadow thread runs in the REE or TEE environment, or the core runs in REE mode or TEE mode.
  • The TEE scheduling module is further configured to record a correspondence between the shadow thread and the child thread. Specifically, the TEE scheduling module creates a first thread identifier for the sub-thread, the first thread identifier being used to indicate the thread accessing the sub-thread; after the sub-thread is scheduled to run on the core, the value of the first thread identifier is set to the identifier of the shadow thread.
  • the "shadow thread” can be considered as a virtual CA on the REE side.
  • The virtual CA accesses the child thread on the TEE side, and the client/server relationship between the shadow thread and the child thread is established by recording the shadow thread's identity.
  • The correspondence between the shadow thread and the child thread is recorded, so that regardless of whether the shadow thread is scheduled to another core by the REE-side scheduler, when the shadow thread enters the TEE side again, the corresponding child thread can still be scheduled to execute on the core running the shadow thread.
  • the first thread identifier is included in a thread control block (TCB) corresponding to the child thread, and is a field in the TCB.
  • TCB: thread control block
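The modified TCB can be pictured as a plain record with the added field. This is a simplified illustration, not the actual data structure; the field and function names are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TCB:
    """Simplified thread control block for a TEE child thread.

    Only fields relevant to this scheme are shown; a real TCB also
    holds thread state, saved registers, execution stacks, etc."""
    thread_name: str
    state: str = "ready"
    # The field this embodiment adds: the identifier of the thread
    # (the shadow thread) that accesses this child thread.
    first_thread_id: Optional[int] = None

def bind_shadow(tcb: TCB, shadow_tid: int) -> None:
    # Set once the child thread is scheduled onto the core that the
    # shadow thread pulled into the TEE.
    tcb.first_thread_id = shadow_tid
```

The identifier starts out empty, which is also how a newly created, not-yet-run child thread can be recognized later.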
  • the TEE scheduling module is specifically configured to: when it is determined that the shadow thread enters the TEE for the first time, schedule the newly created sub-thread to run on a core running the shadow thread.
  • the shadow thread is created by triggering the child thread, so when it is determined that the shadow thread enters the TEE for the first time, the child thread is scheduled to run on the core running the shadow thread.
  • The TEE scheduling module is further configured to: when it is determined that the shadow thread enters the TEE again, schedule the child thread to run on the current core running the shadow thread according to the recorded correspondence between the shadow thread and the child thread.
  • the "current core” running the shadow thread here may be the original core or another core.
  • The shadow thread causes the core running it to enter the TEE side by calling the Secure Monitor Call (SMC) instruction; this may be the first entry or a subsequent entry.
  • The SMC instruction includes a parameter that indicates whether the core enters the TEE for the first time or enters the TEE again.
  • the TEE scheduling module is configured to determine, according to the parameters, that the shadow thread enters the TEE again.
  • the TEE scheduling module is further configured to record a correspondence between a current core running the shadow thread and the shadow thread.
  • the "current core” running the shadow thread here may be the original core or another core.
  • The TEE scheduling module is specifically configured to record an identifier of the shadow thread at the element corresponding to the current core in the global state array after the current core running the shadow thread enters the TEE.
  • The global state array includes N elements, each element corresponding to a core of the computer system; after the current core running the shadow thread leaves the TEE, the element corresponding to the current core in the global state array is cleared.
  • This provides data preparation for scheduling: it is known which shadow thread is currently on which core, so the corresponding child thread can be found according to the identity of the shadow thread and scheduled to run on that core.
  • The TEE scheduling module is specifically configured to record the identifier of the shadow thread at the element corresponding to the current core in the global state array after the current core running the shadow thread enters the TEE, find the target sub-thread, and schedule the target sub-thread to run on the current core, where the first thread identifier of the target sub-thread is the identifier recorded at the current core's element in the global state array.
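The global state array bookkeeping just described can be modeled in a few lines. This is a toy model under the stated assumptions, not the real implementation; the names (`global_state`, `first_tid`, etc.) are invented for illustration.

```python
N_CORES = 8
EMPTY = None

# global_state[i] holds the identifier of the shadow thread currently
# in the TEE on core i, or EMPTY if that core is not in the TEE.
global_state = [EMPTY] * N_CORES

# first_tid maps each child thread to its "first thread identifier",
# i.e. the shadow thread bound to it (None until it is first scheduled).
first_tid = {}

def on_enter_tee(core_id, shadow_tid):
    # Record which shadow thread brought this core into the TEE.
    global_state[core_id] = shadow_tid

def on_leave_tee(core_id):
    # Clear the core's element when it leaves the TEE.
    global_state[core_id] = EMPTY

def find_target_child(core_id):
    # The target sub-thread is the one whose first thread identifier
    # matches the shadow thread recorded for this core.
    shadow_tid = global_state[core_id]
    for child, tid in first_tid.items():
        if tid == shadow_tid:
            return child
    return None
```

Because the lookup keys off the shadow thread's identifier rather than the core number, the same child thread is found even if the REE scheduler later runs the shadow thread on a different core.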
  • the return of the shadow thread to the REE side may be triggered by an interrupt.
  • The TEE scheduling module determines that the shadow thread enters the TEE for the first time, and then dispatches a child thread that has not yet run (which can also be understood as a child thread that has not yet established a correspondence with any shadow thread) to run on the core running the shadow thread.
  • Such a child thread can be indicated by its running state. For example, a newly created child thread is set to a specific running state, so that when a core is first pulled into the TEE side, the child thread can be identified and run.
  • the TEE scheduling module may identify the newly created (not yet run) child thread by the first thread identifier being empty.
  • If the TEE scheduling module determines that the shadow thread is not entering the TEE for the first time, it determines a target child thread and schedules the target child thread to run on the current core running the shadow thread, where the first thread identifier of the target child thread is the identifier of the shadow thread.
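The two dispatch cases (first entry vs. re-entry, distinguished by the SMC parameter) can be sketched together. This is an illustrative sketch; the function signature and data shapes are assumptions, not the patented code.

```python
def handle_smc(core_id, shadow_tid, first_entry, children, global_state):
    """Model of the TEE scheduler's reaction to an SMC entry.

    `first_entry` stands for the SMC parameter indicating whether this
    shadow thread enters the TEE for the first time or again.
    `children` maps child-thread name -> first thread identifier
    (None means the child has never been scheduled).
    Returns the child thread scheduled on this core."""
    global_state[core_id] = shadow_tid   # record core <-> shadow thread
    if first_entry:
        # First entry: bind a not-yet-run child (identifier still empty)
        # and record the shadow/child correspondence.
        for child, tid in children.items():
            if tid is None:
                children[child] = shadow_tid
                return child
        return None
    # Re-entry: the shadow thread may now run on a different core, but
    # the recorded correspondence still leads back to the same child.
    for child, tid in children.items():
        if tid == shadow_tid:
            return child
    return None
```

Note how a migration of the shadow thread between cores is harmless: the binding lives in the child's first thread identifier, not in the core number.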
  • the TEE further deploys a neural network processing unit NPU driver.
  • the NPU driver is used to drive the NPU to run under the call of one or more child threads of the TA.
  • The NPU is a specialized neural network processor used to implement large-scale, complex parallel operations, especially neural-network-related operations.
  • The algorithm can be implemented in software, or NPU acceleration can be invoked, as in the method proposed in this application.
  • The TEE further deploys a secure storage unit and a hardware driver unit, both of which can only be accessed by the TEE; the hardware driver unit is used to access the corresponding hardware under the call of one or more child threads of the TA, and the secure storage unit is configured to store data collected by the hardware.
  • The secure storage unit here is understood as a storage area; since it can only be accessed by the TEE, it is secure.
  • The secure storage unit is a cache, either of fixed size or of non-fixed size; a non-fixed-size cache may also simply be called a dynamic cache.
  • the hardware driving unit is a camera driver
  • the corresponding hardware is a camera.
  • The TA directly accesses the hardware on the TEE side and stores the data collected by the hardware in the storage area on the TEE side, which further ensures the security of the TA's use of the data and of the data itself.
  • the method provided in this application can be used to deploy the camera driver on the TEE side and store the face image collected by the camera on the TEE side.
  • The TA can directly access the face image through the camera driver on the TEE side, which further guarantees the security of the entire face recognition process.
  • The modules in the first aspect provided in this application are merely examples and should not be considered as limiting the scope of the application. All methods executed by modules deployed on the TEE side can also be regarded as methods executed by the TEE; accordingly, all methods executed by modules deployed on the REE side can also be regarded as methods executed by the REE.
  • The methods performed by the TEE and REE in this application, except for some steps performed by hardware, can generally be considered methods performed by the operating systems or applications of the TEE and REE.
  • the present application provides a method for implementing multi-core parallelism on a TEE side of a trusted execution environment.
  • the method runs on a multi-core computer device.
  • The method includes: the TEE creates a sub-thread, the sub-thread being used to implement a sub-function of the TA deployed on the TEE side; the TEE triggers the rich execution environment REE to generate a shadow thread, and the running of the shadow thread causes the core running the shadow thread to enter the TEE; the TEE then dispatches the created sub-thread to run on that core.
  • the TEE generates a notification (such as a soft interrupt) after the child thread is created, and sends the notification to the REE, so that the REE creates the shadow thread according to the notification .
  • a notification such as a soft interrupt
  • the method further includes: the TEE records a correspondence between the shadow thread and the child thread.
  • the TEE recording a corresponding relationship between the shadow thread and the child thread includes: the TEE records an identification of the shadow thread in a thread control block TCB of the child thread.
  • The method further includes: after the running of the shadow thread causes the current core running the shadow thread to enter the TEE (which can also be understood as the shadow thread entering the TEE again), the TEE schedules the child thread to run on the current core according to the recorded correspondence between the shadow thread and the child thread.
  • the "current core” here may be the previous core or another core, because the shadow thread may be scheduled to run on a different core.
  • The method further includes: the TEE records a correspondence between the current core running the shadow thread and the shadow thread. Specifically, after the current core running the shadow thread enters the TEE, the identifier of the shadow thread is recorded at the element corresponding to the current core in the global state array, where the global state array contains N elements, each corresponding to a core of the computer system; after the current core running the shadow thread leaves the TEE, the element corresponding to the current core in the global state array is cleared.
  • the method further includes: the TEE implements a call to the NPU by calling a NPU driver of a neural network processing unit deployed on the TEE.
  • the method further includes: the TEE accesses corresponding hardware through a hardware driver unit deployed on the TEE side, and stores data collected by the hardware in a secure storage unit deployed on the TEE side.
  • the TA is a TA that implements a face recognition function or a TA that implements a fingerprint recognition function, or a TA that simultaneously implements face recognition and fingerprint recognition functions.
  • Face recognition can be specifically 3D face recognition.
  • the present application provides a computer system.
  • the computer system includes a memory and a processor.
  • The memory is configured to store computer-readable instructions (or a computer program), and the processor is configured to read the computer-readable instructions to implement the methods provided by any of the foregoing implementations.
  • the present application provides a computer storage medium, which may be non-volatile.
  • the computer storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a processor, the method provided in any foregoing implementation manner is implemented.
  • the present application provides a computer program product that includes computer-readable instructions, and when the computer-readable instructions are executed by a processor, the method provided in any of the foregoing implementation manners is implemented.
  • the multi-core parallel method, device, and computer system provided on the TEE side can implement parallel execution of multiple tasks on the TEE side, for example, parallel execution of multiple subtasks of a TA.
  • Some complex services with high security requirements, such as 3D face recognition, can be placed entirely on the TEE side and executed in parallel, which can meet both the security and performance requirements of this type of service.
  • Having the TEE side trigger the REE side to generate a shadow thread implements actively "pulling" a core into the TEE side, which improves the flexibility of TEE-side parallelism.
  • By recording the access correspondence between the REE-side CA and the TEE-side TA (that is, the CA-TA scheduling group), the CA (including shadow threads) and the corresponding TA (including TA sub-threads) can always run on the same core, thereby ensuring accurate calculation of the CA load on the REE side and providing a good foundation for overall system load balancing.
  • the acceleration capability of the NPU is superimposed on the basis of parallel operation, which further improves the efficiency of business execution.
  • the security of the data can be further ensured, thereby ensuring the security of the service.
  • FIG. 1 is a schematic diagram of a TEE side multi-core solution provided by the prior art
  • FIG. 2 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a TEE side multi-core parallel scheme deployment provided by an embodiment of the present application.
  • FIG. 4a and FIG. 4b are schematic diagrams of a method of a TEE-side multi-core parallel solution according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of multiple CA-TA scheduling groups formed by this application.
  • FIG. 6 is a schematic diagram of a terminal system for implementing face / fingerprint dual authentication provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a payment scheme according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a computer system according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a neural network processing unit according to an embodiment of the present application.
  • Multi-core scheduling: a scheduling system, provided by a computer system with a multi-core processor, that supports the creation, scheduling, migration, and destruction of tasks on multiple cores.
  • Load balancing: when multiple tasks run in parallel on a multi-core processor, the system load must be balanced by distributing tasks across cores to achieve the overall performance and power-consumption goals of the system.
  • Task: in this application, a general concept; everything a computer needs to execute can be called a task, such as a process, a thread, a sub-thread, a CA, a TA, or a service.
  • Thread: sometimes called a lightweight process (LWP), the smallest unit of a program execution flow.
  • the entities of a thread include programs, data, and TCB.
  • A thread is a dynamic concept whose dynamic characteristics are described by a thread control block (TCB).
  • The TCB can include the following information: thread status, field resources saved when the thread is not running, a set of execution stacks, a main-storage area for each thread's local variables, access to main storage and other resources in the same process, and so on. This embodiment modifies the TCB.
  • Interrupt request generally refers to an event generated by hardware or software.
  • The hardware sends the event to the processor.
  • When the processor receives the event, it temporarily stops executing the current program and instead executes the program corresponding to the event.
  • Interrupt requests include soft interrupts and hard interrupts.
  • Interrupts generated by hardware are usually called hard interrupts or hardware interrupts (sometimes simply interrupts), while soft interrupts are generated by the process currently running on the processor. The processing of a soft interrupt simulates the processing of a hard interrupt.
  • When a soft interrupt occurs, the corresponding interrupt flag bit is first set to trigger the interrupt transaction, and the daemon thread is then woken up to check the interrupt status register. If an interrupt has occurred, the corresponding soft-interrupt service routine is called by querying the soft-interrupt vector table. This is the process of soft-interrupt handling. The difference between a soft interrupt and a hard interrupt lies in the mapping from the interrupt flag to the interrupt service routine. After a hard interrupt occurs, the processor maps the hard interrupt request to a specific service routine through the vector table, a process completed automatically by hardware; for a soft interrupt, a daemon thread implements this mapping in software, which is why such software-simulated interrupts are called soft interrupts.
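The flag-bit/daemon-thread mechanism described above can be sketched as a simplified model. This is not kernel code; the flag layout, vector table, and handler are invented for illustration, with a Python thread standing in for the daemon.

```python
import threading

SOFTIRQ_NOTIFY = 0          # bit index of our example soft interrupt
pending_flags = 0           # the "interrupt status register"
flag_lock = threading.Lock()
wakeup = threading.Event()  # used to wake the daemon thread
log = []

# Soft-interrupt vector table: bit index -> service routine.
# Here the routine models the REE reacting to a TEE notification.
vector_table = {SOFTIRQ_NOTIFY: lambda: log.append("create shadow thread")}

def raise_softirq(nr):
    """Set the interrupt flag bit and wake the daemon thread."""
    global pending_flags
    with flag_lock:
        pending_flags |= 1 << nr
    wakeup.set()

def daemon_once():
    """One pass of the daemon: check flags, dispatch via the table."""
    global pending_flags
    wakeup.wait()
    with flag_lock:
        flags, pending_flags = pending_flags, 0
        wakeup.clear()
    for nr, handler in vector_table.items():
        if flags & (1 << nr):
            handler()   # software mapping of flag -> service routine
```

The key contrast with a hard interrupt is visible in `daemon_once`: the mapping from the set flag to the service routine is performed by software (the daemon loop), not by the hardware vector mechanism.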
  • CFS: completely fair scheduler
  • FIG. 2 is a schematic structural diagram of a terminal device according to this embodiment.
  • the terminal device can be a desktop computer, a notebook, a mobile phone, a tablet computer, a smart watch, a smart bracelet, and the like.
  • An operating system is deployed on this terminal. The system contains an REE and a TEE, which run an REE operating system and a TEE operating system (for example, an open-source operating system), respectively.
  • Both the REE operating system and the TEE OS are divided into a user mode and a kernel mode.
  • Multiple CAs are deployed in the user mode on the REE side, such as face recognition CA and fingerprint recognition CA.
  • Multiple trusted applications are deployed in the user mode on the TEE side, such as fingerprint recognition TA and face recognition TA.
  • Kernel components are deployed in the kernel mode on the REE side, and trusted core components are deployed in the kernel mode on the TEE side.
  • The CA in the REE and the TA in the TEE constitute a client/server-like architecture: the CA acts as the client, the TA acts as the server, and the CA initiates access operations. The two communicate through the REE communication agent, the message channel at the hardware layer, and the TEE communication agent; these three establish a secure communication channel for the CA and TA, which ensures the security of data transmission to a certain extent.
  • The CA calls a TEE client API (application programming interface) to communicate with the corresponding TA; the TA calls a TEE internal API to use the programming resources provided by the TEE and implement related functions.
  • FIG. 3 is a schematic deployment diagram of a TEE-side multi-core parallel solution provided in this embodiment.
  • This embodiment takes the face recognition CA and the face recognition TA as examples to introduce the technical solution.
  • Face recognition CA and face recognition TA301 are deployed on the REE side and the TEE side, respectively. They cooperate to complete face recognition services such as face verification, and are widely used in terminal unlocking, application login, and financial payment scenarios.
  • the face recognition TA301 in this embodiment may further include four sub-functions of feature extraction, feature comparison, living body detection, and feature storage. In other embodiments, the face recognition TA may include more, fewer, or other types of sub-functions, which is not limited in this application.
  • A thread creation module (such as libthread) 302, a notification module 303, and a TEE scheduling module 305 are deployed on the TEE side; a notification processing module 304 is deployed on the REE side.
  • The monitor is an existing module provided by the system and is used to switch from the REE to the TEE.
  • The thread creation module 302 is used to create a sub-thread when called by the face recognition TA301 and then calls the notification module 303 to generate a soft interrupt; the notification module 303 is used to generate the soft interrupt and send it to the notification processing module 304 on the REE side.
  • The notification processing module 304 is configured to receive the soft interrupt and create a shadow thread; the created shadow thread is scheduled to run on a core. The shadow thread then enters the TEE side by sending an SMC instruction, which is equivalent to the core running the shadow thread entering the TEE side (that is, the secure mode).
  • Secure hardware and non-secure hardware are also deployed at the hardware layer. Secure hardware refers to hardware that can only be accessed by the TEE; non-secure hardware refers to hardware that can be accessed by both the REE and the TEE, or only by the REE.
  • FIG. 4a and FIG. 4b are schematic diagrams of a method of a TEE-side multi-core parallel scheme provided in this embodiment. The implementation process of the solution will be described in detail based on Figs. 3 and 4a-4b.
  • The face recognition CA sends a call request to the face recognition TA301 on the TEE side through the SMC instruction provided by the system.
  • This process is prior art and is not described in detail in this application. To facilitate understanding, the process can be understood as follows: through the provided SMC instruction, the core running the face recognition CA enters the TEE side (secure mode) and starts running the face recognition TA in secure mode to realize the functions of the face recognition TA.
  • After the face recognition TA301 receives the call request, it creates a child thread T1. Specifically, the face recognition TA301 creates the child thread T1 through the pthread_create interface in the thread creation module 302 (for example, libthread).
  • the face recognition TA will eventually create 4 child threads T1-T4.
  • the four sub-threads respectively handle the four sub-functions of feature extraction, feature comparison, live detection, and feature storage.
  • the creation and running of one sub-thread is taken as an example for description. For the creation and running process of the remaining three sub-threads, reference may be made to the creation and running of this sub-thread.
  • the thread creation module 302 calls the notification module 303 after creating the child thread T1 to generate a soft interrupt, and the soft interrupt is sent to the notification processing module 304 on the REE side.
  • The TEE scheduling module 305 creates a corresponding task control data structure, namely a thread control block (TCB), for this child thread T1.
  • The structure of the TCB is as follows: here, "task" refers to a child thread; the TCB of each child thread includes the running status, the scheduling policy, the TCB name, and so on; and the identifier in front of each field indicates the type of the field's value.
  • the TCB provided in this embodiment includes a ca field, which is an implementation of the "first thread identifier" proposed in this application.
  • the value of the ca field can default to 0.
  • the notification processing module 304 generates a thread S1 after receiving the soft interrupt.
  • PID: process identification
  • This thread is referred to as a shadow thread in the remainder of this embodiment. Its essence is the same as that of an ordinary thread; only the functions it implements are special in this embodiment.
  • For the face recognition TA301, the only CA accessing it is the face recognition CA. In this embodiment, however, the work of the face recognition TA301 is not completed by one thread but performed jointly by multiple threads, so a shadow thread can be understood as a "virtual CA" that accesses a child thread.
  • core in this application refers to the smallest physical processing unit.
  • the SMC instruction sent by the shadow thread may include a parameter, which is used to indicate that the shadow thread enters the TEE side for the first time.
  • this parameter can be firstIn.
  • firstIn true, it indicates that the shadow thread enters the TEE side for the first time.
  • firstIn false, it indicates that the shadow thread does not enter the TEE side for the first time.
  • Alternatively, the SMC instruction may include this parameter the first time it is sent and omit it afterwards. In this way, the receiver can determine whether the shadow thread is entering the TEE side for the first time by checking for the presence or absence of this parameter.
  • After the shadow thread S1 enters the TEE side, that is, after the core running the shadow thread S1 enters the TEE side, the TEE scheduling module 305 records the PID of the shadow thread S1 at the position of this core in the global state array.
  • CPU refers to the "core” mentioned earlier.
  • Core 1 is the core running the shadow thread S1.
  • The TEE scheduling module 305 assigns the value of the ca field of ctx_map_t[1] (that is, S1) to the ca field of the TCB corresponding to the child thread T1. In this way, the shadow thread S1 and the child thread T1 are set as one CA-TA group, acting as the CA and the TA, respectively.
  • The above steps S101-S107 describe the process of creating a shadow thread and entering the TEE side for the first time. Steps S102-S103, S103a, and S104-S107 are repeated to create the other three child threads and the corresponding three shadow threads, forming another three CA-TA groups. In this way, multiple cores run on the TEE side simultaneously and execute the four sub-functions of the face recognition TA301 in parallel, which greatly improves the execution efficiency of the face recognition TA.
  • In this manner, the TEE actively "pulls" cores into the TEE side, enabling the TEE, even as a passive operating system, to actively execute child threads, thereby improving the flexibility of multi-core parallelism on the TEE side.
  • The shadow thread S1 may be interrupted during running and returned to the REE side, where it may be scheduled onto another core. In this case, to ensure that the child thread T1 still runs on the same core as the shadow thread S1, the following operations need to be performed (refer to FIG. 4b).
  • the TEE scheduling module 305 clears the ca field of ctx_map_t [1].
  • the TEE scheduling module 305 sets the ca field of the corresponding position in the global state array to S1.
  • If the shadow thread S1 re-enters the TEE side on core 1, the TEE scheduling module 305 still sets the ca field of ctx_map_t[1] to S1; if the shadow thread S1 has been scheduled by the REE-side scheduling module (for example, the CFS scheduler) to run on another core, such as core 2, the TEE scheduling module 305 instead sets the ca field of ctx_map_t[2] to S1.
  • the TEE scheduling module 305 searches for a target sub-thread, and schedules the target sub-thread to run on the current core.
  • The target child thread must satisfy the following condition: the ca field in its TCB is the same as the ca field corresponding to the current core in the global state array; in this embodiment, both are S1. It can be seen that, in this embodiment, the target child thread is the child thread T1, so the child thread T1 is scheduled to run on the current core.
  • the "current core” may be core 1 or core 2 according to the description of step S109.
  • In this embodiment, the target child thread can be scheduled to execute on the core only when it is in an executable state. If it is in some non-executable state, the TEE scheduling module 305 can, according to the scheduling policy, let core 1 or core 2 wait or execute other executable processes; this is not limited in this application.
  • FIG. 5 shows the multiple CA-TA scheduling groups formed after the method provided in this application is implemented. As can be seen from the figure, the main face recognition TA thread and the face recognition CA form one scheduling group, and the other four child threads form four scheduling groups with the shadow threads S1-S4, respectively. These five scheduling groups, together with other applications, participate in the load-balancing scheduling process of the CFS scheduler.
  • this application introduces another scenario that requires dual authentication of face recognition and fingerprint recognition.
  • the multi-core parallel scheme provided by this application can still provide unified scheduling of CA and TA.
  • FIG. 6 is a schematic diagram of a terminal system for implementing face / fingerprint dual authentication provided by this embodiment.
  • the two-factor authentication implementation scheme is described below.
  • the face recognition CA608 and fingerprint recognition CA607 on the REE side respectively make requests to the TEE side.
  • A request is initiated by calling the monitor through the TrustZone driver to enter the monitoring mode, and then entering the TEE mode from the monitoring mode.
  • the TA manager 609 determines, according to the information carried in the request, that the face recognition TA601 and the fingerprint recognition TA604 respectively process the request of the face recognition CA and the request of the fingerprint recognition CA.
  • The TEE scheduling module 610 records the PIDs of the face recognition CA and the fingerprint recognition CA at the positions corresponding to the two cores in the global state array, and records them respectively in the ca field of the TCB of the face recognition TA601 and the ca field of the TCB of the fingerprint recognition TA604.
  • In this way, two CA-TA scheduling groups are established, and the load generated by a TA on the TEE side can be bundled with its corresponding CA as one load-calculation unit.
  • Face recognition TA601 calls the rights management service 602 by sending a message, and the rights management service 602 calls the camera driver 603; similarly, the fingerprint recognition TA604 calls the rights management service 605, and the rights management service 605 calls the fingerprint driver.
  • the rights management service 602 and the rights management service 605 refer to the same service. In other embodiments, the two services may also be two independent services.
  • the above "call” is essentially inter-process communication (IPC).
  • the IPC mechanism in the TEE is accomplished through messages.
  • The value of the ca field in the TCB of the message initiator is passed to the message receiver along with the message. Therefore, all service processes on a TA's call chain are correspondingly pulled into the corresponding CA-TA scheduling group. As shown in FIG. 5, two scheduling groups are formed in this embodiment.
  • When a service process, after handling a message from one TA, receives a message from another TA, the service process updates its ca value from the new message and is thus taken into another CA-TA group. As shown in the figure, the rights management service 602 may switch from the face recognition CA-TA scheduling group to the fingerprint recognition CA-TA scheduling group.
  • For example, the face recognition TA601 sends a message to the rights management service 602 and passes the value of the ca field in its TCB, that is, the PID of the face recognition CA, to the rights management service 602; the value of the ca field in the TCB of the rights management service 602 is then also set to the PID of the face recognition CA.
  • Afterwards, the rights management service 602 (equivalent to the rights management service 605 in the figure) is called by the fingerprint recognition TA604, and the value of the ca field in its TCB is reset to the PID of the fingerprint recognition CA.
  • a CA-TA scheduling group is used as a scheduling unit, and unified scheduling is performed by the CFS scheduler on the REE side.
  • the scheduling may be triggered by load balancing requirements.
  • If a CA is scheduled to another core by the CFS scheduler, the TA in the same scheduling group and the other processes called by that TA will also be scheduled to this core by the TEE scheduling module 610. Therefore, with the method provided in this application, when multiple TAs run in parallel, unified scheduling of each CA and its corresponding TA can be achieved, ensuring the accuracy of CA load calculation. For example, if the face recognition CA608 were scheduled to another core while the face recognition TA601 was not, the load of other threads running on that core would be attributed to the face recognition CA608, which would be incorrect.
  • a situation similar to the aforementioned S108 may also exist in this scenario.
  • The core executing the face recognition TA601 may be interrupted and returned to the REE side to respond to the interrupt request.
  • When exiting, the TEE scheduling module 610 clears the ca field corresponding to that core. If the face recognition CA608 is then scheduled by the CFS scheduler on the REE side to run on a new core and enters the TEE side again, the TEE scheduling module 610 sets the ca field corresponding to the new core in the global state array to the PID of the face recognition CA608.
  • The TEE scheduling module 610 then finds the target tasks according to the PID of the face recognition CA608: the ca value in the TCB of each target task is also the PID of the face recognition CA608.
  • The target tasks include one or more of the face recognition TA601, the camera driver 603 (process), and the rights management service 602 (process).
  • In some cases the rights management service may not belong to the target tasks, because its ca field may have been modified due to a call from the fingerprint recognition TA604.
  • The TEE scheduling module 610 schedules the target tasks to run on the new core, thereby completing the migration of the TA and the services it calls between cores, realizing unified migration of the CA-TA scheduling group, and ensuring that the tasks contained in a CA-TA scheduling group always run on the same core.
  • FIG. 7 is a schematic diagram of a payment scheme provided by this embodiment.
  • this payment solution further utilizes a neural network processing unit and other methods to improve security and performance.
  • The payment solution contains multiple pieces of business logic: the payment application 701, the face recognition CA702 triggered by it, the face recognition TA708, and the camera service 703.
  • The face recognition TA708 contains four sub-business logics: feature extraction, live detection, feature comparison, and feature storage.
  • the hardware involved in this solution includes a camera 706, a neural processing unit (NPU) 715, a memory 714, and a central processing unit (not shown).
  • Drivers for the camera 706, the NPU 715, and the memory 714 are deployed on the TEE side.
  • the camera service 703, face recognition CA702, and NPU service CA704 on the REE side are only responsible for business initiation and some non-critical business logic processing.
  • In this implementation, a driver for the camera 706 is deployed on the TEE side; in other implementations, a driver for the camera 706 may also be deployed on the REE side, so that applications or services on the REE side can access the camera 706 through that driver.
  • the face recognition CA702 of the REE calls the face recognition TA708 on the TEE side, thereby initiating a face recognition process.
  • the face recognition TA708 accesses the camera 706 through the camera driver 705 on the TEE side.
  • Specifically, the face recognition TA708 can access the camera 706 through an image signal processor (ISP) by means of an ISP driver.
  • The image security buffer 707 can be understood as software located on the TEE side, or as a storage space (such as memory) that can only be accessed by the TEE. The face recognition TA708 accesses the image security buffer 707 by address and performs feature extraction, live detection, feature comparison, feature storage, and other algorithms on the collected images based on pre-stored face templates and other information.
  • In the prior art, a camera driver is usually deployed only on the REE side, and some functions of the face recognition TA, such as feature extraction, are placed on the REE side: the feature extraction function calls the camera driver on the REE side to implement image acquisition.
  • In the method provided by this embodiment, however, the face recognition TA708 can directly access the camera 706 through the camera driver 705 deployed on the TEE side and cache the images in the image security buffer 707 on the TEE side, thereby ensuring that camera usage and data storage are completed on the TEE side, further ensuring data security.
  • the face recognition TA708 will access the NPU driver 712 through the NPU service TA709 on the TEE side, and then call the NPU 715 through the NPU driver 712 to improve the processing speed.
  • the payment application 701 will obtain the final result of face recognition through its payment application TA710.
  • For example, the Alipay application obtains the final result of face recognition through the International Finance Authentication Alliance (IFAA) TA.
  • the face template is pre-registered into the terminal device.
  • The face image collected during payment must match the face template for the payment to complete, so the security of the face template is very important.
  • the face template is stored into the memory 714 through the storage service 713 on the TEE side.
  • The memory 714 may be a memory with certain security features, such as a replay protected memory block (RPMB). The memory can be set to be accessible only by TEE-side services, further improving the security of the memory, thereby ensuring the security of the face template and, in turn, the security of the face recognition process.
  • the face recognition solution implemented by the method provided by the present application can satisfy both security and high performance requirements.
  • In some prior solutions, part of the key business logic included in the face recognition process is implemented on the REE side (for example, live detection is implemented on the REE side).
  • In the solution provided in this application, by contrast, all of the key business logic contained in the face recognition process is implemented on the TEE side, and the efficiency of the process is improved through multi-core parallelism to meet performance requirements.
  • data (such as images, etc.) generated or used in the face recognition process will be stored on the TEE side, thereby using the TEE security guarantee mechanism to further improve the security of face recognition.
  • FIG. 8 is a schematic structural diagram of a computer system according to this embodiment.
  • the computer system may be a terminal device.
  • the computer system includes a communication module 810, a sensor 820, a user input module 830, an output module 840, a processor 850, an audio and video input module 860, a memory 870, and a power source 880.
  • the computer system provided in this embodiment may further include an NPU890.
  • the communication module 810 may include at least one module that enables communication between the computer system and a communication system or other computer systems.
  • the communication module 810 may include one or more of a wired network interface, a broadcast receiving module, a mobile communication module, a wireless Internet module, a local area communication module, and a location (or positioning) information module.
  • the sensor 820 may sense a current state of the system, such as an open / closed state, position, whether there is contact with the user, direction, and acceleration / deceleration, and the sensor 820 may generate a sensing signal for controlling the operation of the system.
  • the user input module 830 is configured to receive inputted digital information, character information, or contact touch operations / contactless gestures, and receive signal inputs related to user settings and function control of the system.
  • the user input module 830 includes a touch panel and / or other input devices.
  • the output module 840 includes a display panel for displaying information input by the user, information provided to the user, or various menu interfaces of the system.
  • the display panel may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED).
  • the touch panel may cover the display panel to form a touch display screen.
  • the output module 840 may further include an audio output module, an alarm, a haptic module, and the like.
  • the audio and video input module 860 is configured to input an audio signal or a video signal.
  • the audio and video input module 860 may include a camera and a microphone.
  • the power source 880 may receive external power and internal power under the control of the processor 850 and provide power required for operation of various components of the system.
  • the processor 850 includes one or more processors.
  • the processor 850 may include a central processing unit and a graphics processor.
  • In this application, the central processing unit has multiple cores and is a multi-core processor. These cores may be integrated on the same chip or may be separate chips.
  • the memory 870 stores a computer program including an operating system program 872, an application program 871, and the like.
  • Typical operating systems include systems used for desktops or laptops, such as Microsoft's Windows and Apple's MacOS, and systems for mobile terminals, such as those developed by Google based on Android.
  • the method provided by the foregoing embodiment may be implemented by software, and may be considered as a specific implementation of the operating system program 872.
  • The memory 870 may be one or more of the following types: flash memory, hard disk type memory, micro multimedia card memory, card memory (such as SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), replay protected memory block (RPMB), magnetic memory, magnetic disk, or optical disk.
  • the memory 870 may also be a network storage device on the Internet, and the system may perform operations such as updating or reading on the memory 870 on the Internet.
  • The processor 850 is used to read the computer program in the memory 870 and then execute the method defined by the computer program. For example, the processor 850 reads the operating system program 872 to run the operating system and implement its various functions in the system, or reads one or more application programs 871 to run applications on the system.
  • the memory 870 also stores other data 873 in addition to the computer program.
  • the NPU 890 is mounted on the main processor 850 as a coprocessor, and is used to perform tasks assigned to it by the main processor 850.
  • The NPU 890 may be called by one or more sub-threads of the face recognition TA to implement part of the complex algorithms involved in face recognition. Specifically, the sub-threads of the face recognition TA run on multiple cores of the main processor 850; the main processor 850 then calls the NPU 890, and the results computed by the NPU 890 are returned to the main processor 850.
  • connection relationship between the above modules is only an example, and the method provided by any embodiment of the present application can also be applied to terminal devices in other connection modes, for example, all modules are connected through a bus.
  • FIG. 9 is a schematic structural diagram of an NPU 900 provided in this embodiment.
  • The NPU 900 is connected to the main processor and an external memory.
  • a core part of the NPU 900 is an arithmetic circuit 903.
  • the controller 904 controls the arithmetic circuit 903 to extract data in a memory and perform mathematical operations.
  • The operation circuit 903 includes multiple processing engines (PEs). In some implementations, the operation circuit 903 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In other implementations, the arithmetic circuit 903 is a general-purpose matrix processor.
  • The operation circuit 903 takes the data corresponding to matrix B from the weight memory 902 and buffers it on each PE of the operation circuit 903.
  • The arithmetic circuit 903 takes matrix A data from the input memory 901, performs matrix operations with matrix B, and stores the partial or final results of the matrix in an accumulator 908.
  • the unified memory 906 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 902 through the storage unit access controller 905 (for example, direct memory access controller (DMAC)).
  • the input data is also transferred to the unified memory 906 through the storage unit access controller 905.
  • A bus interface unit 910 (BIU) is used for the interaction between an AXI (advanced extensible interface) bus and the storage unit access controller 905 and the instruction fetch memory 909.
  • The bus interface unit 910 is used for the instruction fetch memory 909 to obtain instructions from the external memory, and is also used for the storage unit access controller 905 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • The storage unit access controller 905 is mainly used to move input data from the external memory to the unified memory 906, to move the weight data to the weight memory 902, or to move input data to the input memory 901.
  • The vector calculation unit 907 usually includes multiple operation processing units. Where necessary, it further processes the output of the operation circuit 903, performing, for example, vector multiplication, vector addition, exponential operations, logarithmic operations, and/or size comparisons.
  • the vector calculation unit 907 can store the processed vectors into the unified memory 906.
  • the vector calculation unit 907 may apply a non-linear function to the output of the arithmetic circuit 903, such as a vector of accumulated values, to generate an activation value.
  • the vector calculation unit 907 generates a normalized value, a merged value, or both.
  • the processed vector can be used as an activation input for the arithmetic circuit 903.
  • the instruction fetch memory 909 connected to the controller 904 is used to store instructions used by the controller 904.
  • the unified memory 906, the input memory 901, the weight memory 902, and the instruction fetch memory 909 are all On-Chip memories.
  • the external memory in the figure is independent of the NPU hardware architecture.
  • the method provided in this embodiment can also be applied to non-terminal computer equipment, such as a cloud server.
  • The device embodiments described above are only illustrative. The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that there is a communication connection between them, which can be specifically implemented as one or more communication buses or signal lines.


Abstract

This application provides a method, apparatus and computer system for implementing multi-core parallelism on the TEE side. The method includes: the TEE creates multiple sub-threads, each sub-thread implementing a sub-function of a TA deployed on the TEE side; for each sub-thread, the TEE triggers the rich execution environment REE to generate a shadow thread corresponding to that sub-thread, and running the shadow thread causes the core running it to enter the TEE; the TEE then schedules the created sub-thread onto the core on which its corresponding shadow thread runs. With this method, multiple pieces of business logic of a performance-critical service can run in parallel in the TEE; moreover, since the TEE triggers the REE to generate threads that automatically enter the TEE side, cores are actively added on the TEE side, improving the flexibility of TEE-side parallelism.

Description

Method, Apparatus and System for Implementing Multi-Core Parallelism on the TEE Side
Technical Field
This application relates to operating system technology, and in particular to methods, apparatuses and systems for implementing multi-core parallelism in a multi-domain operating system.
Background
To guarantee the security of terminal devices, terminal security frameworks represented by ARM TrustZone have emerged (where ARM stands for advanced RISC machines and RISC for reduced instruction set computer). Under the ARM TrustZone framework, system-level security is obtained by partitioning the software and hardware resources of the system on chip (SoC) into two worlds: the normal world and the secure world (also called the non-secure domain and the secure domain), which correspond to the rich execution environment (REE) and the trusted execution environment (TEE), respectively. The REE and the TEE run on the same physical device, each running its own operating system. The REE runs client applications (CAs) with low security requirements; the TEE runs trusted applications (TAs) whose security must be guaranteed, providing authorized TAs with a secure execution environment. CAs and TAs communicate through the communication mechanism provided by ARM TrustZone, much like a client and a server.
Nowadays more and more application scenarios on terminals use biometric technology, such as fingerprint recognition or face recognition, both of which can be used for terminal unlocking, payment, and other scenarios. Biometric technology brings great convenience to terminal users. On the other hand, because biometric technology stores some of a person's own biometric features, which are sensitive personal data, solutions applying this technology place high demands on terminal security.
The prior art can use the ARM TrustZone architecture to guarantee the security of biometric solutions. Specifically, the main business logic of biometric recognition (including feature extraction, feature comparison, liveness detection, and feature storage) is implemented as a TA that runs in the TEE, and the biometric data is also kept in the TEE, so that the security of the whole solution is guaranteed by the secure environment the TEE provides.
However, ARM TrustZone was originally designed so that the TEE could run on only one core (usually called core 0). When application scenarios were still relatively simple, such a design greatly reduced system complexity and matched the needs of the time. But the application scenarios above demand high performance, and the processing logic of biometric recognition is rather complex and places high computing-power requirements on the TEE, so the original single-core TEE implementation can hardly meet the performance requirements of these scenarios. From the user-experience perspective, such a single-core solution may lead to unacceptable face-unlock or face-payment speeds.
To solve this problem, the prior-art OP-TEE (open portable trusted execution environment) provides a simple multi-core implementation. As shown in Figure 1, it allows multiple CAs to initiate secure accesses to multiple TAs in parallel. Specifically, each CA can access the TEE driver by calling the TEE client library (TEE client lib) on the REE side, and the TEE driver then issues a secure monitor call (SMC) instruction; each core can enter monitor mode, and the state of each core is independent; each core then enters secure mode, i.e. the TEE, and the TEE finds a thread corresponding to the CA from a thread pool to complete the task inside the TEE. However, on the one hand, the number of cores in the TEE is directly limited by the number of REE-side threads calling TAs, and when cores run short the TEE cannot create threads on its own; on the other hand, the parallel TAs in this solution are implemented with nothing more than a simple thread pool, without a unified scheduling and load-balancing mechanism, so running multiple TAs in parallel affects the performance and power consumption of the whole system.
Summary
This application provides a multi-core parallel method, apparatus and computer system, which can run on ARM TrustZone-based terminal devices or other types of computer systems. With this solution, multiple pieces of business logic of a performance-critical service can run in parallel in the TEE, and the TEE side can actively add cores, improving the flexibility of TEE-side parallelism.
This application is introduced below from multiple aspects; it is easy to understand that the implementations of these aspects may refer to one another.
According to a first aspect, this application provides a computer system on which a rich execution environment REE and a trusted execution environment TEE are deployed, with a CA deployed on the REE side and a TA deployed on the TEE side. The CA is configured to send an invocation request to the TA to call the TA's functionality, and the TA's functionality includes multiple sub-functions. A thread creation module, a notification module, and a TEE scheduling module are further deployed on the TEE side. The thread creation module is configured to create a sub-thread when called by the TA, the sub-thread implementing one of the multiple sub-functions; the notification module is configured to trigger the REE to generate a shadow thread, whose running causes the core running it to enter the TEE; the TEE scheduling module is configured to schedule the sub-thread onto that core to run. The TA is, for example, a TA implementing face recognition (face recognition TA for short) or a TA implementing fingerprint recognition (fingerprint recognition TA for short).
It can be seen that the TA on the TEE side actively creates one or more sub-threads (usually multiple), and each time a sub-thread is created, it triggers the REE side, by sending a notification, to generate a shadow thread whose sole purpose is to bring the core running it into the TEE side; the TEE scheduling module then schedules the sub-thread created by the TA onto that core. In this way, the TEE-side TA can create sub-threads on demand and actively "pull" cores to run them; one or more sub-threads run together with the TA main thread, achieving TEE-side multi-core parallelism. Compared with the prior art, this active core-pulling approach is more flexible and more effective.
In some implementations, a notification processing module is further deployed on the REE side. The notification module is specifically configured to generate a notification after the sub-thread is created and send the notification to the notification processing module; the notification processing module is configured to create the shadow thread according to the notification, and when the shadow thread runs it causes the core running it to enter the TEE side. The notification is, for example, a software interrupt.
After the shadow thread is created and runs, it causes the core running it to enter the TEE side; this is the shadow thread's "first" entry into the TEE side. Some time later, the shadow thread may fall back to the REE side, and may also enter the TEE side again.
It should be noted that a shadow thread entering the REE or TEE side can be understood as the core running the shadow thread entering the REE or TEE, or as that core running in the REE or TEE environment, or running in REE or TEE mode.
In some implementations, the TEE scheduling module is further configured to record the correspondence between the shadow thread and the sub-thread. Specifically, the TEE scheduling module creates a first thread identifier for the sub-thread, the first thread identifier indicating the thread that accesses the sub-thread, and after scheduling the sub-thread onto the core, sets the value of the first thread identifier to the identifier of the shadow thread.
A "shadow thread" can be regarded as a virtual CA on the REE side that accesses the sub-thread on the TEE side; recording the shadow thread's identifier establishes the client/server relationship between the shadow thread and the sub-thread.
By recording the correspondence between the shadow thread and the sub-thread in this way, whether or not the shadow thread is scheduled onto another core by the REE-side scheduler, it is guaranteed that when the shadow thread enters the TEE side again, its corresponding sub-thread can still be scheduled onto the core running the shadow thread.
In some implementations, the first thread identifier is contained in the thread control block (TCB) corresponding to the sub-thread, as a field of the TCB.
In some implementations, the TEE scheduling module is specifically configured to schedule the newly created sub-thread onto the core running the shadow thread when it determines that the shadow thread enters the TEE for the first time.
Here it is assumed that the shadow thread's creation was triggered by this sub-thread, so when the shadow thread is determined to be entering the TEE for the first time, that sub-thread is scheduled onto the core running the shadow thread.
In some implementations, the TEE scheduling module is further configured to: when determining that the shadow thread enters the TEE again, schedule the sub-thread onto the current core running the shadow thread according to the recorded correspondence between the shadow thread and the sub-thread. The "current core" running the shadow thread here may be the original core or another core.
In some implementations, the shadow thread causes the core running it to enter the TEE side by invoking the secure monitor call SMC instruction, whether for the first time or again ("again" meaning any entry other than the first). The SMC instruction contains a parameter indicating whether the core enters the TEE for the first time or again. Correspondingly, the TEE scheduling module is configured to determine from this parameter that the shadow thread is entering the TEE again.
In some implementations, the TEE scheduling module is further configured to record the correspondence between the current core running the shadow thread and the shadow thread.
Here the "current core" running the shadow thread may be the original core or another core.
In some implementations, the TEE scheduling module is specifically configured to: after the current core running the shadow thread enters the TEE, record the shadow thread's identifier at the element corresponding to the current core in a global state array, where the global state array contains N elements, each corresponding to one core of the computer system; and after the current core running the shadow thread leaves the TEE, clear the element corresponding to the current core in the global state array.
Recording the correspondence between the current core running the shadow thread and the shadow thread prepares data for scheduling: it makes known which core is currently paired with which shadow thread, so the corresponding sub-thread can be found via the shadow thread's identifier and scheduled onto that core.
In some implementations, the TEE scheduling module is specifically configured to: after the current core running the shadow thread enters the TEE, record the shadow thread's identifier at the element corresponding to the current core in the global state array, look up a target sub-thread, and schedule the target sub-thread onto the current core, where the first thread identifier corresponding to the target sub-thread is the identifier recorded at the element corresponding to the current core in the global state array.
In some implementations, the shadow thread's fallback to the REE side may be triggered by an interrupt.
In some implementations, when the TEE scheduling module determines that the shadow thread enters the TEE for the first time, it schedules a sub-thread that has not yet run (which can also be understood as a sub-thread that has not yet established a correspondence with any shadow thread) onto the core running the shadow thread. Such a sub-thread can be indicated by its running state; for example, a newly created sub-thread is set to a specific running state, so that a core pulled into the TEE side for the first time can recognize and run it. In other implementations, the TEE scheduling module may identify a newly created (not-yet-run) sub-thread by its first thread identifier being empty.
In some implementations, when the TEE scheduling module determines that the shadow thread is not entering the TEE for the first time, it determines a target sub-thread and schedules it onto the current core running the shadow thread, where the target sub-thread's first thread identifier is the shadow thread's identifier.
In some implementations, a neural-network processing unit NPU driver is further deployed in the TEE. The NPU driver is configured to drive the NPU when called by one or more sub-threads of the TA.
The NPU is a dedicated neural-network processor for large-scale, complex parallel computation, especially neural-network-related computation. When a TA needs a complex algorithm, the algorithm can be implemented in software, or, as in the method proposed in this application, accelerated by calling the NPU.
By deploying the NPU driver on the TEE side, the NPU can be called from the TEE side. Meanwhile, since the solution proposed in this application allows multi-core parallel execution on the TEE side, it lays a good foundation for using the NPU on the TEE side, improving overall system performance.
In some implementations, a secure storage unit and a hardware driver unit are further deployed in the TEE, both accessible only by the TEE; the hardware driver unit is configured to access the corresponding hardware when called by one or more sub-threads of the TA; the secure storage unit is configured to store the data collected by the hardware. The secure storage unit here is understood as a storage area that is secure because only the TEE can access it.
In some implementations, the secure storage unit is a buffer, either of fixed or non-fixed size; a non-fixed-size buffer may also be called a dynamic buffer. In some implementations, the hardware driver unit is a camera driver, and its corresponding hardware is a camera.
The TA accesses the hardware directly on the TEE side and stores the data collected by the hardware in a TEE-side storage area, further guaranteeing the security of both the TA's use of the data and the data itself. For example, for a 3D (3 dimensions) face recognition TA, the method provided in this application can deploy the camera driver on the TEE side and store the face images collected by the camera on the TEE side, so the TA can drive the camera and access the face images directly on the TEE side, further guaranteeing the security of the whole face recognition process.
Since the ways of dividing modules cannot be exhausted, the modules in the first aspect of this application are only examples and should not be regarded as limiting the scope of this application. The methods performed by all modules deployed on the TEE side can also be regarded as methods performed by the TEE; correspondingly, the methods performed by all modules deployed on the REE side can also be regarded as methods performed by the REE. Apart from steps performed by certain hardware, the methods performed by the TEE and the REE in this application can generally be considered methods performed by the operating systems or applications of the TEE and the REE.
According to a second aspect, this application provides a method for implementing multi-core parallelism on the trusted execution environment TEE side, running on a multi-core computer device. The method includes: the TEE creates a sub-thread that implements one sub-function of a TA deployed on the TEE side; the TEE triggers the rich execution environment REE to generate a shadow thread, whose running causes the core running it to enter the TEE; and the TEE schedules the created sub-thread onto that core for execution.
In some implementations, the TEE generates a notification (for example a software interrupt) after the sub-thread is created and sends the notification to the REE, so that the REE creates the shadow thread according to the notification.
In some implementations, the method further includes: the TEE records the correspondence between the shadow thread and the sub-thread.
In some implementations, the TEE recording the correspondence between the shadow thread and the sub-thread includes: the TEE records the shadow thread's identifier in the sub-thread's thread control block TCB.
In some implementations, the method further includes: after the running of the shadow thread again causes the current core running it to enter the TEE (which can also be understood as the shadow thread entering the TEE again), the TEE schedules the sub-thread onto the current core running the shadow thread according to the recorded correspondence between them. The "current core" here may be the previous core or another core, because the shadow thread may be scheduled onto different cores.
In some implementations, the method further includes: the TEE records the correspondence between the current core running the shadow thread and the shadow thread. Specifically, after the current core running the shadow thread enters the TEE, the shadow thread's identifier is recorded at the element corresponding to the current core in a global state array, where the global state array contains N elements, each corresponding to one core of the computer system; after the current core running the shadow thread leaves the TEE, the element corresponding to the current core in the global state array is cleared.
In some implementations, the method further includes: the TEE calls the NPU by calling the neural-network processing unit NPU driver deployed in the TEE.
In some implementations, the method further includes: the TEE accesses the corresponding hardware through a hardware driver unit deployed on the TEE side, and stores the data collected by the hardware in a secure storage unit deployed on the TEE side.
In some implementations, the TA is a TA implementing face recognition, or a TA implementing fingerprint recognition, or a TA implementing both face recognition and fingerprint recognition. The face recognition may specifically be 3D face recognition.
According to a third aspect, this application provides a computer system including a memory and a processor, where the memory stores computer-readable instructions (also called a computer program) and the processor reads the computer-readable instructions to implement the method provided by any of the foregoing implementations.
According to a fourth aspect, this application provides a computer storage medium, which may be non-volatile, storing computer-readable instructions that, when executed by a processor, implement the method provided by any of the foregoing implementations.
According to a fifth aspect, this application provides a computer program product containing computer-readable instructions that, when executed by a processor, implement the method provided by any of the foregoing implementations.
It can be seen that the TEE-side multi-core parallel method, apparatus and computer system provided by this application enable parallel execution of multiple tasks on the TEE side, for example parallel execution of multiple sub-tasks of one TA. Thus, complex services with high security requirements, such as 3D face recognition, can be placed entirely on the TEE side while still being executed in parallel, satisfying both the security and the performance requirements of this type of service. Furthermore, by having the TEE side trigger the REE side to generate shadow threads, cores are actively "pulled" into the TEE side, improving the flexibility of TEE-side parallelism.
Furthermore, on top of the parallel mechanism, by recording the access correspondence between REE-side CAs and TEE-side TAs (i.e. CA-TA scheduling groups), a CA (including a shadow thread) and its corresponding TA (including a TA's sub-thread) are guaranteed always to run on the same core, ensuring the accuracy of the REE-side CA load calculation and providing a good basis for system-wide load balancing.
Furthermore, by introducing the NPU on the TEE side, the NPU's acceleration capability is superimposed on parallel execution, further improving service execution efficiency.
Furthermore, storing the data needed by a service in a secure TEE-side storage medium further guarantees the security of the data and hence of the service.
Brief Description of the Drawings
To explain the technical solutions provided in this application more clearly, the accompanying drawings are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application.
Figure 1 is a schematic diagram of a prior-art TEE-side multi-core solution;
Figure 2 is a schematic structural diagram of a terminal device according to an embodiment of this application;
Figure 3 is a deployment diagram of a TEE-side multi-core parallel solution according to an embodiment of this application;
Figures 4a and 4b are method diagrams of a TEE-side multi-core parallel solution according to an embodiment of this application;
Figure 5 is a schematic diagram of multiple CA-TA scheduling groups formed by this application;
Figure 6 is a schematic diagram of a terminal system implementing dual face/fingerprint authentication according to an embodiment of this application;
Figure 7 is a schematic diagram of a payment solution according to an embodiment of this application;
Figure 8 is a schematic structural diagram of a computer system according to an embodiment of this application;
Figure 9 is a schematic structural diagram of a neural-network processing unit according to an embodiment of this application.
Detailed Description
Before introducing the embodiments, several concepts that may appear in them are introduced first. It should be understood that the following concept explanations may be constrained by the specific circumstances of an embodiment, but this does not mean that this application is limited to those circumstances; the explanations may also differ with the specifics of different embodiments.
Multi-core scheduling: a scheduling mechanism, provided by a computer system with a multi-core processor, that supports creating, scheduling, migrating and destroying tasks on multiple cores.
Load balancing: when multiple tasks run in parallel on a multi-core processor, the system load needs to be balanced by balancing the distribution of tasks across cores, to achieve the system's overall performance and power-consumption targets.
Task: a general concept in this application; anything a computer needs to accomplish can be called a task, such as a process, a thread, a sub-thread, a CA, a TA, or some service.
Thread: sometimes called a lightweight process (LWP), the smallest unit of a program execution flow. A thread entity includes the program, data, and the TCB. A thread is a dynamic concept whose dynamic properties are described by a thread control block (TCB). A TCB may include the following information: the thread state; the context resources saved when the thread is not running; a set of execution stacks; a main-memory area holding each thread's local variables; access to the main memory and other resources of the same process; and so on. This embodiment makes changes to the TCB.
Interrupt request (IRQ): generally an event generated by hardware or software; hardware sends the event to the processor, and on receiving it the processor temporarily stops executing the current program and executes the program corresponding to the event. Interrupt requests include software interrupts and hardware interrupts. Processor interrupts generated by hardware (such as a network card, hard disk, keyboard or mouse) are usually called hardware interrupts (sometimes simply interrupts), while software interrupts are generally generated by a process currently running on the processor. Handling a software interrupt mimics handling a hardware interrupt: when a software interrupt occurs, the corresponding interrupt flag bit is first set to trigger the interrupt transaction, then a daemon thread is woken to check the interrupt status register; if polling finds a pending software interrupt, the corresponding software interrupt service routine is called via the software interrupt vector table. Where software interrupts differ from hardware interrupts is in the mapping from interrupt flag to interrupt service routine: after a hardware interrupt occurs, the processor maps the request to a concrete service routine through a vector table, a process completed automatically by the hardware; for a software interrupt, a daemon thread implements this process, which is why it is an interrupt simulated in software, hence the name "software interrupt".
CFS (completely fair scheduler): a completely fair scheduling program implemented as a scheduling module in the kernel of the Linux operating system.
Figure 2 is a schematic structural diagram of the terminal device provided by this embodiment. The terminal device may be a desktop computer, a laptop, a mobile phone, a tablet, a smart watch, a smart band, and so on. A TrustZone system is deployed on the terminal device; it comprises an REE and a TEE, which run a Linux operating system and a TEE operating system (for example the open-source OP-TEE operating system), respectively. The Linux operating system and the TEE OS are each further divided into a user mode and a kernel mode. Multiple CAs, such as a face recognition CA and a fingerprint recognition CA, are deployed in user mode on the REE side; multiple trusted applications, such as a fingerprint recognition TA and a face recognition TA, are deployed in user mode on the TEE side. Linux components are deployed in kernel mode on the REE side, while trusted core components are deployed in kernel mode on the TEE side. A CA in the REE and a TA in the TEE form a client/server-like architecture, with the CA as client and the TA as server; the CA initiates the access operation, and the two exchange data through the REE communication proxy, the hardware-layer message channel, and the TEE communication proxy, which together establish a secure communication channel for the CA and the TA and to some extent ensure the security of data transmission. Specifically, the CA calls the TEE client API (application program interface) to communicate with the corresponding TA; the TA calls the TEE internal API to use the programming resources provided by the TEE to implement the relevant functionality.
Figure 3 is a deployment diagram of the TEE-side multi-core parallel solution provided by this embodiment. This embodiment uses a face recognition CA and a face recognition TA as an example to introduce the technical solution. The face recognition CA and the face recognition TA 301 are deployed on the REE side and the TEE side respectively, and together complete face recognition services such as face verification, which are widely used in scenarios such as terminal unlocking, application login, and financial payment. In this embodiment the face recognition TA 301 comprises four sub-functions: feature extraction, feature comparison, liveness detection, and feature storage. In other embodiments a face recognition TA may include more, fewer, or other types of sub-functions, which this application does not limit.
Further, a thread creation module (for example libthread) 302, a notification module 303, and a TEE scheduling module 305 are deployed on the TEE side, and a notification processing module 304 is deployed on the REE side. The monitor is an existing module provided by the TrustZone system for switching from the REE to the TEE. The thread creation module 302 is configured to create a sub-thread when called by the face recognition TA 301 and to call the notification module 303 to generate a software interrupt; the notification module 303 is configured to generate the software interrupt and send it to the notification processing module 304 on the REE side; the notification processing module 304 is configured to receive the software interrupt and create a shadow thread, which once created is scheduled onto a core to run. The shadow thread then enters the TEE side by sending an SMC instruction, which amounts to the core running the shadow thread entering the TEE side (i.e. secure mode).
In this embodiment, secure hardware and non-secure hardware are also deployed at the hardware layer, where secure hardware is hardware that can be accessed only by the TEE, and non-secure hardware is hardware that can be accessed by both the REE and the TEE, or only by the REE.
Figures 4a and 4b are method diagrams of the TEE-side multi-core parallel solution provided by this embodiment. The implementation of the solution is described in detail below based on Figures 3 and 4a-4b.
S101. The face recognition CA sends an invocation request to the face recognition TA 301 on the TEE side through the SMC instruction provided by TrustZone. This process is prior art and is not detailed further in this application; for the reader's convenience it can be understood as follows: through the SMC instruction provided by TrustZone, the core running the face recognition CA enters the TEE side (secure mode) and starts running the face recognition TA in secure mode to implement the face recognition TA's functionality.
S102. After receiving the invocation request, the face recognition TA 301 creates one sub-thread T1. Specifically, the face recognition TA 301 creates this sub-thread T1 through the pthread_create interface in the thread creation module 302 (for example libthread).
In this embodiment the face recognition TA eventually creates four sub-threads T1-T4, which respectively handle the four sub-functions of feature extraction, feature comparison, liveness detection and feature storage. This embodiment takes the creation and running of one sub-thread as an example; the other three sub-threads are created and run in the same way.
S103. After creating sub-thread T1, the thread creation module 302 calls the notification module 303 to generate a software interrupt, which is sent to the notification processing module 304 on the REE side.
S103a. The TEE scheduling module 305 creates the corresponding task control data structure for this sub-thread T1, i.e. a thread control block (TCB).
Illustratively, the structure of the TCB is as shown in Figure PCTCN2019086133-appb-000016.
Here "task" refers to a sub-thread; each sub-thread's TCB contains the running state, the scheduling policy, the TCB name, and so on. The identifier before each field indicates the type of the field's value. A newly created sub-thread has its running state set to a specific state, for example state=000, meaning it is waiting for a new core to come in and execute it.
The TCB provided in this embodiment contains a ca field, which is one implementation of the "first thread identifier" proposed in this application. The value of the ca field may default to 0.
S104. After receiving the software interrupt, the notification processing module 304 generates a thread S1 with process identifier (PID) S1; this thread S1 enters the TEE side by sending an SMC instruction.
In the remainder of this embodiment this thread is called a shadow thread. It is essentially the same as an ordinary thread; only its function is special in this embodiment. For the face recognition TA 301, the only CA accessing it is the face recognition CA, but in this embodiment the face recognition TA 301 is completed not by one thread but by multiple threads together, so a shadow thread can be understood as a "virtual CA" that accesses a sub-thread.
It is easy to understand that a "thread entering the TEE side" here means "the core running the thread enters the TEE side", or "the core running the thread enters TEE mode (secure mode)", and so on. Software involves abstract descriptions, and TrustZone technology is no exception; the same situation may be described in different ways.
It should be noted that a "core" in this application refers to the smallest physical processing unit.
Specifically, the SMC instruction sent by the shadow thread may contain a parameter indicating that the shadow thread is entering the TEE side for the first time. For example, the parameter may be firstIn: firstIn=true indicates the shadow thread's first entry into the TEE side, and firstIn=false indicates a non-first entry. Alternatively, the shadow thread includes the parameter only in its first SMC instruction and omits it thereafter, so the receiver can also determine whether this is the shadow thread's first entry into the TEE side by the presence or absence of the parameter.
S105. After shadow thread S1 enters the TEE side, that is, after the core running shadow thread S1 enters the TEE side, the TEE scheduling module 305 records the PID of shadow thread S1 at that core's position in the global state array.
Illustratively, the global state array ctx_map_t[CPU_NUM] is as shown in Figure PCTCN2019086133-appb-000017.
Here "CPU" refers to the "core" mentioned above. In this embodiment the core running shadow thread S1 is the core numbered 1 (core 1 below), so the TEE scheduling module 305 records the PID of shadow thread S1 in the ca field of ctx_map_t[1], i.e. ca=S1, indicating that the (virtual) CA entering the TEE this time is shadow thread S1.
S106. When the TEE scheduling module 305 determines that shadow thread S1 is entering the TEE side for the first time, it looks for the sub-thread T1 in the specific running state state=000 and schedules this sub-thread T1 onto the shadow thread's current core, core 1.
S107. Further, the TEE scheduling module 305 assigns the value of the ca field of ctx_map_t[1] (i.e. S1) to the ca field of the TCB corresponding to sub-thread T1. Thus a CA-TA group, with shadow thread S1 as the CA and sub-thread T1 as the TA, is established.
Steps S101-S107 above are the process of a shadow thread's first creation and first entry into the TEE side. Repeating steps S102-S103, S103a, and S104-S107 creates the other three sub-threads and the corresponding other three shadow threads, forming another three CA-TA groups. In this way multiple cores run simultaneously on the TEE side, executing the four sub-functions of the face recognition TA 301 at the same time and greatly improving the face recognition TA's execution efficiency.
Further, through the above method the TEE actively "pulls" cores into the TEE side, so that even as a passive operating system the TEE achieves active execution of sub-threads, improving the flexibility of TEE-side multi-core parallelism.
Like an ordinary CA, shadow thread S1 may be interrupted while running and fall back to the REE side, where it may be scheduled onto another core. In that case, to ensure that sub-thread T1 still runs on the same core as shadow thread S1, the following operations are performed, with reference to Figure 4b.
S108. When the shadow thread (i.e. core 1) falls back to the REE side, the TEE scheduling module 305 clears the ca field of ctx_map_t[1].
S109. When shadow thread S1 re-enters the TEE side, the TEE scheduling module 305 sets the ca field at the corresponding position in the global state array to S1.
Specifically, if the re-entering shadow thread S1 is still running on core 1, the TEE scheduling module 305 again sets the ca field of ctx_map_t[1] to S1; if shadow thread S1 was scheduled on the REE side by the REE-side scheduling module (for example the CFS scheduler) onto another core, say core 2, the TEE scheduling module 305 sets the ca field of ctx_map_t[2] to S1.
S110. The TEE scheduling module 305 looks up the target sub-thread and schedules it onto the current core.
The target sub-thread must satisfy the condition that the ca field in its TCB equals the ca field corresponding to the current core in the global state array; in this embodiment both are S1. Thus in this embodiment the target sub-thread is sub-thread T1, which is therefore scheduled onto the current core. Per the description of step S109, the "current core" may be core 1 or core 2.
It is easy to understand that in this embodiment the target sub-thread can be scheduled onto a core for execution only if it is in an executable state; if it is in some non-executable state, the TEE scheduling module 305 may, according to the scheduling policy, let core 1 or core 2 wait or execute other executable processes, which this application does not limit.
Figure 5 shows the multiple CA-TA scheduling groups formed after the method provided by this application is implemented. As the figure shows, the face recognition TA main thread and the face recognition CA form one scheduling group, and the other four sub-threads form four scheduling groups with shadow threads S1-S4; these five scheduling groups participate, together with other applications, in the CFS scheduler's load-balancing scheduling.
It can be seen that with the solution provided by this embodiment, even if a shadow thread is scheduled onto another core, the corresponding TEE-side sub-thread is always guaranteed to be scheduled onto that same core, so the shadow thread and its corresponding sub-thread are scheduled uniformly as one CA-TA scheduling group, ensuring the accuracy of the CA load calculation.
Next, this application introduces another scenario, one requiring dual face and fingerprint authentication. In this scenario the multi-core parallel solution provided by this application can still provide unified scheduling of CAs and TAs.
Figure 6 is a schematic diagram of the terminal system implementing dual face/fingerprint authentication provided by this embodiment. The dual-authentication implementation is described as follows.
The face recognition CA 608 and the fingerprint recognition CA 607 on the REE side each initiate a request to the TEE side. A request is initiated by calling the monitor through the TrustZone driver to enter monitor mode, and then entering TEE mode from monitor mode. The TA manager 609 then determines, from the information carried in the requests, that the face recognition TA 601 and the fingerprint recognition TA 604 will handle the face recognition CA's request and the fingerprint recognition CA's request, respectively.
It is easy to understand that the face recognition CA and the fingerprint recognition CA are in essence two threads running on two cores; after the preceding steps, both cores enter the TEE side.
The TEE scheduling module 610 records the PIDs of the face recognition CA and the fingerprint recognition CA at the positions corresponding to the two cores in the global state array, and records the PIDs of face recognition CA 608 and fingerprint recognition CA 607 in the ca fields of the TCBs of face recognition TA 601 and fingerprint recognition TA 604, respectively. Two CA-TA scheduling groups are thereby established, and the load generated by a TEE-side TA can be bundled with its corresponding CA into one load-calculation unit.
In addition, a TA usually also calls other service processes and/or driver processes to complete its work, and these indirectly accessed processes also need to join a CA-TA scheduling group. The face recognition TA 601 calls the permission management service 602 by sending a message, and the permission management service 602 calls the camera driver 603; similarly, the fingerprint recognition TA 604 calls the permission management service 605, which calls the fingerprint driver. In this embodiment, permission management service 602 and permission management service 605 are the same service; in other embodiments they may be two independent services.
The "calls" above are essentially inter-process communication (IPC). The IPC mechanism inside the TEE works through messages; in this embodiment, when a message is passed, the value of the ca field in the sender's TCB is passed to the receiver, so all service processes on a TA's call chain are correspondingly pulled into the TA's CA-TA scheduling group. As shown in Figure 5, two scheduling groups are formed in this embodiment.
When a service process has finished handling one TA's message and then receives a message from another TA, it updates its ca value with the new message and is thereby brought into the other CA-TA group. As shown in the figure, the permission management service 602 can switch from the face recognition CA-TA scheduling group to the fingerprint recognition CA-TA scheduling group.
Specifically, the face recognition TA 601 sends a message to the permission management service 602 and passes along the value of the ca field in its own TCB, i.e. the face recognition CA's PID, so the value of the ca field in the TCB of the permission management service 602 is also set to the face recognition CA's PID. When the permission management service 602 is later called by the fingerprint recognition TA 604, the value of the ca field in the TCB of the permission management service 602 (equivalent to permission management service 605 in the figure) is reset to the fingerprint recognition CA's PID.
Taking one CA-TA scheduling group as one scheduling unit, unified scheduling is performed by the REE-side CFS scheduler, possibly triggered by load-balancing needs. For example, if a CA is scheduled by the CFS scheduler onto another core, the TA in the same scheduling group, as well as the other processes called by that TA, are also scheduled onto that core by the TEE scheduling module 610. Thus, with the method provided by this application, when multiple TAs run in parallel, a CA and its corresponding TA are scheduled uniformly, ensuring the accuracy of the CA load calculation. For example, if face recognition CA 608 were scheduled onto another core while face recognition TA 601 were not, the load of other threads running on that core would be counted into the load of face recognition CA 608, which would be incorrect.
Further, situations similar to the aforementioned S108 also occur in this scenario. For example, when an interrupt request (IRQ) occurs, the core executing face recognition TA 601 is interrupted and falls back to the REE side to respond to the interrupt request; on exit, the TEE scheduling module 610 clears the ca field corresponding to that core in the global state array. If face recognition CA 608 is scheduled by the REE-side CFS scheduler onto a new core and later enters the TEE side again, the TEE scheduling module 610 sets the ca field corresponding to the new core in the global state array to the PID of face recognition CA 608. The TEE scheduling module 610 then looks up the target tasks by this PID: the target tasks are those whose TCB ca value is also the PID of face recognition CA 608. From the description of the previous embodiments, the target tasks include one or more of face recognition TA 601, the camera driver 603 (process), and the permission management service 602 (process). The permission management service may not belong to the target tasks, because its ca field may have been modified by a call from fingerprint recognition TA 604. The TEE scheduling module 610 then schedules the target tasks onto the new core, completing the migration of the TA and the services it calls between cores and achieving unified migration of the CA-TA scheduling group, always guaranteeing that the tasks contained in a CA-TA scheduling group run on the same core.
Refer to Figure 7, a schematic diagram of the payment solution provided by this embodiment. In addition to applying the TEE multi-core parallel method provided by the foregoing embodiments, this payment solution further uses a neural-network processing unit and other means to improve security and performance.
The payment solution comprises multiple pieces of business logic: a payment application 701 and the face recognition CA 702, face recognition TA 708, camera service 703, and so on that the payment application triggers, where the face recognition TA 708 in turn comprises four sub-pieces of business logic: feature extraction, liveness detection, feature comparison, and feature storage. The hardware involved includes a camera 706, a neural-network processing unit (NPU) 715, a memory 714, a central processing unit (not shown), and other hardware, with the drivers for the camera 706, the NPU 715 and the memory 714 deployed on the TEE side. The camera service 703, face recognition CA 702 and NPU service CA 704 on the REE side are responsible only for initiating the business and for some non-critical business-logic processing.
It should be noted that while the driver for the camera 706 is deployed on the TEE side in this embodiment, a driver for the camera 706 may also be deployed on the REE side so that REE-side applications or services can access the camera 706 through it.
Specifically, after the payment application 701 initiates a face payment authentication request, it calls the face recognition TA 708 on the TEE side through the face recognition CA 702 of the REE, thereby starting the face recognition flow. The face recognition TA 708 accesses the camera 706 through the camera driver 705 on the TEE side. Specifically, the face recognition TA 708 may access the camera 706 by driving the ISP through an image signal processor (ISP) driver.
The images collected by the camera 706 are then stored in an image secure buffer 707, and the access address of the image secure buffer 707 is returned to the face recognition TA 708. The image secure buffer 707 can be understood as software located on the TEE side, or as a storage space (for example memory) accessible only to the TEE. The face recognition TA 708 accesses the image secure buffer 707 by its address and runs the feature extraction, liveness detection, feature comparison, feature storage and other algorithms on the collected images according to the pre-stored face template and other information.
The prior art usually deploys the camera driver only on the REE side and places part of the face recognition TA's functionality, such as feature extraction, on the REE side, where the feature extraction function calls the camera driver and performs image collection. With the approach provided by this embodiment, however, the face recognition TA 708 can access the camera 706 directly through the camera driver 705 deployed on the TEE side and buffer the images in the TEE-side image secure buffer 707, ensuring that both the use of the camera and the storage of the data are completed on the TEE side, further guaranteeing data security.
While the algorithms execute, the face recognition TA 708 accesses the NPU driver 712 through the NPU service TA 709 on the TEE side, and then calls the NPU 715 through the NPU driver 712 to increase processing speed. Finally, the payment application 701 obtains the final face recognition result through its payment application TA 710; for example, the Alipay application obtains the final face recognition result through the internet finance authentication alliance (IFAA) TA.
The face template is pre-enrolled in the terminal device, and the face image collected at payment time must match the face template before the payment can complete, so the security of the face template is very important. To ensure the face template is not tampered with, in this embodiment the face template is stored in the memory 714 through the storage service 713 on the TEE side. The memory 714 may be a memory with certain security properties, such as a replay protected memory block (RPMB), and may be configured to be accessible only to TEE-side services, further improving the memory's security and thereby guaranteeing the security of the face template and, in turn, of the face recognition process.
A face recognition solution implemented with the method provided by this application satisfies both security and high-performance requirements. Unlike the prior art, which moves some key business logic of the face recognition process to the REE side to improve efficiency (for example implementing liveness detection on the REE side), the solution provided by this application implements all the key business logic of the face recognition process on the TEE side and improves its efficiency through multi-core parallelism to meet the performance requirements. Meanwhile, the data produced or used during face recognition (such as images) is stored on the TEE side, further improving the security of face recognition through the TEE's security guarantees.
Refer to Figure 8, a schematic structural diagram of a computer system provided by this embodiment. The computer system may be a terminal device. As shown, the computer system includes a communication module 810, a sensor 820, a user input module 830, an output module 840, a processor 850, an audio-video input module 860, a memory 870, and a power supply 880. Further, the computer system provided by this embodiment may also include an NPU 890.
The communication module 810 may include at least one module enabling communication between the computer system and a communication system or another computer system, for example one or more of a wired network interface, a broadcast receiving module, a mobile communication module, a wireless internet module, a local-area communication module, and a location (or positioning) information module. Each of these modules has multiple implementations in the prior art, which are not described one by one in this application.
The sensor 820 can sense the current state of the system, such as open/closed state, position, contact with the user, orientation, and acceleration/deceleration, and can generate sensing signals used to control the operation of the system.
The user input module 830 is configured to receive input digital information, character information, or contact touch operations/contactless gestures, and to receive signal input related to the user settings and function control of the system. The user input module 830 includes a touch panel and/or other input devices.
The output module 840 includes a display panel for displaying information entered by the user, information provided to the user, the system's various menu interfaces, and so on. Optionally, the display panel may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display. In some other embodiments, the touch panel may cover the display panel to form a touch display screen. In addition, the output module 840 may also include an audio output module, an alarm, a haptic module, and so on.
The audio-video input module 860 is configured to input audio or video signals, and may include a camera and a microphone.
The power supply 880 may receive external and internal power under the control of the processor 850 and provide the power needed for the operation of the system's components.
The processor 850 includes one or more processors; for example, the processor 850 may include a central processing unit and a graphics processing unit. In this application the central processing unit has multiple cores and is a multi-core processor. These cores may be integrated on the same chip or each be an independent chip.
The memory 870 stores computer programs, including an operating system program 872, application programs 871, and so on. Typical operating systems include systems for desktops and laptops, such as Microsoft's Windows and Apple's MacOS, and systems for mobile terminals, such as the Linux-based Android system developed by Google. The methods provided by the foregoing embodiments may be implemented in software and can be considered concrete implementations of the operating system program 872.
The memory 870 may be one or more of the following types: flash memory, hard-disk-type memory, micro multimedia card memory, card memory (for example SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), replay protected memory block (RPMB), magnetic memory, magnetic disk, or optical disc. In some other embodiments, the memory 870 may also be a network storage device on the Internet, and the system may perform update, read and other operations on the memory 870 over the Internet.
The processor 850 is configured to read the computer programs in the memory 870 and then execute the methods they define; for example, the processor 850 reads the operating system program 872 to run the operating system on the system and implement its various functions, or reads one or more application programs 871 to run applications on the system.
The memory 870 also stores other data 873 in addition to the computer programs.
The NPU 890 is mounted on the main processor 850 as a coprocessor to execute the tasks the main processor 850 assigns to it. In this embodiment, the NPU 890 can be called by one or more sub-threads of the face recognition TA to implement some of the complex algorithms involved in face recognition. Specifically, the sub-threads of the face recognition TA run on multiple cores of the main processor 850; the main processor 850 then calls the NPU 890, and the result produced by the NPU 890 is returned to the main processor 850.
The connection relationships of the modules above are only an example; the method provided by any embodiment of this application may also be applied to terminal devices with other connection schemes, for example with all modules connected through a bus.
Figure 9 is a schematic structural diagram of an NPU 900 provided by this embodiment. The NPU 900 is connected to the main processor and to external memory. The core part of the NPU 900 is the operation circuit 903; the controller 904 controls the operation circuit 903 to fetch data from memory and perform mathematical operations.
In some implementations, the operation circuit 903 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 903 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some other implementations, the operation circuit 903 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit 903 fetches the data corresponding to matrix B from the weight memory 902 and caches it on each PE of the operation circuit 903. The operation circuit 903 then fetches matrix A data from the input memory 901, performs matrix operations with matrix B, and stores partial or final results of the matrix in the accumulator 908.
The unified memory 906 stores input data and output data. Weight data is transferred directly to the weight memory 902 through the storage-unit access controller 905 (for example a direct memory access controller, DMAC). Input data is also transferred to the unified memory 906 through the storage-unit access controller 905.
The bus interface unit (BIU) 910 is used for the interaction between the AXI (advanced extensible interface) bus and both the storage-unit access controller 905 and the instruction fetch buffer 909.
The bus interface unit 910 is used by the instruction fetch buffer 909 to fetch instructions from external memory, and by the storage-unit access controller 905 to fetch the original data of input matrix A or weight matrix B from external memory.
The storage-unit access controller 905 mainly transfers input data from external memory to the unified memory 906, transfers weight data to the weight memory 902, or transfers input data to the input memory 901.
The vector calculation unit 907 usually includes multiple operation processing units that, when necessary, further process the output of the operation circuit 903, for example by vector multiplication, vector addition, exponentiation, logarithm, and/or magnitude comparison.
In some implementations, the vector calculation unit 907 can store processed vectors in the unified memory 906. For example, the vector calculation unit 907 may apply a non-linear function to the output of the operation circuit 903, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 907 generates normalized values, merged values, or both. In some implementations, the processed vectors can be used as activation inputs to the operation circuit 903.
The instruction fetch buffer 909 connected to the controller 904 stores the instructions used by the controller 904.
The unified memory 906, the input memory 901, the weight memory 902, and the instruction fetch buffer 909 are all on-chip memories. The external memory in the figure is independent of this NPU hardware architecture.
It should be noted that the method provided by this embodiment can also be applied to non-terminal computer devices, such as cloud servers.
It should be noted that while the above embodiments mostly use the face recognition solution as an example, the method proposed in this application can obviously be applied to solutions other than face recognition, and those skilled in the art can easily conceive similar implementations of other solutions from the implementations provided in this application.
It should be noted that the division into modules or units proposed in the foregoing embodiments is only an illustrative example, and the described functions of the modules are only examples by which this application is not limited. Those of ordinary skill in the art may, as needed, merge the functions of two or more of the modules, split the functions of one module to obtain more, finer-grained modules, or make other variations.
Identical or similar parts of the embodiments described above may refer to one another. Unless otherwise specified, "multiple" in this application means two or more, or "at least two". "A/B" in this application covers three cases: "A", "B", and "A and B".
The apparatus embodiments described above are only illustrative; modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objective of this embodiment's solution. In the drawings of the apparatus embodiments provided by this application, the connection relationships between modules indicate communication connections between them, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only some specific implementations of this application, but the protection scope of this application is not limited to them.

Claims (27)

  1. A computer system, on which a rich execution environment REE and a trusted execution environment TEE are deployed, the TEE being deployed with a TA whose functionality includes multiple sub-functions, wherein the TEE is further deployed with a thread creation module, a notification module, and a TEE scheduling module, wherein
    the thread creation module is configured to create a sub-thread when called by the TA, the sub-thread implementing one of the multiple sub-functions;
    the notification module is configured to trigger the REE to generate a shadow thread, the running of the shadow thread causing the core running the shadow thread to enter the TEE; and
    the TEE scheduling module is configured to schedule the sub-thread onto the core to run.
  2. The computer system according to claim 1, wherein the REE is further deployed with a notification processing module, wherein:
    the notification module is specifically configured to generate a notification after the sub-thread is created, and send the notification to the notification processing module; and
    the notification processing module is configured to create the shadow thread according to the notification.
  3. The computer system according to claim 1 or 2, wherein
    the TEE scheduling module is specifically configured to: when determining that the shadow thread enters the TEE for the first time, schedule the newly created sub-thread onto the core running the shadow thread.
  4. The computer system according to any one of claims 1-3, wherein
    the TEE scheduling module is further configured to record the correspondence between the shadow thread and the sub-thread.
  5. The computer system according to claim 4, wherein
    the TEE scheduling module is specifically configured to: create a first thread identifier for the sub-thread, the first thread identifier indicating the thread that accesses the sub-thread; and after scheduling the sub-thread onto the core, set the value of the first thread identifier to the identifier of the shadow thread.
  6. The computer system according to claim 4 or 5, wherein
    the TEE scheduling module is further configured to: when determining that the shadow thread enters the TEE again, schedule the sub-thread onto the current core running the shadow thread according to the recorded correspondence between the shadow thread and the sub-thread.
  7. The computer system according to claim 6, wherein the shadow thread invokes a secure monitor call SMC instruction to cause the core running the shadow thread to enter the TEE for the first time or again, the SMC instruction containing a parameter indicating whether the core enters the TEE for the first time or again;
    correspondingly, the TEE scheduling module is configured to determine, according to the parameter, that the shadow thread enters the TEE again.
  8. The computer system according to any one of claims 1-7, wherein
    the TEE scheduling module is further configured to record the correspondence between the current core running the shadow thread and the shadow thread.
  9. The computer system according to claim 8, wherein
    the TEE scheduling module is specifically configured to: after the current core running the shadow thread enters the TEE, record the identifier of the shadow thread at the element corresponding to the current core in a global state array, the global state array containing N elements, each corresponding to one core of the computer system; and after the current core running the shadow thread leaves the TEE, clear the element corresponding to the current core in the global state array.
  10. The computer system according to claim 9, wherein
    the TEE scheduling module is specifically configured to: after the current core running the shadow thread enters the TEE, record the identifier of the shadow thread at the element corresponding to the current core in the global state array, look up a target sub-thread, and schedule the target sub-thread onto the current core, wherein the first thread identifier corresponding to the target sub-thread is the identifier recorded at the element corresponding to the current core in the global state array.
  11. The computer system according to any one of claims 1-10, wherein the notification is a software interrupt.
  12. The computer system according to any one of claims 1-11, wherein the TEE is further deployed with a neural-network processing unit NPU driver;
    the NPU driver is configured to drive an NPU when called by one or more sub-threads of the TA.
  13. The computer system according to any one of claims 1-12, wherein the TEE is further deployed with a secure storage unit and a hardware driver unit, both of which can be accessed only by the TEE;
    the hardware driver unit is configured to access the corresponding hardware when called by one or more sub-threads of the TA; and
    the secure storage unit is configured to store the data collected by the hardware.
  14. The computer system according to any one of claims 1-13, wherein the TA is a TA implementing a face recognition function or a TA implementing a fingerprint recognition function.
  15. A method for implementing multi-core parallelism on a trusted execution environment TEE side, comprising:
    creating, by the TEE, a sub-thread implementing one sub-function of a TA deployed on the TEE side;
    triggering, by the TEE, a rich execution environment REE to generate a shadow thread, the running of the shadow thread causing the core running the shadow thread to enter the TEE; and
    scheduling, by the TEE, the created sub-thread onto the core for execution.
  16. The method according to claim 15, further comprising:
    generating, by the TEE, a notification after the sub-thread is created, and sending the notification to the REE, so that the REE creates the shadow thread according to the notification.
  17. The method according to claim 15 or 16, further comprising:
    recording, by the TEE, the correspondence between the shadow thread and the sub-thread.
  18. The method according to claim 17, wherein recording, by the TEE, the correspondence between the shadow thread and the sub-thread comprises:
    recording, by the TEE, the identifier of the shadow thread in a first thread identifier in the thread control block TCB of the sub-thread, the first thread identifier indicating the thread that accesses the sub-thread.
  19. The method according to claim 17 or 18, further comprising:
    when determining that the shadow thread enters the TEE again, scheduling, by the TEE, the sub-thread onto the current core running the shadow thread according to the recorded correspondence between the shadow thread and the sub-thread.
  20. The method according to any one of claims 15-19, further comprising:
    recording, by the TEE, the correspondence between the current core running the shadow thread and the shadow thread.
  21. The method according to claim 20, wherein recording, by the TEE, the correspondence between the current core running the shadow thread and the shadow thread comprises:
    after the current core running the shadow thread enters the TEE, recording the identifier of the shadow thread at the element corresponding to the current core in a global state array, the global state array containing N elements, each corresponding to one core of the computer system; and
    after the current core running the shadow thread leaves the TEE, clearing the element corresponding to the current core in the global state array.
  22. The method according to any one of claims 15-21, wherein the notification is a software interrupt.
  23. The method according to any one of claims 15-22, further comprising:
    calling, by the TEE, an NPU by calling a neural-network processing unit NPU driver deployed in the TEE.
  24. The method according to any one of claims 15-23, further comprising:
    accessing, by the TEE, the corresponding hardware through a hardware driver unit deployed on the TEE side, and storing the data collected by the hardware in a secure storage unit deployed on the TEE side.
  25. The method according to any one of claims 15-24, wherein the TA is a TA implementing a face recognition function or a TA implementing a fingerprint recognition function.
  26. A computer system, comprising a memory and a processor, wherein
    the memory is configured to store computer-readable instructions; and the processor is configured to read the computer-readable instructions and implement the method according to any one of claims 15-25.
  27. A computer storage medium, storing computer-readable instructions that, when executed by a processor, implement the method according to any one of claims 15-25.
PCT/CN2019/086133 2018-06-19 2019-05-09 在tee侧实现多核并行的方法、装置及系统 WO2019242423A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
KR1020207037763A KR102509384B1 (ko) 2018-06-19 2019-05-09 Tee 측에 대해 병렬인 멀티-코어를 구현하기 위한 방법, 장치 및 시스템
EP19823478.3A EP3812903A4 (en) 2018-06-19 2019-05-09 PROCESS, APPARATUS AND SYSTEM FOR IMPLEMENTING A MULTI-CORE PARALLELISM ON ONE SIDE OF THE TEE
AU2019291421A AU2019291421A1 (en) 2018-06-19 2019-05-09 Method and apparatus for implementing multiprocessing on tee, and system
CA3103584A CA3103584C (en) 2018-06-19 2019-05-09 Method and apparatus for implementing multiprocessing on tee, and system
US17/126,873 US11461146B2 (en) 2018-06-19 2020-12-18 Scheduling sub-thread on a core running a trusted execution environment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810632168.2 2018-06-19
CN201810632168.2A CN109960582B (zh) 2018-06-19 2018-06-19 在tee侧实现多核并行的方法、装置及系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/126,873 Continuation US11461146B2 (en) 2018-06-19 2020-12-18 Scheduling sub-thread on a core running a trusted execution environment

Publications (1)

Publication Number Publication Date
WO2019242423A1 true WO2019242423A1 (zh) 2019-12-26

Family

ID=67023118

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/086133 WO2019242423A1 (zh) 2018-06-19 2019-05-09 在tee侧实现多核并行的方法、装置及系统

Country Status (7)

Country Link
US (1) US11461146B2 (zh)
EP (1) EP3812903A4 (zh)
KR (1) KR102509384B1 (zh)
CN (1) CN109960582B (zh)
AU (1) AU2019291421A1 (zh)
CA (1) CA3103584C (zh)
WO (1) WO2019242423A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114091653A (zh) * 2021-11-06 2022-02-25 支付宝(杭州)信息技术有限公司 模型的运行方法和装置
CN115016666A (zh) * 2021-11-18 2022-09-06 荣耀终端有限公司 触控处理方法、终端设备以及存储介质

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109040088B (zh) * 2018-08-16 2022-02-25 腾讯科技(深圳)有限公司 认证信息传输方法、密钥管理客户端及计算机设备
GB2579682B (en) * 2019-03-25 2021-03-24 Trustonic Ltd Trusted execution environment migration method
GB2586640B (en) * 2019-08-30 2021-12-08 Trustonic Ltd Trusted execution environment scheduling method
CN110795385B (zh) * 2019-10-29 2023-11-03 飞腾信息技术有限公司 片上系统的可信核与计算核核资源分配方法及装置
CN113192237B (zh) * 2020-01-10 2023-04-18 阿里巴巴集团控股有限公司 支持tee和ree的物联网设备以及实现tee和ree间通信的方法
CN111353162B (zh) * 2020-03-26 2022-06-07 中国人民解放军国防科技大学 基于TrustZone分核异步执行的主动可信计算方法及系统
CN113626818B (zh) * 2020-05-08 2023-10-20 华为技术有限公司 计算机系统、服务处理方法、可读存储介质及芯片
CN112817713B (zh) * 2021-01-27 2023-10-13 广州虎牙科技有限公司 作业调度方法、装置和电子设备
CN112818327A (zh) * 2021-02-26 2021-05-18 中国人民解放军国防科技大学 基于TrustZone的用户级代码和数据安全可信保护方法及装置
US20220374513A1 (en) * 2021-05-21 2022-11-24 Samsung Electronics Co., Ltd. Apparatus and method for providing secure execution environment for npu
CN113760090B (zh) * 2021-06-18 2022-09-13 荣耀终端有限公司 一种基于可信执行环境的业务流程执行方法及电子设备
CN115509677A (zh) * 2021-06-23 2022-12-23 华为技术有限公司 一种虚拟机与安全隔离区间的通信方法及相关装置
CN113419919A (zh) * 2021-06-24 2021-09-21 亿览在线网络技术(北京)有限公司 一种对第三方sdk进行线程监控的方法
CN113486355B (zh) * 2021-06-29 2023-03-14 北京紫光展锐通信技术有限公司 一种信息保存装置、方法、通信装置、芯片及其模组设备
CN113627328A (zh) * 2021-08-10 2021-11-09 安谋科技(中国)有限公司 电子设备及其图像识别方法、片上系统和介质
CN114372260B (zh) * 2022-03-22 2022-07-22 荣耀终端有限公司 一种多线程处理方法及电子设备
CN114372261B (zh) * 2022-03-22 2022-07-22 荣耀终端有限公司 一种多线程处理方法及电子设备
CN115391066B (zh) * 2022-08-31 2023-06-30 瀚博半导体(上海)有限公司 用于芯片的数据交互方法、装置和人工智能芯片
CN116566744B (zh) * 2023-07-07 2023-09-22 北京瑞莱智慧科技有限公司 数据处理方法和安全校验系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064742A (zh) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 一种hadoop集群的自动部署系统及方法
CN106548077A (zh) * 2016-10-19 2017-03-29 沈阳微可信科技有限公司 通信系统和电子设备
US20180101688A1 (en) * 2016-10-11 2018-04-12 Intel Corporation Trust-enhanced attribute-based encryption

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7788669B2 (en) * 2003-05-02 2010-08-31 Microsoft Corporation System for isolating first computing environment from second execution environment while sharing resources by copying data from first portion to second portion of memory
US7197745B2 (en) * 2003-05-02 2007-03-27 Microsoft Corporation User debugger for use on processes running in a high assurance kernel in an operating system
US9292712B2 (en) * 2012-09-28 2016-03-22 St-Ericsson Sa Method and apparatus for maintaining secure time
US9742559B2 (en) * 2013-01-22 2017-08-22 Qualcomm Incorporated Inter-module authentication for securing application execution integrity within a computing device
EP2759955A1 (en) * 2013-01-28 2014-07-30 ST-Ericsson SA Secure backup and restore of protected storage
US8935746B2 (en) * 2013-04-22 2015-01-13 Oracle International Corporation System with a trusted execution environment component executed on a secure element
CN104216777B (zh) * 2014-08-29 2017-09-08 宇龙计算机通信科技(深圳)有限公司 双系统电子装置及终端
WO2018000370A1 (zh) * 2016-06-30 2018-01-04 华为技术有限公司 一种移动终端的认证方法及移动终端
US10402566B2 (en) * 2016-08-01 2019-09-03 The Aerospace Corporation High assurance configuration security processor (HACSP) for computing devices
CN106547618B (zh) 2016-10-19 2019-10-29 沈阳微可信科技有限公司 通信系统和电子设备
KR20180044173A (ko) * 2016-10-21 2018-05-02 삼성전자주식회사 시큐어 엘리먼트, 시큐어 엘리먼트의 동작 방법 및 시큐어 엘리먼트를 포함하는 전자 장치
CN106844082A (zh) * 2017-01-18 2017-06-13 联想(北京)有限公司 处理器预测故障分析方法及装置
CN109670312A (zh) * 2017-10-13 2019-04-23 华为技术有限公司 安全控制方法及计算机系统
CN109729523B (zh) * 2017-10-31 2021-02-23 华为技术有限公司 一种终端联网认证的方法和装置
CN108399329B (zh) * 2018-01-23 2022-01-21 晶晨半导体(上海)股份有限公司 一种提高可信应用程序安全的方法
US20210034763A1 (en) * 2018-01-31 2021-02-04 Huawei Technologies Co., Ltd. Splitting Sensitive Data and Storing Split Sensitive Data in Different Application Environments
EP3633546A4 (en) * 2018-04-12 2020-10-21 Guangdong Oppo Mobile Telecommunications Corp., Ltd. IMAGE PROCESSING METHOD AND DEVICE, ELECTRONIC DEVICE AND COMPUTER READABLE STORAGE MEDIUM
CN109766152B (zh) * 2018-11-01 2022-07-12 华为终端有限公司 一种交互方法及装置
CN111723383B (zh) * 2019-03-22 2024-03-19 阿里巴巴集团控股有限公司 数据存储、验证方法及装置
GB2586640B (en) * 2019-08-30 2021-12-08 Trustonic Ltd Trusted execution environment scheduling method
KR20220006890A (ko) * 2020-07-09 2022-01-18 삼성전자주식회사 모바일 결제를 지원하는 전자 장치, 그 동작 방법 및 저장 매체
CN112101949B (zh) * 2020-09-18 2022-12-16 支付宝(杭州)信息技术有限公司 安全的服务请求处理方法及装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064742A (zh) * 2012-12-25 2013-04-24 中国科学院深圳先进技术研究院 一种hadoop集群的自动部署系统及方法
US20180101688A1 (en) * 2016-10-11 2018-04-12 Intel Corporation Trust-enhanced attribute-based encryption
CN106548077A (zh) * 2016-10-19 2017-03-29 沈阳微可信科技有限公司 通信系统和电子设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3812903A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114091653A (zh) * 2021-11-06 2022-02-25 支付宝(杭州)信息技术有限公司 模型的运行方法和装置
CN115016666A (zh) * 2021-11-18 2022-09-06 荣耀终端有限公司 触控处理方法、终端设备以及存储介质
CN115016666B (zh) * 2021-11-18 2023-08-25 荣耀终端有限公司 触控处理方法、终端设备以及存储介质

Also Published As

Publication number Publication date
CN109960582B (zh) 2020-04-28
CA3103584A1 (en) 2019-12-26
US11461146B2 (en) 2022-10-04
US20210103470A1 (en) 2021-04-08
EP3812903A4 (en) 2021-07-07
AU2019291421A1 (en) 2021-01-07
KR20210014686A (ko) 2021-02-09
KR102509384B1 (ko) 2023-03-14
CA3103584C (en) 2022-10-04
CN109960582A (zh) 2019-07-02
EP3812903A1 (en) 2021-04-28

Similar Documents

Publication Publication Date Title
WO2019242423A1 (zh) 在tee侧实现多核并行的方法、装置及系统
US11687645B2 (en) Security control method and computer system
Kato et al. RGEM: A responsive GPGPU execution model for runtime engines
US8375221B1 (en) Firmware-based trusted platform module for arm processor architectures and trustzone security extensions
US20170103382A1 (en) Method of providing payment service and electronic device for implementing same
US11042398B2 (en) System and method for guest operating system using containers
CN114035842B (zh) Firmware configuration method, computing system configuration method, computing apparatus, and device
EP3876095A1 (en) Framework-agnostic agile container launches through lateral reuse of capabilities in standard runtimes
US8635682B2 (en) Propagating security identity information to components of a composite application
CN113139175A (zh) Processing unit, electronic device, and security control method
KR102297383B1 (ko) Secure data processing
CN116049813B (zh) Touchscreen data processing method, device, and storage medium based on a trusted execution environment
CN108984259A (zh) Interface display method, apparatus, and terminal
WO2021088744A1 (zh) Capability management method and computer device
Li et al. Teep: Supporting secure parallel processing in arm trustzone
Xiao et al. TrustZone-based mobile terminal security system
US10121001B1 (en) System and method for monolithic scheduling in a portable computing device using a hypervisor
WO2024040508A1 (en) Memory preserved warm reset mechanism
CN111989693A (zh) Biometric identification method and apparatus
US11770414B2 (en) Information handling systems and methods to provide a secure platform control point for cloud-native applications
US20240020367A1 (en) Method for Performing Biometric Feature Authentication When Multiple Application Interfaces are Simultaneously Displayed
US12008111B2 (en) System and method for efficient secured startup of data processing systems
US20240037239A1 (en) System and method for efficient secured startup of data processing systems
US20240037237A1 (en) System and method for flexible startup of data processing systems
WO2023173896A1 (zh) Communication method, electronic device, and readable storage medium

Legal Events

Code Title Description

121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19823478; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 3103584; Country of ref document: CA)
ENP Entry into the national phase (Ref document number: 2019823478; Country of ref document: EP; Effective date: 20201210)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 20207037763; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2019291421; Country of ref document: AU; Date of ref document: 20190509; Kind code of ref document: A)