CN111415291B - Multi-core chip and scheduling method thereof

Publication number: CN111415291B
Application number: CN202010108531.8A
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN111415291A
Inventors: 周一兰 (Zhou Yilan), 吴庆丰 (Wu Qingfeng), 杨帆 (Yang Fan), 吴志勇 (Wu Zhiyong)
Current and original assignee: Huawei Technologies Co Ltd
Legal status: Active (granted)
Application filed by Huawei Technologies Co Ltd, with priority to CN202010108531.8A and PCT/CN2021/075196 (WO2021164560A1)
Publications: CN111415291A (application), CN111415291B (grant)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 — General purpose image data processing
    • G06T 1/20 — Processor architectures; Processor configuration, e.g. pipelining
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/50 — Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 — Allocation of resources to service a request
    • G06F 9/5027 — Allocation of resources, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/505 — Allocation of resources, considering the load

Abstract

The invention provides a multi-core chip together with a method and a system for service processing. The multi-core chip configures a controller for each processor, so that the chip can cache requests and merge multiple requests into one. In addition, the server carries an index through the request processing flow, maintains a data index table according to these indexes, and selects the processor for each request according to the table. The invention can thereby balance the load across the processors, determine the states of the processors included in the multi-core chip, and adaptively adjust the data in the data index table according to the state of each processor, improving the processing efficiency of the multi-core chip.

Description

Multi-core chip and scheduling method thereof
Technical Field
The present application relates to the field of communications, and in particular, to a multi-core chip and a scheduling method thereof.
Background
At present, services on various terminals, such as games, high-definition video playback, and image editing, require a large amount of image or voice processing. With the development of terminal technology, many electronic terminals are equipped with processors dedicated to such services, for example Graphics Processing Units (GPUs) that support image processing. However, compared with a server, the processing power of an electronic terminal often cannot meet users' demands. Many services therefore require the electronic terminal to call a server, which in turn processes the data by calling a chip.
However, the scheduling capability of existing multi-core chips is weak: existing scheduling modes cannot fully utilize the chip's processing capacity and cannot handle concurrent scenarios, so processing efficiency is low.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide an apparatus, a service scheduling method and a service scheduling system, so as to improve the efficiency of service processing.
The above and other objects are achieved by the features of the independent claims. Further implementations are presented in the dependent claims, the description and the drawings.
In a first aspect, embodiments of the present application provide an apparatus comprising: a first processor and a second processor; and a first controller and a second controller, where the first controller is connected to the first processor and the second controller is connected to the second processor. The first controller is configured to: store a first request and a second request, the first request and the second request being requests assigned to the first processor; send the first request to the first processor for processing; receive a first processed message from the first processor; and send the second request to the first processor for processing. The second controller is configured to: store a third request and a fourth request, the third request and the fourth request being requests assigned to the second processor; send the third request to the second processor for processing; receive a second processed message from the second processor; and send the fourth request to the second processor for processing.
With this implementation, the apparatus can reduce the idle time of the processors and improve their service processing efficiency.
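To make the first aspect concrete, the per-processor controller can be pictured as a small queue that feeds its chip processor back-to-back. The following Python sketch is illustrative only: ChipProcessor, Controller, and all method names are assumptions, not terms from the patent.

    import queue
    import threading
    import time

    class ChipProcessor:
        """Stand-in for one core of the multi-core chip (hypothetical interface)."""
        def process(self, request):
            time.sleep(0.01)                     # pretend to compute
            return "result(%s)" % request

    class Controller:
        """Caches requests assigned to one chip processor and feeds them in order."""
        def __init__(self, processor, on_result):
            self.processor = processor           # the chip processor this controller drives
            self.pending = queue.Queue()         # cached requests assigned to this processor
            self.on_result = on_result           # callback that carries results toward the server

        def store(self, request):
            self.pending.put(request)            # cache a request (first request, second request, ...)

        def run(self):
            # Send one request, wait for its processed message, then immediately
            # send the next cached request -- the processor never sits idle
            # waiting for the server to transmit the next request.
            while True:
                request = self.pending.get()
                result = self.processor.process(request)
                self.on_result(result)

    ctrl = Controller(ChipProcessor(), on_result=print)
    threading.Thread(target=ctrl.run, daemon=True).start()
    ctrl.store("request-1")
    ctrl.store("request-2")
    time.sleep(0.1)                              # let both results print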
In a possible implementation, the first controller stores a fifth request and a sixth request, and is further configured to merge the fifth request and the sixth request into a seventh request and send the seventh request to the first processor for processing.
With this implementation, multiple requests can be merged into one for processing, further improving the service processing efficiency of the processor.
In a possible implementation, the first controller receives a first message from a first device, where the first message includes the first request and first information, and the first information corresponds to the first processor.
With this implementation, requests are allocated to processors according to the first information.
In a possible implementation, the first processed message includes a first processing result, which is the result of the first processor processing the first request; the first controller sends a second message to the first device, where the second message includes the first processing result and the first information.
With this implementation, the first device can match each request with its processing result according to the first information.
In a possible implementation, the first processed message is a third message that includes the first processing result and the first information, and the first controller sends the third message to the first device.
With this implementation, the controller only needs to forward the processing-result message.
In a possible implementation, when the first controller sends the first request to the first processor, it sends the first information along with the request; the first processed message is a processing-end message indicating that the first processor has finished processing the first request; and the first processor sends a fourth message to the first device, where the fourth message includes the first processing result and the first information.
With this implementation, the processor can return the processing result directly to the first device when processing ends.
In a possible implementation, the apparatus further includes a third controller connected to the first controller and the second controller, respectively; the third controller is configured to receive the first request, the second request, the third request, and the fourth request from the first device, to send the first request and the second request to the first controller, and to send the third request and the fourth request to the second controller.
With this implementation, the apparatus can distribute requests itself, which simplifies the connection structure between the first device and the apparatus and reduces the burden on the first device.
In a possible implementation, the third controller receives a fifth message from the first device, where the fifth message includes the first request and the first information, and the first information corresponds to the first processor; the third controller sends the first request and the first information to the first controller according to the first information.
With this implementation, the apparatus can distribute requests according to the instructions of the first device, so that the first device can effectively manage each processor.
In a possible implementation, the first processed message includes a first processing result; the third controller is further configured to receive the first processing result and the first information from the first controller and to send a sixth message to the first device, where the sixth message includes the first processing result and the first information.
With this implementation, the first device receives all processing results from the third controller, which simplifies the connection structure between the first device and the apparatus.
In a possible implementation, the first processed message includes a first processing result; the third controller is further configured to receive a seventh message from the first controller, the seventh message including the first processing result and the first information, and the third controller sends the seventh message to the first device.
With this implementation, the controller generates the processing-result message.
In a possible implementation, the first processed message is an eighth message that includes the first processing result and the first information; the third controller is further configured to receive the eighth message from the first controller and to send it to the first device.
With this implementation, the processor generates the processing-result message.
In a possible implementation, the first information includes a first processor sequence number, a timestamp, and a thread number; the first processor sequence number is assigned to the first request by the first device, and the timestamp and the thread number correspond to the first request.
With this implementation, the first device can effectively manage the distribution and processing of requests.
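As an illustration, the first information can be modeled as a small record generated by the first device when it assigns a processor. A minimal sketch with assumed field types and names:

    from dataclasses import dataclass
    import threading
    import time

    @dataclass(frozen=True)
    class RequestIndex:
        processor_seq: int    # sequence number of the processor assigned to the request
        timestamp: float      # when the first device accepted the request
        thread_id: int        # server thread that owns the request

    def make_index(processor_seq):
        # Generated after the first device receives a request and picks a processor.
        return RequestIndex(processor_seq, time.time(), threading.get_ident())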
It is understood that the above-described apparatus may be a multi-core chip; the first processor and the second processor may be chip processors; the first device may be a server, or a processor (i.e., a CPU) in the server; the first message and the fifth message may be processing-request messages; and the second, third, fourth, sixth, seventh, and eighth messages may be processing-result messages.
In a second aspect, an embodiment of the present application provides a service processing method, which includes: a first device sends a first message to a second device, where the second device includes a first processor and a second processor; the first message includes a first request and first information, and the first information corresponds to the first processor; the second device sends the first request to the first processor for processing according to the first information; and the second device sends a second message to the first device, where the second message includes a first processing result and the first information, the first processing result being the result of the first processor processing the first request. The first information is generated after the first device receives the first request and determines a processor for processing it; it includes a first processor sequence number, a timestamp, and a thread number, where the first processor sequence number is assigned to the first request by the first device and the timestamp and the thread number correspond to the first request. The first device may specifically be a server, and the second device may be a multi-core chip.
With this implementation, the second device can allocate processors according to the first information sent by the first device.
In one possible implementation, the first device stores a first table corresponding to the second device, where the first table includes a first array and a second array, the first array corresponding to the first processor and the second array corresponding to the second processor. The first array includes first data and second data: the first data includes a first record that stores one or more pieces of information, the number of which is a first number; the second data includes a second record that stores one or more pieces of information, the number of which is a second number. The second array includes third data and fourth data: the third data includes a third record that stores one or more pieces of information, the number of which is a third number; the fourth data includes a fourth record that stores one or more pieces of information, the number of which is a fourth number. After the first device generates the first information for the first request, it stores the first information in the first record. The first data and the second data exchange their data at regular intervals. After receiving the second message from the second device, the first device queries the first record and/or the second record according to the first information in the second message; if the first information is found in the first record, it is removed from the first record, and if it is found in the second record, it is removed from the second record.
With this implementation, the first device can track how each processor of the second device is handling requests.
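The table mechanics can be sketched as follows: each per-processor array holds a "new" data unit and an "old" data unit that exchange contents at a fixed interval, indexes are inserted on dispatch and removed when results return, and the two counts together measure the processor's backlog. Class and method names here are illustrative assumptions:

    class ProcessorArray:
        """One array of the data index table, tracking outstanding indexes
        for a single processor of the second device."""

        def __init__(self):
            self.new_unit = set()    # records of recently dispatched requests
            self.old_unit = set()    # records carried over from the previous period

        def record_dispatch(self, index):
            self.new_unit.add(index)          # store first information when a request is sent

        def record_result(self, index):
            # When a processing result returns, remove the index from
            # whichever unit currently holds it.
            self.new_unit.discard(index)
            self.old_unit.discard(index)

        def swap(self):
            # Runs at a regular interval: the two units exchange data, so an
            # index still present after two periods is visibly overdue.
            self.new_unit, self.old_unit = self.old_unit, self.new_unit

        def outstanding(self):
            # Total used later to estimate the processor's capacity for new requests.
            return len(self.new_unit) + len(self.old_unit)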
In a possible implementation, if the first device detects that no processing result has been received from the first processor within a first duration, the first device generates a first test request and information corresponding to it, where the information corresponding to the first test request corresponds to the first processor. The first device then checks whether a processing result is received from the first processor within a third duration: if one is received, it clears the second record of the second data before the first data and the second data next exchange data; if none is received, then before the next exchange it stores the information in the second record of the second data into the first record of the first data and empties the second record.
With this implementation, the first device can detect changes in the state of each processor of the second device and adaptively adjust and maintain the table data, which improves processing efficiency.
In a possible implementation, before generating the first test request, the first device also detects whether the first processor has received any request within a second duration, and generates the first test request only if it determines that the first processor has received no request within the second duration.
With this implementation, the first device performs traffic compensation only after determining that the abnormal processor has not received any request.
In one possible implementation, if the first device determines that no processing result was received from the first processor within the third duration, the first device generates a second test request after the first data and the second data exchange data.
With this implementation, the first device can continue to monitor changes in a processor's state after determining that it is abnormal.
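A sketch of the maintenance flow described in the last three implementations, reusing the ProcessorArray sketch above; the duration values and helper callables are assumptions:

    def maintain(array, send_test_request, result_within):
        """Flow compensation for one processor. `result_within(t)` returns True
        if a processing result arrives from the processor within t seconds."""
        FIRST_DURATION = 5.0     # illustrative values, not from the patent
        THIRD_DURATION = 5.0
        if not result_within(FIRST_DURATION):
            # No result within the first duration: probe the silent processor
            # with a test request carrying its own corresponding information.
            send_test_request()
            if result_within(THIRD_DURATION):
                # Processor recovered: the old unit holds only stale indexes,
                # so clear it before the next exchange.
                array.old_unit.clear()
            else:
                # Processor still abnormal: fold the old unit's records back
                # into the new unit before the next exchange, then empty it;
                # a further test request can be issued after the exchange.
                array.new_unit |= array.old_unit
                array.old_unit.clear()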
In a possible implementation, the first device determining a processor to process the first request according to the first table specifically includes: determining a first capability and a second capability according to the sum of the first number and the second number and the sum of the third number and the fourth number, where the first capability represents the first processor's capability to handle new requests and the second capability represents the second processor's capability to handle new requests; determining a first allocation probability and a second allocation probability according to the first capability and the second capability, and determining a first probability space and a second probability space, where the first allocation probability and the first probability space correspond to the first processor and the second allocation probability and the second probability space correspond to the second processor; and drawing a random number and determining the allocated processor according to the random number.
With this implementation, the first device can allocate processors according to how busy each processor is, achieving load balancing and improving processing efficiency.
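A worked sketch of the allocation step: capabilities are estimated from outstanding index counts (optionally weighted by processor speed, as in the next implementation), normalized into allocation probabilities that partition [0, 1) into probability spaces, and a random draw selects the processor. The capability formula below is one plausible choice, not taken from the patent:

    import random

    def pick_processor(outstanding, speeds=None):
        """outstanding[i]: number of indexes still pending for processor i;
        speeds[i]: optional relative processing speed of processor i."""
        n = len(outstanding)
        speeds = speeds or [1.0] * n
        # More speed and fewer pending requests mean more headroom for new
        # requests (+1 avoids division by zero for an idle processor).
        capability = [speeds[i] / (outstanding[i] + 1) for i in range(n)]
        total = sum(capability)
        probabilities = [c / total for c in capability]
        # Probability spaces: consecutive sub-intervals of [0, 1).
        draw, edge = random.random(), 0.0
        for i, p in enumerate(probabilities):
            edge += p
            if draw < edge:
                return i
        return n - 1

    # A processor with 3 pending requests is picked far less often than an idle one.
    print(pick_processor(outstanding=[3, 0]))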
In one possible implementation, the first device determines the first capability and the second capability according to the sum of the first number and the second number, the sum of the third number and the fourth number, and the processing speeds of the first processor and the second processor.
With this implementation, the first device can determine each processor's capability to handle new requests more accurately.
In one possible implementation, the first device stores a first table that includes a fifth array and a sixth array, where the fifth array corresponds to the first processor and the sixth array corresponds to the second processor. The fifth array includes a fifth record that stores one or more pieces of information, the number of which is a fifth number; the sixth array includes a sixth record that stores one or more pieces of information, the number of which is a sixth number. After the first device generates the first information for the first request, it stores the first information in the fifth record. After receiving the second message from the second device, the first device queries the fifth record according to the first information in the second message; if the first information is found in the fifth record, it is removed from the fifth record.
With this implementation, the first device can track how each processor of the second device is handling requests.
In a possible implementation, if the first device detects that no processing result has been received from the first processor within a fourth duration, the first device generates a third test request and information corresponding to it, where the information corresponding to the third test request corresponds to the first processor; the first device then checks whether a processing result is received from the first processor within a fifth duration, and clears the fifth record if one is received.
With this implementation, the first device can detect changes in the state of each processor of the second device and adaptively adjust and maintain the table data, which improves processing efficiency.
In a possible implementation, the first device determining a processor to process the first request according to the first table specifically includes: determining a third capability and a fourth capability according to the fifth number and the sixth number, where the third capability represents the first processor's capability to handle new requests and the fourth capability represents the second processor's; determining a third allocation probability and a fourth allocation probability according to the third capability and the fourth capability, and determining a third probability space and a fourth probability space, where the third allocation probability and the third probability space correspond to the first processor and the fourth allocation probability and the fourth probability space correspond to the second processor; and drawing a random number and determining the allocated processor according to the random number.
With this implementation, the first device can allocate processors according to how busy each processor is, achieving load balancing and improving processing efficiency.
In one possible implementation, the first device determines the third capability and the fourth capability according to the fifth number, the sixth number, and the processing speeds of the first processor and the second processor.
With this implementation, the first device can determine each processor's capability to handle new requests more accurately.
It is understood that the first table may be a data index table; the first data and the third data may be new data units; the second data and the fourth data may be old data units; the first information may be an index; the first record through the sixth record may be index records; the first number through the sixth number may be index counts; and the first through fourth capabilities may be new-request processing capabilities.
In a third aspect, the present application provides a computer-readable medium storing one or more programs configured to be executed by one or more processors, the one or more programs including instructions for executing any one of the possible implementations of the second aspect.
In a fourth aspect, an embodiment of the present application provides a service processing system, which includes a first device and a second device, the second device including a first processor and a second processor. The first device sends a first message to the second device, where the first message includes a first request and first information, and the first information corresponds to the first processor; the second device sends the first request to the first processor for processing according to the first information; and the second device sends a second message to the first device, where the second message includes a first processing result and the first information, the first processing result being the result of the first processor processing the first request. The first information is generated after the first device receives the first request and determines a processor for processing it; it includes a first processor sequence number, a timestamp, and a thread number, where the first processor sequence number is assigned to the first request by the first device and the timestamp and the thread number correspond to the first request. The first device may be a server and the second device may be a multi-core chip.
With this implementation, the second device can allocate processors according to the first information sent by the first device.
In one possible implementation, the first device stores a first table corresponding to the second device, where the first table includes a first array and a second array, the first array corresponding to the first processor and the second array corresponding to the second processor. The first array includes first data and second data: the first data includes a first record that stores one or more pieces of information, the number of which is a first number; the second data includes a second record that stores one or more pieces of information, the number of which is a second number. The second array includes third data and fourth data: the third data includes a third record that stores one or more pieces of information, the number of which is a third number; the fourth data includes a fourth record that stores one or more pieces of information, the number of which is a fourth number. After the first device generates the first information for the first request, it stores the first information in the first record. The first data and the second data exchange their data at regular intervals. After receiving the second message from the second device, the first device queries the first record and/or the second record according to the first information in the second message; if the first information is found in the first record, it is removed from the first record, and if it is found in the second record, it is removed from the second record.
With this implementation, the first device can track how each processor of the second device is handling requests.
In a possible implementation, if the first device detects that no processing result has been received from the first processor within a first duration, the first device generates a first test request and information corresponding to it, where the information corresponding to the first test request corresponds to the first processor. The first device then checks whether a processing result is received from the first processor within a third duration: if one is received, it clears the second record of the second data before the first data and the second data next exchange data; if none is received, then before the next exchange it stores the information in the second record of the second data into the first record of the first data and empties the second record.
With this implementation, the first device can detect changes in the state of each processor of the second device and adaptively adjust and maintain the table data, which improves processing efficiency.
In a possible implementation, before generating the first test request, the first device also detects whether the first processor has received any request within a second duration, and generates the first test request only if it determines that the first processor has received no request within the second duration.
With this implementation, the first device performs traffic compensation only after determining that the abnormal processor has not received any request.
In one possible implementation, if the first device determines that no processing result was received from the first processor within the third duration, the first device generates a second test request after the first data and the second data exchange data.
With this implementation, the first device can continue to monitor changes in a processor's state after determining that it is abnormal.
In a possible implementation, the first device determining a processor to process the first request according to the first table specifically includes: determining a first capability and a second capability according to the sum of the first number and the second number and the sum of the third number and the fourth number, where the first capability represents the first processor's capability to handle new requests and the second capability represents the second processor's; determining a first allocation probability and a second allocation probability according to the first capability and the second capability, and determining a first probability space and a second probability space, where the first allocation probability and the first probability space correspond to the first processor and the second allocation probability and the second probability space correspond to the second processor; and drawing a random number and determining the allocated processor according to the random number.
With this implementation, the first device can allocate processors according to how busy each processor is, achieving load balancing and improving processing efficiency.
In one possible implementation, the first device determines the first capability and the second capability according to the sum of the first number and the second number, the sum of the third number and the fourth number, and the processing speeds of the first processor and the second processor.
With this implementation, the first device can determine each processor's capability to handle new requests more accurately.
In one possible implementation, the first device stores a first table that includes a fifth array and a sixth array, where the fifth array corresponds to the first processor and the sixth array corresponds to the second processor. The fifth array includes a fifth record that stores one or more pieces of information, the number of which is a fifth number; the sixth array includes a sixth record that stores one or more pieces of information, the number of which is a sixth number. After the first device generates the first information for the first request, it stores the first information in the fifth record. After receiving the second message from the second device, the first device queries the fifth record according to the first information in the second message; if the first information is found in the fifth record, it is removed from the fifth record.
With this implementation, the first device can track how each processor of the second device is handling requests.
In a possible implementation, if the first device detects that no processing result has been received from the first processor within the second duration, the first device generates a third test request and information corresponding to it, where the information corresponding to the third test request corresponds to the first processor; the first device then checks whether a processing result is received from the first processor within the third duration, and clears the fifth record if one is received.
With this implementation, the first device can detect changes in the state of each processor of the second device and adaptively adjust and maintain the table data, which improves processing efficiency.
In a possible implementation, the first device determining a processor to process the first request according to the first table specifically includes: determining a third capability and a fourth capability according to the fifth number and the sixth number, where the third capability represents the first processor's capability to handle new requests and the fourth capability represents the second processor's; determining a third allocation probability and a fourth allocation probability according to the third capability and the fourth capability, and determining a third probability space and a fourth probability space, where the third allocation probability and the third probability space correspond to the first processor and the fourth allocation probability and the fourth probability space correspond to the second processor; and drawing a random number and determining the allocated processor according to the random number.
With this implementation, the first device can allocate processors according to how busy each processor is, achieving load balancing and improving processing efficiency.
In one possible implementation, the first device determines the third capability and the fourth capability according to the fifth number, the sixth number, and the processing speeds of the first processor and the second processor.
With this implementation, the first device can determine each processor's capability to handle new requests more accurately.
It should be appreciated that the description of technical features, technical solutions, advantages, or similar language in this specification does not imply that all of these features and advantages can be realized in any single embodiment. Rather, a description of a feature or advantage means that the particular feature, solution, or advantage is included in at least one embodiment; such descriptions therefore do not necessarily refer to the same embodiment. Furthermore, the technical features, solutions, and advantages described in the following embodiments may be combined in any suitable manner. One skilled in the relevant art will recognize that an embodiment may be practiced without one or more of the specific features, solutions, or advantages of a particular embodiment. In other cases, certain embodiments may exhibit additional features and advantages that are not present in all embodiments.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the embodiments of the present application will be described below.
Fig. 1A and Fig. 1B are schematic diagrams of the architecture of a service processing system according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a server invoking a multi-core chip according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of another server invoking a multi-core chip according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of yet another server invoking a multi-core chip according to an embodiment of the present invention;
Figs. 6A-6E are schematic structural diagrams of a multi-core chip according to an embodiment of the invention;
Figs. 7A-7B are schematic diagrams of processing flows of a multi-core chip according to an embodiment of the invention;
Fig. 8 is a schematic flowchart of request processing according to an embodiment of the present invention;
Fig. 9 is a schematic flowchart of a chip processor selection method according to an embodiment of the present invention;
Fig. 10 is a schematic flowchart of a load maintenance method according to an embodiment of the present invention;
Fig. 11 is a schematic flowchart of another load maintenance method according to an embodiment of the present invention;
Fig. 12 is a schematic processing flow diagram of another multi-core chip according to an embodiment of the present invention;
Fig. 13 is a schematic processing flow diagram of another multi-core chip according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
Fig. 1A and Fig. 1B exemplarily show the architecture of a service processing system according to an embodiment of the present application. The service processing system is configured to process services initiated by electronic devices, including but not limited to image processing services such as image recognition and image classification, and voice processing services such as voice recognition and voice synthesis. The system mainly comprises one or more electronic devices 100 and a server 101. The electronic devices 100 communicate with the server 101 over a local area network or a wide area network to initiate requests for various image and/or voice processing services; the requests may be HTTP messages. The server 101 may call the multi-core chip 102 through a software/hardware interface to complete a request initiated by an electronic device 100. The hardware interface may be a PCI, PCIe, or USB interface; the software interface may be an Application Program Interface (API), for example a software-encapsulated SRC (source) module and DST (destination) module, where the SRC module is the API module the server 101 uses to send data to the chip 102, and the DST module is the API module the server 101 uses to receive data from the chip 102. After the multi-core chip 102 finishes processing a request, it sends the processing result to the server 101, and the server 101 forwards the result to the electronic device 100.
For example, the electronic device and the server may cooperate on image and/or voice services in the following ways:
User 1 opens an application installed on a mobile phone, such as a "gallery" application that supports image classification. The phone uploads one or more pictures to the application server, and the application server returns the image classification result to the phone after completing the classification.
Alternatively, user 2 opens a browser and accesses a web page for image recognition, and the terminal uploads one or more pictures to the web server. The web server returns the image recognition result to the terminal after completing the recognition.
Alternatively, user 3 issues a request to a smart home device (e.g., a smart speaker) in natural language; the request may be to play a song, query the weather, or set a reminder. The smart home device captures the user's voice and sends the recorded voice message to the server; after parsing the voice message, the server returns the content of the corresponding request to the device, and the smart speaker can then play the requested song, report the weather, or set the requested reminder.
The electronic terminal 100 transmits a request to the server 101 and includes image or voice data in the request. In other embodiments, the request may also include a request type. For example, for an image classification processing service initiated by a user, the request may include information indicating the type of the image classification request. After receiving the request, the server 101 calls the multi-core chip 102 to process the request. The multi-core chip 102 transmits the processing result to the server 101 after completing the processing of the request. The server 101 returns the processing result to the electronic terminal 100. Specifically, the server 101 may return the processing result to the electronic terminal 100 in the form of an HTTP message.
The multi-core chip may return the processing result to the server in the following cases:
If the multi-core chip finishes processing, the chip returns the corresponding processing result to the server, and the server returns it to the electronic terminal. For example, for an image classification request the processing result may be the specific class of the image, such as "landscape" or "sports"; for an image recognition request it may be the specific recognition outcome, such as whether face recognition succeeded, or the names of people, animals, and plants.
If the multi-core chip fails to process the request, the chip returns a processing-failure message to the server, and the server returns it to the electronic terminal.
If the multi-core chip can neither process the request nor return any message to the server, the server returns a processing-failure message to the electronic terminal after a timeout.
The electronic device 100 may be a portable electronic device, such as a mobile phone, a tablet computer, a Laptop computer (Laptop), a wearable electronic device (e.g., a smart watch), and the like. In some other embodiments, the electronic device 100 may also be a desktop computer or a vehicle-mounted device.
Fig. 2 shows a schematic structural diagram of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The multi-core chip applied in the embodiment of the invention internally contains multiple chip processors. As shown in Fig. 1A and Fig. 1B, the multi-core chip 102 includes 4 chip processors 1021-1024, which can run in parallel so that the chip can process multiple tasks simultaneously. In some embodiments, the multi-core chip may be a neural network chip for image/video processing or voice processing, also called an AI chip or AI accelerator card, and it may also be applied to other systems; the present application does not limit this.
The multi-core chip applied in the embodiment of the invention is mainly deployed in a server. The chip can be plugged into the server (for example, through a PCI or PCIe slot) or connected to the server as an external device (for example, through PCI, PCIe, or USB).
An embodiment of the invention provides a method in which a server calls a multi-core chip using a single thread.
In this embodiment, as shown in Fig. 3, the Host side represents the server and the Device side represents the multi-core chip. The server calls the multi-core chip to process one request at a time: after the multi-core chip finishes processing a request and returns the result, the server calls the chip to process the next request. In other embodiments, a request may include multiple sub-requests. In that case, the server may assign each sub-request to one of the chip processors in the multi-core chip, so that after the server sends the request to the chip, every chip processor can process its sub-request simultaneously. To allow the chip to complete the request in one pass, the number of sub-requests in a request should be less than or equal to the number of chip processors in the chip. The chip can return the processing results to the server together once all chip processors have finished their sub-requests. In other embodiments, the server may designate which chip processor completes each sub-request; when the server sends a request to the multi-core chip, it carries the sequence number of the designated chip processor in the request.
An embodiment of the invention provides a method in which a server calls a multi-core chip using multiple threads.
In this embodiment, when the server receives a new request, it creates a corresponding thread for the request and allocates one of the chip processors in the multi-core chip to process it. At most as many threads are created as there are chip processors in the chip, with each thread corresponding to one chip processor. For example, if the multi-core chip shown in Fig. 4 includes 4 chip processors, the server creates at most 4 threads and can therefore process 4 requests at the same time; if 4 requests are already being processed, the server waits for one or more of them to finish before accepting a new request. When the server sends a request to the chip, it carries the sequence number of the chip processor in the request, and when the chip returns a processing result, it likewise carries the chip processor's sequence number in the result, so that on receiving the result the server can route it back to the corresponding thread.
In this scheduling mode, multi-thread scheduling lets the server support a degree of concurrent operation, making full use of the multiple chip processors in the multi-core chip and improving its processing efficiency.
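On the server side, the multi-thread mode might be organized as in the following sketch: at most one thread per chip processor, each request tagged with its processor's sequence number so the result routes back to the owning thread. The transport helper and message layout are assumptions:

    from concurrent.futures import ThreadPoolExecutor

    NUM_CHIP_PROCESSORS = 4          # matches the 4-core chip of Fig. 4

    def send_to_chip_and_wait(message):
        # Stub standing in for the real PCI/PCIe/USB transport to the chip.
        return {"processor_seq": message["processor_seq"],
                "result": "processed " + message["request"]}

    def call_chip(processor_seq, request):
        # The request carries the assigned chip processor's sequence number;
        # the chip echoes it back in the result, so the server can match the
        # result to the thread that owns this request.
        message = {"processor_seq": processor_seq, "request": request}
        result = send_to_chip_and_wait(message)
        assert result["processor_seq"] == processor_seq
        return result["result"]

    # At most one in-flight request per chip processor: with all four workers
    # busy, a fifth request waits until one of them finishes.
    pool = ThreadPoolExecutor(max_workers=NUM_CHIP_PROCESSORS)
    futures = [pool.submit(call_chip, i, "request-%d" % i)
               for i in range(NUM_CHIP_PROCESSORS)]
    print([f.result() for f in futures])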
An embodiment of the invention provides a multi-core chip equipped with multiple chip processors. As shown in Fig. 5, each chip processor in the multi-core chip is configured with a corresponding controller that caches one or more requests for that chip processor. Specifically, the controller may include a cache unit in which the requests of the corresponding chip processor are cached. Each controller is connected to its corresponding chip processor for transmitting requests.
In the system shown in Fig. 4, a request is sent from the SRC module to the multi-core chip and processed in one of its chip processors, and only after the DST module receives the processing result is the next request sent from the SRC module. During this period, the chip processor handling the request is idle both while the request travels from the SRC module to the multi-core chip and while the processing result travels from the chip processor back to the DST module. In the multi-core system shown in Fig. 5, because the controllers can cache requests, the server does not have to wait for the chip processors to finish before sending the next request. After the chip receives a request, the request is cached in the controller of the corresponding chip processor until that processor handles it; as soon as the processor finishes one request, it can immediately obtain the next from the controller. The multi-core chip provided by this embodiment can therefore further improve processing efficiency. In addition, the number of requests the server can accept at a time is no longer bounded by the number of chip processors in the multi-core chip, which strengthens concurrency.
In some other embodiments, the controller is further configured to, in a processing round, take multiple requests from its cache, merge them into one request, and send the merged request to the corresponding chip processor for processing.
Specifically, the input of a request may be a high-dimensional array, for example a 1 x 1024 x 256 array. When the controller performs a merge, it combines two 1 x 1024 x 256 requests into one 2 x 1024 x 256 request. In other embodiments, the controller may merge three or more requests into one request in each merge.
In particular, the controller may decide whether to merge requests based on the data size of the requests and/or on model matching. For example, if one request's data size is large, it may be left unmerged; correspondingly, if two requests both have small data sizes, they may be merged. As another example, if three requests use the same model and follow the same processing procedure, the three may be merged. In other embodiments, the controller may merge requests in their order of arrival.
With this embodiment, the controller can combine multiple requests into one and complete them in a single pass, making full use of each chip processor's processing capacity and thereby further improving the processing efficiency of the multi-core chip.
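For instance, the merge of two 1 x 1024 x 256 inputs into one 2 x 1024 x 256 input amounts to stacking along a batch axis. A sketch using NumPy, assuming the controller has already checked that the requests use the same model and are small enough to merge:

    import numpy as np

    def merge(requests):
        """Merge k cached requests, each a (1, 1024, 256) array, into a single
        (k, 1024, 256) request that the chip processor handles in one pass."""
        return np.concatenate(requests, axis=0)

    a = np.zeros((1, 1024, 256), dtype=np.float32)   # e.g. request A's input
    b = np.zeros((1, 1024, 256), dtype=np.float32)   # e.g. request B's input
    merged = merge([a, b])
    print(merged.shape)   # (2, 1024, 256)
    # The batched result is then split back per sub-request, so the controller
    # can return request A's result and request B's result separately.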
Figs. 6A to 6E exemplarily show structures of multi-core chips applied in this embodiment, as follows:
Fig. 6A illustrates one structure of a multi-core chip applied in this embodiment. As shown in Fig. 6A, the multi-core chip 102 includes 4 chip processors 1021-1024, each configured with a corresponding controller 1031-1034. Further, the controllers 1031-1034 are connected to the input terminal 104 and the output terminal 105 of the chip 102, respectively, so as to receive requests from the processor 107 in the server 101 and to send processing results to the processor 107.
Based on the multi-core chip 102 shown in Fig. 6A, the flow of request processing in the chip is shown in Fig. 7A:
S201: Controllers 1031-1034 receive requests from the processor 107.
Specifically, the processor 107 determines the chip processor corresponding to each request and sends the request to the corresponding controller 1031-1034 according to that chip processor's sequence number; each controller thus receives from the processor 107 the requests allocated to its own chip processor.
S202: Controllers 1031-1034 cache the one or more requests.
S203: Controllers 1031-1034 obtain a request from the cached requests and send it to the chip processor.
Specifically, in each processing round a controller may take a single request from its cache and send it to the chip processor for processing, or it may take multiple requests from the cache, merge them into one request, and send the merged request to the corresponding chip processor for processing.
S204: Controllers 1031-1034 receive the processing result from the chip processor, send it to the processor 107, and return to step S203.
If the chip processor executed a merged request, the processing result received by the controller contains the results of all merged requests. For example, if request A is to perform image recognition on picture A and request B is to perform image recognition on picture B, the merged request is to perform image recognition on picture A and picture B respectively, and the result of the merged request consists of the recognition result for picture A and the recognition result for picture B.
In other embodiments, when the controllers 1031-1034 send a processing result to the processor 107, they send the chip processor's sequence number along with it, or package the processing result together with the sequence number before sending it to the processor 107, so that the processor 107 can find the corresponding thread according to the sequence number.
Fig. 6B exemplarily shows another structure of the multi-core chip applied in this embodiment. Unlike Fig. 6A, in Fig. 6B the chip processors 1021-1024 are connected to the output terminal 105 of the chip 102. Correspondingly, in the processing flow of this chip, a chip processor returns the processing result directly to the processor 107 after completing a request. In other embodiments, after completing a request, the chip processor may send a processing-end message to its corresponding controller 1031-1034, indicating that its processing has ended; after receiving the processing-end message, the controller sends the next request to the chip processor.
Fig. 6C exemplarily shows another structure of the multi-core chip applied in the present embodiment. As shown in fig. 6C, the multi-core chip includes a controller 106, where the controller 106 is configured to distribute each request to its corresponding chip processor. The controller 106 is connected to the controllers 1031-1034 of the chip processors, and the controllers 1031-1034 are respectively connected to the corresponding chip processors 1021-1024. Furthermore, the controller 106 is connected to the input terminal 104 and the output terminal 105 of the chip, respectively, for receiving requests from the processor 107 and for sending processing results to the processor 107.
Based on the multi-core chip shown in fig. 6C, the flow of processing a request in the chip is shown in fig. 7B:
S301: the controller 106 receives a request from the processor 107, where the request carries the sequence number information of the chip processor. In some embodiments, the controller 106 may receive from the processor 107 information obtained by encapsulating the request together with the chip processor's sequence number information.
S302: the controller 106 sends the request to the controller 1031-1034 of the corresponding chip processor according to the sequence number information of the chip processor. In some embodiments, the controller 106 decapsulates the received encapsulated information to obtain the sequence number information of the chip processor.
S303: the controllers 1031-1034 cache the one or more requests.
S304: a request is obtained from the requests cached in the controllers 1031-1034 and sent to the chip processor. For this step, reference may be made to step S203, which is not repeated here.
S305: the controllers 1031-1034 receive the processing results from the chip processors and send the processing results to the controller 106. In some embodiments, the processing results received by the controllers 1031-1034 from the chip processors carry the sequence number information of the chip processors, which is forwarded to the controller 106. In other embodiments, the controllers 1031-1034 receive the processing results from the chip processors and then send the processing results together with the sequence number information of the chip processors to the controller 106.
S306: the controller 106 sends the processing result to the processor 107. The processing result may carry the sequence number information of the chip processor so that the server can look up the corresponding thread according to the sequence number information. In other embodiments, the controller 106 sends the encapsulated processing result and sequence number information of the chip processor to the processor 107; the encapsulation may be done in the chip processor, in the controllers 1031-1034, or in the controller 106.
Fig. 6D and fig. 6E exemplarily show further structures of a multi-core chip applied to the present embodiment. Unlike fig. 6C, in fig. 6D the controllers 1031-1034 are connected to the output terminal 105 of the chip 102, while in fig. 6E the chip processors 1021-1024 are connected to the output terminal 105 of the chip 102. Accordingly, after the processing of a request is completed, the processing result is returned to the server by the controllers 1031-1034 or by the chip processors, respectively.
In other embodiments, the controllers 1031-1034 may be included in corresponding chip processors.
In other embodiments, the input 104 and output 105 of FIGS. 6A-6E may be combined into one input/output port.
The embodiment of the invention provides a method for maintaining the load of a multi-core chip, where the load may include the number of requests that have been sent to a chip processor for processing without a result having been returned. In this embodiment, the server maintains a data index table, which records the load condition of each chip processor in the multi-core chip, and allocates the corresponding chip processor to a request according to the load condition of each chip processor in the data index table.

The structure of the data index table is shown in table 1. The data index table includes a plurality of arrays, the number of arrays corresponding to the number of chip processors included in the multi-core chip. For example, as shown in fig. 1, the multi-core chip 102 includes 4 chip processors 1021-1024, so the data index table includes 4 corresponding arrays. In some embodiments, the number of chip processors included in the multi-core chip may be preset in the server, or may be configuration information obtained by the server from the multi-core chip.

The ith array corresponds to the chip processor 102i in the chip 102 and is used to maintain the load condition of the chip processor 102i. Specifically, the ith array may be named with the processor sequence number Core i of the chip processor 102i and may include an Index Record. The Index Record stores the indexes corresponding to the requests sent to the chip processor 102i, the server generating a corresponding index for each request. For example, in the following table, N indexes are stored in the Index Record of the array corresponding to Core 1. In some embodiments, the number of indexes may be taken as the load value of the chip processor. The initial state of the Index Record may be null.
In some other embodiments, the ith array may further include an index number Num_Index, which is the number of indexes held in the Index Record. Its initial value may be 0.
It should be understood that table 1 only illustrates the structure of the data index table and the names of the specific fields by way of example; the present invention is not limited thereto.
TABLE 1
Processor | Index Record | Num_Index
Core 1 | Index_1, Index_2, …, Index_N | N
Core 2 | … | …
… | … | …
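A minimal C++ sketch of this table (the type and field names are assumptions chosen for illustration; only the structure, one array per chip processor holding an Index Record and an index count Num_Index, comes from the description above):

```cpp
#include <cstddef>
#include <ctime>
#include <thread>
#include <vector>

// One index per outstanding request: Core ID, Thread ID, Timestamp.
struct Index {
    int             core_id;    // processor sequence number Core ID
    std::thread::id thread_id;  // Thread ID of the request's thread
    std::time_t     timestamp;  // Timestamp taken when the index is created
};

// The ith array of table 1: maintains the load of chip processor 102i.
struct CoreEntry {
    std::vector<Index> index_record;   // Index Record, initially empty
    std::size_t        num_index = 0;  // Num_Index, initially 0
};

// Data index table: one CoreEntry per chip processor.
using DataIndexTable = std::vector<CoreEntry>;
```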
After receiving a new request, the server 101 determines the chip processor that will process the request and then generates the index; that is, each request corresponds to one index. The index may include parameters such as the processor sequence number Core ID, the Thread ID, and the Timestamp. The processor sequence number Core ID is the ID of the chip processor determined by the server 101 to process the request. The Thread ID is assigned by the operating system to the thread corresponding to the request and can be obtained by the std::this_thread::get_id() function when the index is generated: after the server receives a request, the operating system creates a thread for it, and the thread number is the unique identifier by which the operating system identifies that thread. Recording the Thread ID corresponding to the request enables the server to track the processing of the request. In addition, since a thread number may be reused at different times in the system, the Timestamp is also recorded when the index is generated; it can be obtained by the std::time_t getTimeStamp() function. The server is thus able to uniquely identify a request by the Thread ID and Timestamp information, while the processor sequence number Core ID in the index establishes a mapping from the request to the corresponding chip processor.
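Continuing the sketch above, generating an index might look as follows (the body of getTimeStamp() is an assumption that wraps the standard std::time() call; the two std:: functions are the ones named in the text):

```cpp
#include <ctime>
#include <thread>

// Helper named in the text; assumed here to wrap std::time().
std::time_t getTimeStamp() { return std::time(nullptr); }

// Build the index for a request once the server has chosen a chip
// processor (core_id). Called on the thread the operating system
// created for the request, so std::this_thread::get_id() yields that
// request's Thread ID.
Index makeIndex(int core_id) {
    Index idx;
    idx.core_id   = core_id;
    idx.thread_id = std::this_thread::get_id();
    idx.timestamp = getTimeStamp();
    return idx;
}
```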
Throughout the whole processing flow, from the moment the request is sent to the multi-core chip for processing to the moment the multi-core chip finishes processing and returns the processing result to the server, the processing messages carry the index. The multi-core chip can thus distribute the request to the corresponding chip processor based on the index, and the server can match the processing result based on the index, thereby effectively monitoring the processing of the request.
Specifically, after the server 101 generates the index, the request and the corresponding index are encapsulated into a processing request message, and the encapsulated processing request message is sent to the multi-core chip 102. The chip 102 decapsulates the processing request message to obtain the request and the corresponding index, obtains from the index the processor sequence number Core ID of the chip processor corresponding to the request, and sends the request to the chip processor corresponding to that Core ID, which then processes the service corresponding to the request. In some other embodiments, the server 101 sends the encapsulated processing request message directly to the chip processor corresponding to the Core ID in the chip 102 according to the Core ID in the index, and that chip processor processes the request.
After the chip 102 completes the processing of the request, the processing result and the corresponding index are packaged into a processing result message, and the processing result message is sent to the server 101. After receiving the processing result message, the server 101 performs decapsulation to obtain a processing result and a corresponding index. The server 101 can match the corresponding request by the information in the index, and transmit the processing result to the corresponding electronic terminal 100.
In addition, in the present embodiment, in the whole processing flow from the time when the request is sent to the multi-core chip for processing to the time when the multi-core chip finishes processing and returns the processing result to the server, the index is carried in the processing message, and the server can perform load management of the chip according to the index.
Specifically, after the server 101 generates the index, the index is stored, according to the processor sequence number Core ID in the index, in the Index Record of the chip processor corresponding to that Core ID in the data index table, and the corresponding index number Num_Index is increased. When the server 101 receives a processing result and the corresponding index, it finds, according to the processor sequence number Core ID in the index, the index matching the Thread ID and Timestamp in the Index Record of the chip processor corresponding to that Core ID, removes the index from the Index Record, and decreases the corresponding index number Num_Index.
In other embodiments, after the server 101 generates the index, the index is stored, according to the processor sequence number Core ID in the index, in the Index Record of the chip processor corresponding to that Core ID in the data index table; when the server 101 receives a processing result and the corresponding index, it finds the index matching the Thread ID and Timestamp in the Index Record of the chip processor corresponding to that Core ID and removes it from the Index Record.
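A sketch of these two table operations, built on the structures above (the matching rule, equal Thread ID and Timestamp, is taken from the text; the function names are assumptions):

```cpp
#include <algorithm>

// Store a newly generated index and increase the core's index number.
void storeIndex(DataIndexTable& table, const Index& idx) {
    CoreEntry& e = table[idx.core_id];
    e.index_record.push_back(idx);
    ++e.num_index;
}

// On receiving a processing result, remove the matching index
// (same Thread ID and Timestamp) and decrease the index number.
void removeIndex(DataIndexTable& table, const Index& idx) {
    CoreEntry& e = table[idx.core_id];
    auto it = std::find_if(e.index_record.begin(), e.index_record.end(),
                           [&](const Index& r) {
                               return r.thread_id == idx.thread_id &&
                                      r.timestamp == idx.timestamp;
                           });
    if (it != e.index_record.end()) {
        e.index_record.erase(it);
        --e.num_index;
    }
}
```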
The method for calling the multi-core chip by the server and the method for maintaining the multi-core chip load condition by the server are described below with reference to the request processing flow, as shown in fig. 8.
S401: the electronic terminal 100 sends a request to the server 101;
S402: the server 101 receives the request and selects a chip processor to execute the request according to the load value Load[] of each chip processor. The index number Num_Index of a chip processor in the data index table may be used as that chip processor's load value Load[], or the number of indexes stored in the Index Record corresponding to the chip processor in the data index table may be counted and used as its load value Load[]. The higher the load value of a chip processor, the more requests it is currently executing and the weaker its ability to process new requests; conversely, the lower the load value, the stronger its ability to process new requests.
In this embodiment, the distribution probability is determined according to the current load value of each chip processor, and the higher the load value is, the lower the corresponding distribution probability is.
Specifically, as shown in fig. 9, the server's selection of a chip processor to execute the processing request according to the load value of each chip processor includes the following steps:
S4021: the server 101 determines the new request processing capability of each chip processor according to its load value in the data index table. The higher the load value, the weaker the corresponding new request processing capability. In some embodiments, the relationship between the new request processing capability AoE[i] of the chip processor 102i and its load value Load[i] may be:
AoE[i] = 1 / Load[i]

where AoE[i] represents the new request processing capability of the chip processor 102i and Load[i] represents the load value of the chip processor 102i. In some embodiments, the number of indexes in the index record of the chip processor 102i in the data index table may be taken as its load value Load[i].
For example, if the multi-core chip includes 4 chip processors and the current load values are Load[4] = {1,2,3,2}, then the new request processing capabilities are AoE[4] = {1, 1/2, 1/3, 1/2}. In some embodiments, if the load value of a chip processor is 0, the server may compensate the load value of every chip processor included in the multi-core chip and obtain the new request processing capability of each chip processor from the compensated load values; in some embodiments, the compensation is to increment the load value of each chip processor by one. In other embodiments, to avoid a load value of 0 altogether, the load values of all chip processors are compensated before the new request processing capability is determined.
In some other embodiments, step S4021 may further be: the server 101 determines the new request processing capability of each chip processor according to its load value and its processing speed. The higher the load value and the lower the processing speed, the weaker the corresponding new request processing capability; the lower the load value and the higher the processing speed, the stronger it is. In some embodiments, the relationship between the new request processing capability AoE[i] of the chip processor 102i, its load value Load[i], and its processing speed SoE[i] may be:
AoE[i] = SoE[i] / Load[i]

where AoE[i] represents the new request processing capability of the chip processor 102i, Load[i] represents the load value of the chip processor 102i, and SoE[i] represents the processing speed of the chip processor 102i.
The processing speed of each chip processor in the multi-core chip is attribute information; it may be stored in the server in advance, or may be configuration information acquired from the multi-core chip.
For example, if the multi-core chip includes 4 chip processors, the current load values are Load[4] = {1,2,3,2}, and the processing speeds are SoE[4] = {1,2,1,1}, then the new request processing capabilities are AoE[4] = {1, 1, 1/3, 1/2}.
S4022: the distribution probability p[i] of each chip processor is calculated from its new request processing capability AoE[i], and the probability space of each chip processor is determined. The distribution probability represents the probability that a new processing request is distributed to the chip processor: the stronger the new request processing capability, the larger the corresponding distribution probability. In some embodiments, the distribution probability p[i] of a chip processor may be:
p[i] = AoE[i] / (AoE[1] + AoE[2] + … + AoE[n])

where AoE[i] is the new request processing capability of the chip processor 102i, the denominator is the sum of the new request processing capabilities of all chip processors, and n is the number of chip processors contained in the multi-core chip. The sum of the distribution probabilities of all chip processors is 1.
For example, given the processing capabilities AoE[4] = {1, 1/2, 1/3, 1/2} obtained in S4021, the distribution probabilities of the chip processors are p[4] = {3/7, 3/14, 1/7, 3/14}, and the corresponding probability spaces are (0, 3/7], (3/7, 9/14], (9/14, 11/14], and (11/14, 1], respectively.
S4023: a random number is generated, and the chip processor to be assigned is determined according to the random number. In some embodiments, this step may specifically be: based on the probability spaces of the 4 chip processors obtained in step S4022, if the generated random number is 0.5, then since 3/7 < 0.5 < 9/14, the random number falls within the probability space of the second chip processor, and the new request is therefore assigned to the second chip processor for processing.
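A compact sketch of steps S4021 to S4023 (combining, as illustrative assumptions, the speed-aware capability formula with the +1 load compensation described above, and using std::mt19937 as the random source):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Select a chip processor: capability AoE = SoE / (Load + 1), assignment
// probability proportional to capability, then a random number sampled
// against the cumulative probability space picks the core.
int selectCore(const std::vector<double>& load,
               const std::vector<double>& speed) {
    const std::size_t n = load.size();
    std::vector<double> aoe(n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        aoe[i] = speed[i] / (load[i] + 1.0);  // +1 compensation avoids /0
        sum += aoe[i];
    }
    static std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    const double r = dist(gen);
    double upper = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        upper += aoe[i] / sum;  // upper bound of core i's probability space
        if (r <= upper) return static_cast<int>(i);
    }
    return static_cast<int>(n - 1);  // guard against rounding error
}
```

With Load[4] = {1,2,3,2} and uniform speeds, a heavily loaded core such as the third one receives proportionally fewer new requests, matching the worked example above.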
By this selection method, chip processors are selected according to their current load values, and a chip processor with a higher load has a lower probability of being selected, thereby achieving load balance among the chip processors contained in the multi-core chip and making full use of hardware resources.
S403: the server 101 generates an index corresponding to the request. Specifically, the server 101 generates an index based on the processor sequence number Core ID allocated to the request, the acquired Thread number Thread ID of the request, and the Timestamp.
S404: the server 101 stores the index in the data index table, encapsulates the index and the request into a processing request message, and sends the processing request message to the multi-core chip 102 for processing.
The storing, by the server 101, the index into the data index table may specifically include the following steps:
the server 101 saves the index in the index record of the chip processor corresponding to the processor serial number Core ID in the data index table.
In other embodiments, the index number of the chip processor is increased accordingly. In some cases, the increase may specifically be that, for each index added to the index record, the corresponding index number is incremented by one.
The server 101 packages the index and the request into a processing request message, and sends the processing request message to the multi-core chip 102 for processing, which specifically includes the following steps:
the processor 107 in the server 101 sends the encapsulated processing request message to the chip processor corresponding to the processor serial number Core ID in the multi-Core chip 102 for processing according to the processor serial number Core ID in the index.
In some other embodiments, the processor 107 in the server 101 sends the encapsulated processing request message to the multi-core chip 102; the multi-core chip 102 decapsulates the message to obtain the request and the corresponding index, obtains the processor sequence number Core ID from the index, and sends the request and the index to the chip processor corresponding to that Core ID for processing.
In some other embodiments, before the server 101 encapsulates and sends the request and the corresponding index to the multi-core chip 102, the server 101 creates a condition wait variable Condition corresponding to the index and saves it under the entry corresponding to the index in the index record of the corresponding chip processor in the data index table. The server then waits on the condition wait variable Condition.
A condition wait variable, also referred to as a condition variable, typically takes the value "True" or "False". When its associated condition changes from unsatisfied to satisfied, its value changes: the value is "False" while the condition is not satisfied and becomes "True" when the condition is satisfied. Condition variables are typically used to manage threads. For example, thread A waits on a condition variable and suspends; when the condition is satisfied, the condition variable is notified and thread A wakes up.
When the server 101 creates the condition wait variable Condition, the condition is set to the receipt of the processing result. When the server 101 sends a processing request message to the multi-core chip 102, the condition wait variable Condition is False, the thread of the request in the server is suspended, and the server waits for the processing result returned from the multi-core chip 102.
After receiving the processing result message from the multi-core chip 102, the server 101 obtains the index from the processing result message, queries the data index table according to the index to obtain the corresponding condition wait variable, and sets the value of the condition wait variable Condition to True; the server 101 then wakes up the thread of the request and returns the processing result to the electronic terminal 100.
The server 101 may also set a time condition for the condition wait variable Condition. That is, if no processing result has been received after a predetermined time, i.e. the wait for the processing result times out, the value of the condition wait variable Condition is set to True, the server 101 wakes up the thread of the request, and the server 101 returns a processing failure message to the electronic terminal 100.
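A minimal sketch of this wait/notify pattern with the standard C++ primitives (the WaitEntry layout and function names are assumptions; the timeout behavior follows the description above):

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Per-request wait state stored alongside the index in the table entry.
struct WaitEntry {
    std::mutex              m;
    std::condition_variable cv;
    bool                    done = false;  // the Condition: result received
};

// Request thread: suspend until the result arrives or the wait times out.
// Returns true if the condition became True, false on timeout.
bool waitForResult(WaitEntry& e, std::chrono::seconds timeout) {
    std::unique_lock<std::mutex> lock(e.m);
    return e.cv.wait_for(lock, timeout, [&] { return e.done; });
}

// Called when the processing result message for this index arrives:
// set the condition to True and wake the suspended request thread.
void notifyResult(WaitEntry& e) {
    {
        std::lock_guard<std::mutex> lock(e.m);
        e.done = true;
    }
    e.cv.notify_one();
}
```

On a false return from waitForResult(), the server would return a processing failure message to the electronic terminal 100, as described above.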
S405: the server 101 receives a processing result message from the multi-core chip 102. The processing result message may be the encapsulated processing result and index. The index was generated by the server 101 for the request corresponding to the processing result and sent to the multi-core chip 102; after the chip 102 finishes executing, it encapsulates the processing result with the corresponding index and returns them to the server 101.
S406: the data index table is maintained according to the index in the processing result message.
Specifically, when the request processing is finished, the processing result message that the server 101 receives from the multi-core chip 102 contains an index; the server searches, according to the processor sequence number Core ID in that index, the index record of the corresponding chip processor in the data index table for the matching index and removes it from the index record.
In other embodiments, the index number of the chip processor is decreased accordingly. In some embodiments, the decrease may specifically be: each time an index is removed from the Index Record, the corresponding index number Num_Index is decremented by one.
S407: the server 101 transmits the processing result to the electronic terminal 100.
Specifically, after the server 101 receives the processing result message from the multi-core chip 102 in step S405, it decapsulates the message to obtain the processing result and the corresponding index. The server 101 queries the data index table according to the index to obtain the matching index and the corresponding condition wait variable. The server 101 then sets the value of the condition wait variable from "False" to "True"; this change wakes up the previously suspended request thread, and the server 101 returns the processing result to the electronic terminal 100.
In some other embodiments, after receiving the processing result message, the server 101 fills the processing result into the entry corresponding to the index in the data index table and returns the processing result to the electronic terminal 100.
If the chip processor is abnormal or damaged and can neither process the request nor return a processing result to the server 101, the value of the condition wait variable is set from "False" to "True" after the wait times out, the server 101 wakes up the thread of the request, and the server 101 returns a processing failure message to the electronic terminal 100.
In other embodiments, as shown in FIG. 8, the server 101 may include a front-end interface module and a load maintenance module. Wherein the front-end interface module is configured to perform the following method steps:
S501: a request is received from the electronic terminal 100.
S502: the chip processor that executes the request is selected according to the Load value Load [ ] of each chip processor. This step can refer to step S402, and is not described herein.
S503: an index corresponding to the request is generated. This step can refer to step S403, and is not described herein.
S504: the index is stored in the data index table, and the index and the request are encapsulated into a processing request message and sent to the multicore chip 102.
S505: waiting for a processing result update message sent by the load maintenance module or waiting for timeout.
Specifically, the waiting may be performed by using a conditional waiting variable, which specifically refers to step S404 and is not described herein again. In some embodiments, the load maintenance module may send the processing result and the index to the front-end interface module after receiving the processing result message and obtaining the processing result and the index.
S506: the processing result is returned to the electronic terminal 100.
Specifically, if the front-end interface module obtains the processing result message from the load maintenance module, it returns the processing result in that message to the electronic terminal 100. If the condition wait variable corresponding to the request times out, the front-end interface module returns a processing failure message to the electronic terminal 100.
The load maintenance module is used for maintaining the data index table and completing the following method steps:
S601: the index is received from the front-end interface module and stored in the corresponding index record in the data index table.
Specifically, the load maintenance module stores the index in the index record of the chip processor corresponding to the processor sequence number Core ID in the data index table. In other embodiments, the index number of the chip processor is increased accordingly; in some cases, for each index added to the index record, the corresponding index number is incremented by one.
S602: the processing result message is received from the multi-core chip 102, the data index table is queried according to the index in the processing result message, and the data index table is updated. For this step, reference may be made to step S406, which is not repeated here. In some other embodiments, the load maintenance module saves the processing result in the received processing result message to the entry corresponding to the index and then performs step S603.
S603: the front-end interface module is notified that the processing is finished.
Specifically, the load maintenance module may send the index and the processing result in the received processing result message to the front-end interface module, or send the index and the processing result stored in the data index table to the front-end interface module to indicate that the corresponding request is processed.
Through the above method steps, the server 101 creates an index for each new request it receives, and the index accompanies the whole processing flow. The server 101 can thus indicate the allocated chip processor through the index, the multi-core chip can distribute the request to the corresponding chip processor according to the index, and the server 101 can maintain the load of each chip processor in the multi-core chip 102 through the index, further informing the selection of chip processors and improving the processing efficiency of the multi-core chip.
If one or more chip processors in the multi-core chip are abnormal or damaged and can neither process requests nor return processing results to the server, then after a period of time a large number of indexes corresponding to requests with no returned result will accumulate in the index record of the abnormal chip processor in the server's data index table, and the high index number indicates a high load value for that chip processor. As can be seen from steps S4021 to S4023, the distribution probability of a chip processor with a high load value is reduced. The selection method therefore distributes requests according to the load of each chip processor and avoids distributing a large number of requests to an abnormal chip processor.
The embodiment of the invention provides a load maintenance method used by the server 101 to maintain the data index table. The main steps of the method are shown in fig. 10:
S701: the server 101 detects whether each chip processor included in the multi-core chip 102 has returned a processing result message within a first time period T1, and determines a chip processor from which no processing result message was returned within the first time period T1 to be an abnormal chip processor. The method may specifically be: after receiving a processing result message from the multi-core chip 102, the server 101 obtains the processing result and the corresponding index by decapsulation, and examines the processor sequence numbers Core ID in the indexes contained in the processing result messages received within the first time period T1, thereby determining the chip processors from which no processing result message was returned. The first time period T1 may be set to be much longer than one service processing period (typically milliseconds or seconds); specifically, the first time period T1 may be set to 30 seconds.
If the server 101 detects that a processing result message was returned from a chip processor within the first time period T1, the chip processor is in a normal state and the server returns to step S701 to continue monitoring. If the server 101 received no processing result from a chip processor within the first time period T1, the chip processor may be abnormal and unable to return a processing result to the server 101, and the server 101 executes step S703.
S703: the server 101 triggers traffic compensation to determine the state of the abnormal chip processor.
During the first time period T1 examined in step S701, although the server 101 receives no processing result message from the abnormal chip processor, it still distributes requests to the chip processors of the multi-core chip 102 according to the data index table it maintains. For the abnormal chip processor, because no processing results are returned, a large number of indexes accumulate in its index record in the data index table and its index number becomes high. As can be seen from steps S4021 to S4023, the distribution probability of the abnormal chip processor therefore becomes gradually lower. When the distribution probability is too low, the server 101 effectively no longer assigns requests to the abnormal chip processor; at that point, even if the abnormal chip processor returns to normal, it may receive no requests because its distribution probability is too low.
Specifically, the server 101 may generate a test request for testing the state of the abnormal chip processor and designate the abnormal chip processor to process it, so as to confirm its state. When generating the index corresponding to the test request, the server 101 may set the processor sequence number Core ID in the index to that of the abnormal chip processor, so that the test request is distributed to the abnormal chip processor.
Specifically, the test request may be a processing task preset in the server 101, such as an image recognition request for a picture prestored in the server, or may be a processing task copied from a request currently received by the server 101.
The server 101 obtains the thread number and the timestamp for the test request, generates an index in combination with the processor sequence number Core ID of the designated abnormal chip processor, encapsulates the test request and the index, and sends them to the multi-core chip 102 for processing. In addition, the server 101 stores the index in the index record of the corresponding chip processor in the data index table, and the index number is increased accordingly.
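Continuing the earlier index sketch, the traffic compensation step might look as follows (makeTestIndex() is a hypothetical helper; only the pinning of the Core ID to the abnormal chip processor comes from the text):

```cpp
#include <thread>

// Build an index for a test request pinned to the abnormal chip
// processor: the Core ID is set directly, bypassing the load-based
// selection of S4021 to S4023, so the test request reaches that core.
Index makeTestIndex(int abnormal_core_id) {
    Index idx;
    idx.core_id   = abnormal_core_id;            // designated, not selected
    idx.thread_id = std::this_thread::get_id();
    idx.timestamp = getTimeStamp();              // also starts the T3 timer
    return idx;
}
```

The server would then store this index with storeIndex() and encapsulate it with a preset task, such as recognition of a prestored picture, before sending it to the multi-core chip 102.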
In some other embodiments, before performing step S703, the server 101 may further perform step S702:
S702: it is determined whether the abnormal chip processor received any request within a second time period T2. If the abnormal chip processor received no processing result within the first time period T1 and no request within the second time period T2, step S703 is performed; if the abnormal chip processor did receive a request within the second time period T2, step S704 is performed. Specifically, the server 101 may determine whether the abnormal chip processor received a request within the second time period T2 by examining the processor sequence numbers and timestamps of the indexes recorded in the data index table. If the abnormal chip processor received a request within the second time period T2, the server 101 can determine the state of the abnormal chip processor by checking for the processing result of that request. The second time period T2 may be set to approximately one service processing period, such as 1 second. In some other embodiments, the time window corresponding to the second time period may be a portion of the window corresponding to the first time period; for example, the server detects whether the abnormal chip processor received a request within the last 1 s of the 30 s first time period.
S704: the server 101 detects whether a processing result is received from the abnormal chip processor within a third time period T3. Specifically, the server 101 can detect whether a processing result was received from the abnormal chip processor by examining the processor sequence number Core ID in the indexes contained in the received processing result messages. The third time period T3 may be timed from the timestamp obtained when the index for the test request was generated during the traffic compensation of step S703, or from the time at which the request determined in step S702 was received within the second time period T2. In some embodiments, the third time period T3 may be set to be much longer than one service period, e.g. 30 seconds. If the server 101 receives a processing result from the abnormal chip processor within the third time period T3, the state of the abnormal chip processor has recovered to normal and it can process new requests normally. At this point, however, the indexes accumulated in the data index table may keep the distribution probability of the chip processor low and impair its processing efficiency. In order to enable the chip processor to receive requests normally again, in the present embodiment the server 101 may execute step S705.
If the server 101 receives no processing result from the abnormal chip processor within the third time period T3, the abnormal chip processor is still in the abnormal state. In this case the server 101 returns to step S701 to continue monitoring. The server 101 keeps the abnormal chip processor in a high-load, low-distribution-probability state in the data index table, so that newly received requests continue to be assigned to the abnormal chip processor only with low probability.
In other embodiments, if the server 101 receives no processing result from the abnormal chip processor within the third time period T3, indicating that it is still in the abnormal state, the server 101 jumps to S703 to determine the state of the abnormal chip processor again.
S705: the indexes accumulated in the index record of the chip processor in the data index table are removed, and the server 101 returns to step S701 to continue monitoring.
Specifically, if the server 101 receives a processing result from the abnormal chip processor within the third time period T3, the abnormal chip processor has recovered to normal. In this case, however, a large number of indexes that were not processed, or whose processing failed, while the chip processor was in the abnormal state remain stored in the data index table in the server 101, and the index number is high. As can be seen from steps S4021 to S4023, a chip processor with a higher index number has a lower distribution probability. Removing the indexes accumulated in the index record of the chip processor in the data index table therefore raises the probability that the chip processor obtains requests and improves the processing efficiency of the multi-core chip.
In other embodiments, the server 101 correspondingly resets the index number Num_Index to zero.
Through the above steps, the server 101 can track changes in the state of each chip processor and adapt the index records and index numbers in the data index table to those changes, thereby improving the processing efficiency of the multi-core chip 102.
The embodiment of the invention provides another load maintenance method. In the present embodiment, the data structure of the data index table maintained in the server 101 may be as shown in table 2.
TABLE 2
Processor | New[] (Index Record / Num_Index) | Old[] (Index Record / Num_Index)
Core 1 | Index_1, …, Index_N / N | Index_1, …, Index_M / M
Core 2 | … / … | … / …
… | … | …
The data index table includes a plurality of arrays, the number of arrays corresponding to the number of chip processors included in the multi-core chip. For example, for the multi-core chip shown in fig. 1, the data index table may include 4 arrays corresponding to the chip processors 1021-1024, respectively. The ith array corresponds to the ith chip processor in the chip and is used to maintain the load condition of that chip processor.

Specifically, the ith array is named with the processor sequence number Core i of the ith chip processor and includes two data units, a new data unit New[] and an old data unit Old[]. The two units have the same data structure, which may include an Index Record corresponding to the chip processor; the Index Record stores the indexes related to the processing requests handled by the corresponding chip processor. The index generated by the server 101 for a newly received request is stored in the Index Record of the new data unit New[]. At intervals of a certain period (e.g., 1 minute), the new data unit New[] and the old data unit Old[] exchange data, i.e., the Index Record of New[] is exchanged with the Index Record of Old[]. The initial state of the Index Record may be null.
In other embodiments, the data structure of the new data unit New[] and the old data unit Old[] may also include an index number Num_Index, which is the number of indexes held in the corresponding Index Record. Its initial value may be 0.
For example, in the above table, in the array corresponding to Core 1, N indexes are stored in the Index Record of the new data unit, so the index number Num_Index of the new data unit is N; M indexes are stored in the Index Record of the old data unit, so its index number Num_Index is M. In some embodiments, the sum of the index numbers of the new and old data units of a chip processor may be taken as the load value of that chip processor. It should be understood that table 2 only illustrates the structure of the data index table and the names and storage manner of the fields by way of example; the present invention is not limited thereto.
In some embodiments, the server 101 may record the time T of the data exchange between the new data unit and the old data unit, compare the timestamp in an index with the time T, and store the index in the new data unit of the corresponding chip processor if the index's timestamp is greater than T.
In some embodiments, the new data units and old data units of all chip processors are exchanged at the same time. In other embodiments, the new data unit and old data unit of each chip processor may be exchanged at different times, in which case the server may record the exchange time of each chip processor. In other embodiments, the exchange periods of the chip processors may also differ.
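A sketch of this two-bucket aging scheme, extending the earlier CoreEntry idea (the field and method names, and the use of a recorded swap time, are assumptions for illustration):

```cpp
#include <cstddef>
#include <ctime>
#include <utility>
#include <vector>

// Per-core entry with a new and an old data unit, as in table 2.
struct AgedCoreEntry {
    std::vector<Index> new_record, old_record;  // New[] / Old[] Index Records
    std::size_t new_num = 0, old_num = 0;       // their Num_Index counters
    std::time_t last_swap = 0;                  // time T of the last exchange

    // Load value = sum of the index numbers of the new and old units.
    std::size_t load() const { return new_num + old_num; }

    // Periodic exchange (e.g., every minute): New[] and Old[] swap data.
    void swapUnits(std::time_t now) {
        std::swap(new_record, old_record);
        std::swap(new_num, old_num);
        last_swap = now;  // indexes stamped after this go into New[]
    }
};
```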
The following describes a method for the server 101 to call the multi-core chip 102 and a method for the server 101 to maintain the load condition of the multi-core chip 102 in conjunction with a request processing flow. The method is based on a data index table as shown in table 2.
S801: the electronic terminal 100 sends a request to the server 101;
S802: the server 101 receives the request and selects a chip processor to execute the request according to the load value Load[i] of each chip processor, where the load value Load[i] of a chip processor is the sum of the index number of its new data unit New[] and the index number of its old data unit Old[] in the data index table.
The specific steps of the server 101 selecting the chip processor for processing the request according to the Load value Load [ i ] of each chip processor may refer to steps S4021 to S4023, and are not described herein again.
S803: the server 101 generates an index from the processor serial number Core ID assigned to the request, the acquired thread number of the request, and the timestamp.
S804: the server 101 stores the index into the data index table, encapsulates the index and the request into a processing request message, and sends the processing request message to the multi-core chip 102 for processing.
The server 101 stores the index into the data index table, and specifically includes the following steps:
The server 101 saves the index in the Index Record of the new data unit New[] of the chip processor corresponding to the processor sequence number Core ID in the data index table. In other embodiments, the index number Num_Index of the new data unit New[] is increased accordingly; in some embodiments, for each index added to the Index Record, the corresponding index number Num_Index is incremented by one.
The server 101 manages the request thread using the condition wait variable with reference to step S404, which is not repeated here.
S805: the server 101 receives a processing result message from the multi-core chip 102, the processing result message containing the encapsulated processing result and index.
S806: the server 101 maintains a data index table according to the index in the processing result message.
Specifically, when the request processing is finished, the processing result message that the server 101 receives from the multi-core chip 102 contains an index; the server searches for the matching index in the Index Records of the new data unit New[] and the old data unit Old[] of the chip processor corresponding to the processor sequence number Core ID in the data index table, and removes it from the Index Record in which it is found. In other embodiments, the index number Num_Index of the corresponding data unit is decreased accordingly; in some cases, each time an index is removed from an Index Record, the index number Num_Index of the corresponding data unit is decremented by one.
S807: the server 101 transmits the processing result to the electronic terminal 100.
The process of the server 101 performing thread management using the condition waiting variable according to the processing result may refer to step S407, and is not described herein again.
Through the above method steps, the server creates an index for each new request it receives, and the index accompanies the whole processing flow, so that the server can indicate the allocated chip processor through the index and can maintain the chip load through the index so as to balance the load.
An embodiment of the present invention provides another load maintenance method, based on the data index table provided in table 2, for maintaining the load when an abnormal chip processor occurs. As shown in fig. 11, the method specifically includes the following steps:
S901: the server 101 detects whether each chip processor included in the multi-core chip 102 has returned a processing result message within a fourth time period T4, and determines a chip processor from which no processing result message was returned within the fourth time period T4 to be an abnormal chip processor. For the detailed operation of this step, reference may be made to step S701, which is not repeated here. In some embodiments, the fourth time period T4 may be counted from the moment the new data unit and the old data unit of the chip processor in the data index table exchange data. The fourth time period T4 may be set to be much longer than one service processing period (typically milliseconds or seconds); specifically, it may be set to 30 seconds.
If the server 101 detects that a chip processor returned a processing result message within the fourth time period T4, the chip processor is in a normal state and the server may perform step S905. In other embodiments, if the server 101 detects that a chip processor returned a processing result message within the fourth time period T4, it may instead return to step S901 for the next round of detection.
If the server 101 received no processing result from a chip processor within the fourth time period T4, the chip processor may be abnormal and unable to return processing results to the server 101, and the server 101 executes step S903.
S903: the server 101 triggers traffic compensation to determine the state of the abnormal chip processor, and then executes step S904. For the specific operation of this step, refer to step S703, which is not repeated here.
The server 101 stores the index generated for the test request in the index record of the new data unit of the corresponding chip processor in the data index table, and the index number of the new data unit may be increased accordingly. In some other embodiments, before performing step S903, the server 101 further performs step S902:
S902: it is determined whether the abnormal chip processor received any request within a fifth time period T5. If the abnormal chip processor received no processing result within the fourth time period T4 and no request within the fifth time period T5, the server 101 executes step S903; if the abnormal chip processor did receive a request within the fifth time period T5, the server 101 may perform step S904. For the detailed operation of this step, reference may be made to step S702, which is not repeated here.
S904: the server 101 detects whether a processing result is received from the abnormal chip processor within a sixth time period T6. For the detailed operation of this step, reference may be made to step S704, which is not repeated here. If the server 101 receives a processing result from the abnormal chip processor within the sixth time period T6, the state of the abnormal chip processor has recovered to normal and it can process requests normally; the server 101 may then perform step S905. If the server 101 receives no processing result from the abnormal chip processor within the sixth time period T6, the abnormal chip processor is still in the abnormal state and the server 101 may perform step S906.
The sixth time period T6 may be set to a value much greater than one service period, such as 30 seconds, or may be set to the interval from the timestamp obtained when the server generated the index for the test request in step S903 to the next exchange of the new data unit and the old data unit.
S905: before the next exchange of data between the new data unit and the old data unit of the chip processor, the server 101 may remove the indexes accumulated in the index record of the old data unit of the chip processor in the data index table and correspondingly reset the index number of the old data unit to zero. The server 101 then executes step S907.
Specifically, if the server 101 detects in step S901 that a chip processor returned a processing result within the fourth time period T4, the chip processor is in a normal working state. Because the data exchange period is much longer than the service processing period, by the time the new data unit and the old data unit exchange data, the requests corresponding to the indexes stored in the index record of the old data unit should all have been processed; under normal circumstances, the index record of the old data unit should be empty and its index number should be 0. However, the chip processor may suffer an occasional or transient abnormality during the processing of requests and then recover; in that case a small number of requests returned no processing result during the abnormality, so a small number of indexes remain in the index record of the old data unit and its index number is not 0. As can be seen from steps S4021 to S4023, the number of indexes in the data index table affects the distribution probability of the chip processor; therefore, when the chip processor is in a normal state, emptying the index record of the old data unit raises the probability of the chip processor receiving requests, which helps to fully utilize the processing capability of the multi-core chip and improve processing efficiency.
In some other embodiments, the server 101 may first determine whether the index number of the old data unit is 0 and/or whether its index record is empty; if the index number is not 0 and/or the index record is not empty, the server 101 may empty the index record of the old data unit before the next data exchange between the new data unit and the old data unit.
In other embodiments, the server 101 may zero out the index number of the old data unit.
If the server 101 receives the processing result within the sixth time period T6 in step S904, the abnormal chip processor has recovered from the abnormal state, and the indexes corresponding to the timed-out requests need to be removed: all indexes stored in the index record of the old data unit of the chip processor in the data index table are removed, so as to lower the load value of the chip processor and raise the probability of the chip processor receiving new requests. Meanwhile, after the exchange the indexes of the original new data unit reside in the old data unit, so that the server 101 can still perform the corresponding update in the data index table when it receives the related processing results and indexes.
S906: before the next exchange of data between the new data unit and the old data unit of the chip processor, the server 101 merges the indexes of the old data unit of the chip processor in the data index table into the new data unit and then clears the index record of the old data unit. The server 101 then executes step S907. In other embodiments, the server 101 may add the index number of the old data unit to the index number of the new data unit, take the sum as the index number of the new data unit, and then reset the index number of the old data unit to zero.
Specifically, if the server 101 receives no processing result within the sixth time period T6 in step S904, the chip processor is still in the abnormal state. Merging the indexes and index number of the old data unit of the abnormal chip processor into the new data unit ensures that, when the new data unit and the old data unit exchange data, the chip processor continues to appear heavily loaded in the exchanged data index table, so that the probability of a new request being assigned to the abnormal chip processor remains low. Meanwhile, after the exchange all of the data resides in the old data unit; if the state of the chip processor recovers to normal before the next data exchange, the data in the old data unit is removed according to the steps above.
S907: the index records and index numbers of the new data unit and the old data unit of the chip processor in the data index table are exchanged. The server 101 may then return to step S901 to continue monitoring.
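Continuing the AgedCoreEntry sketch above, the three maintenance steps might look as follows (the function names are assumptions; the clear, merge, and exchange logic follows S905 to S907):

```cpp
// S905: the core is healthy again, so before the next exchange drop the
// stale indexes left in the old unit and zero its index number.
void clearOldUnit(AgedCoreEntry& e) {
    e.old_record.clear();
    e.old_num = 0;
}

// S906: the core is still abnormal; fold the old unit into the new one
// so the entry keeps its high load value across the next exchange.
void mergeOldIntoNew(AgedCoreEntry& e) {
    e.new_record.insert(e.new_record.end(),
                        e.old_record.begin(), e.old_record.end());
    e.new_num += e.old_num;
    clearOldUnit(e);
}

// S907: the exchange itself reuses AgedCoreEntry::swapUnits().
```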
In other embodiments, if the server 101 receives no processing result from the abnormal chip processor within the sixth time period T6, indicating that it is still in the abnormal state, the server 101 may jump to step S903 after the new data unit and the old data unit exchange data, so as to determine the state of the abnormal chip processor again.
Through the above steps, the server can track changes in the state of each chip processor, adapt the data maintained in the data index table to those changes, and handle the index records and index numbers in the data units more accurately, thereby improving the processing efficiency of the multi-core chip.
With reference to the multi-core chip shown in fig. 6A and the load maintenance method provided in the embodiment of the present invention, an embodiment of the present invention further provides a task processing method. As shown in fig. 12, the method mainly includes the following steps:
S1001: the controllers 1031-1034 receive a processing request message from the server 101, the processing request message including a request and the corresponding index.
Specifically, each controller 1031-1034 receives from the server 101 the processing request messages assigned to its corresponding chip processor.
S1002: the controllers 1031-1034 decapsulate the processing request message to obtain the request and the corresponding index.
S1003: the controllers 1031-1034 cache one or more requests and the corresponding indexes.
S1004: the controllers 1031-1034 obtain a request from the cached requests and send it to the chip processor. For the detailed operation of this step, reference may be made to step S203, which is not repeated here.
In some other embodiments, when merging requests, the controllers 1031-1034 obtain multiple requests and the corresponding indexes from the cached requests and send the merged request together with the corresponding indexes to the corresponding chip processor.
S1005: the controllers 1031-1034 receive the processing result from the chip processor, encapsulate the processing result with the corresponding index to obtain a processing result message, and send the processing result message to the server 101.
If the chip processor executes a merged request, the processing result received by the controllers 1031-1034 from the chip processor includes the processing results corresponding to each of the merged requests. The controllers 1031-1034 may encapsulate the obtained processing results with their corresponding indexes one by one to obtain multiple processing result messages.
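A small sketch of this fan-out (the ResultMessage type and the pairing of results with indexes by position are assumptions made here; the text only states that each processing result is encapsulated with its corresponding index):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result of one original request.
struct Result { int payload_id; };

// One processing result message: a result plus its index.
struct ResultMessage {
    Result result;
    Index  index;
};

// Split the results of a merged request back into one message per
// original request, pairing result i with the i-th cached index.
std::vector<ResultMessage> splitMergedResults(
        const std::vector<Result>& results,
        const std::vector<Index>& indexes) {
    std::vector<ResultMessage> msgs;
    for (std::size_t i = 0; i < results.size() && i < indexes.size(); ++i) {
        msgs.push_back({results[i], indexes[i]});
    }
    return msgs;  // each message is sent back separately
}
```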
In other embodiments, the controllers 1031-1034 may receive the already-encapsulated processing result message from the chip processor and send the processing result message to the server 101.
In combination with the multi-core chip shown in fig. 6C and the load maintenance method provided in the embodiment of the present invention, another task processing method is also provided in the embodiment of the present invention. As shown in fig. 13, the method mainly includes the following steps:
S1101: the controller 106 receives an encapsulated processing request message from the server 101, the processing request message including a request and the corresponding index.
S1102: the controller 106 decapsulates the processing request message to obtain the request and the corresponding index.
S1103: the controller 106 obtains the processor serial number Core ID from the index, and sends the request and the corresponding index to the controllers 1031 and 1034 connected to the corresponding chip processors.
S1104: controller 1031-1034 caches one or more requests and corresponding indexes;
s1105: controller 1031-1034 obtains the request from the cache and sends the request to the chip processor. The specific operation of this step can refer to step S1004, which is not described herein again.
S1106: the controller 1031 and 1034 receives the processing result from the chip processor, encapsulates the processing result and the corresponding index to obtain a processing result message, and sends the processing result message to the controller 106.
If the chip processor executes the merged request, the processing results received by the controllers 1031 and 1034 from the chip processor include the processing results corresponding to the merged requests. The controllers 1031 and 1034 may encapsulate the obtained plurality of processing results with the corresponding indexes respectively in sequence to obtain a plurality of processing result messages.
In other embodiments, controllers 1031-1034 receive the encapsulated process result message from the chip processor and send the process result message to controller 106.
S1107: the controller 106 transmits the processing result message to the server 101.
In some embodiments, the controllers 1031 and 1034 may send the encapsulated processing result message directly to the server 101. In some other embodiments, the controllers 1031 and 1034 may also send the processing result and the corresponding index to the controller 106, encapsulate the processing result message by the controller 106, and send the processing result message to the server 101.
In this way, the distribution of processing request messages to the corresponding chip processors is completed inside the multi-core chip, which simplifies the interface structure between the multi-core chip and the server.
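The in-chip distribution of steps S1101-S1103 and S1107 can be sketched the same way, reusing the per-processor controller above. In the fragment below, the index is assumed to expose the processor serial number as index['core_id'], and the dictionary-based routing table is an illustrative assumption rather than the claimed interface.

    class TopController:
        """Model of controller 106: routes requests to the controllers 1031-1034."""
        def __init__(self, sub_controllers):
            # map: processor serial number (Core ID) -> per-processor controller
            self.subs = sub_controllers

        def on_request_message(self, message):
            request, index = message               # S1102: decapsulate the message
            core_id = index["core_id"]             # S1103: read the Core ID
            self.subs[core_id].on_request_message((request, index))

        def on_result_message(self, result, index):
            return (result, index)                 # S1107: forward to the server 101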
The embodiments of the present invention can be arbitrarily combined to achieve different technical effects.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be directly executed by a hardware processor, or executed by a combination of hardware and software modules in a processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, this is not described in detail here.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation process of the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A service processing apparatus, characterized in that it comprises:
a first processor and a second processor;
the system comprises a first controller and a second controller, wherein the first controller is connected with the first processor, and the second controller is connected with the second processor;
the first controller is to:
storing a first request and a second request, the first request and the second request being requests assigned to the first processor;
sending the first request to the first processor for processing;
receiving a first processing result from the first processor;
sending the second request to the first processor for processing;
the second controller is configured to:
storing a third request and a fourth request, the third request and the fourth request being requests assigned to the second processor;
sending the third request to the second processor for processing;
receiving a third processing result from the second processor;
and sending the fourth request to the second processor for processing.
2. The apparatus of claim 1, wherein the first controller is configured to store a fifth request and a sixth request, and the first controller is further configured to combine the fifth request and the sixth request into a seventh request and send the seventh request to the first processor for processing.
3. The apparatus of claim 1, wherein the first controller receives a first message from a first device, the first message including the first request and first information; the first information corresponds to the first processor.
4. The apparatus of claim 3, wherein the first controller is configured to send a second message to the first device, the second message including the first processing result and the first information.
5. The apparatus of claim 1, further comprising:
a third controller, connected to the first controller and the second controller respectively;
the third controller is configured to receive a third message from the first device, where the third message includes the first request and the first information; the first information corresponds to the first processor;
and the third controller sends the first request and the first information to the first controller according to the first information.
6. The apparatus of claim 5, wherein the third controller is further configured to receive the first processing result and the first information from the first controller,
and the third controller sends a fourth message to the first device, wherein the fourth message comprises the first processing result and the first information.
7. The apparatus according to any of claims 3-6, wherein the first information comprises a first processor sequence number, a timestamp, and a thread number; wherein the first processor sequence number is assigned by the first device for the first request, and the timestamp and the thread number correspond to the first request.
8. A service processing system, comprising,
a first device;
a second device comprising a first processor and a second processor;
the first device sends a first message to the second device, wherein the first message comprises a first request and first information, and the first information corresponds to the first processor;
the second device sends the first request to the first processor for processing according to the first information;
the second device sends a second message to the first device, wherein the second message comprises a first processing result and the first information, and the first processing result is a processing result after the first processor processes the first request;
wherein the first information is generated after the first device receives the first request and determines a processor to process the first request;
the first information comprises a first processor serial number, a timestamp and a thread number; wherein the first processor sequence number is assigned by the first device for the first request, and the timestamp and the thread number correspond to the first request.
9. The system of claim 8, further comprising:
the first device stores a first table corresponding to the second device, wherein the first table includes a first array corresponding to the first processor and a second array corresponding to the second processor;
the first array comprises first data and second data, the first data comprises first records, the first records are used for storing one or more pieces of information, and the number of the information stored in the first records is a first number; the second data comprises a second record, the second record is used for storing one or more pieces of information, and the number of the pieces of information stored in the second record is a second number;
the second array comprises third data and fourth data, the third data comprises a third record, the third record is used for storing one or more pieces of information, and the number of the pieces of information stored in the third record is a third number; the fourth data includes a fourth record, the fourth record is used for storing one or more pieces of information, and the number of the pieces of information stored in the fourth record is a fourth number;
wherein the first device stores the first information in the first record after generating the first information for the first request;
the first data and the second data exchange data with each other at regular intervals;
after the first device receives the second message from the second device, querying the first record and/or the second record according to the first information in the second message;
if the first information is found in the first record, removing the first information from the first record;
and if the first information is found in the second record, removing the first information from the second record.
10. The system of claim 9, wherein if the first device detects that no processing result is received from the first processor within a first duration,
the first device generates a first test request and information corresponding to the first test request, wherein the information corresponding to the first test request also corresponds to the first processor;
the first device detects whether a processing result is received from the first processor within a third duration,
if the processing result is received, clearing the second record of the second data before the first data and the second data exchange data next time;
if the processing result is not received, storing, before the first data and the second data exchange data next time, the one or more pieces of information in the second record of the second data into the first record of the first data, and emptying the second record.
11. The system of claim 10, wherein, before generating the first test request, the first device detects whether the first processor has received a request within a second duration, and the first device generates the first test request if it determines that the first processor has not received a request within the second duration.
12. The system of claim 10, wherein the first device generates a second test request after the first data and the second data exchange data, if the first device determines that no processing result has been received from the first processor within the third duration.
13. The system of claim 9, wherein the determining, by the first device, of a processor to process the first request according to the first table specifically comprises:
determining a first capability and a second capability according to the sum of the first number and the second number and the sum of the third number and the fourth number, wherein the first capability represents the capability of the first processor to process a new request, and the second capability represents the capability of the second processor to process a new request;
determining a first allocation probability and a second allocation probability according to the first capability and the second capability, and determining a first probability space and a second probability space, wherein the first allocation probability and the first probability space correspond to the first processor, and the second allocation probability and the second probability space correspond to the second processor;
and taking a random number, and determining the allocated processor according to the random number.
14. The system of claim 13, wherein the first device determines the first capability and the second capability based on the sum of the first number and the second number, the sum of the third number and the fourth number, and the processing speeds of the first processor and the second processor.
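For orientation only, the allocation recited in claims 13 and 14 can be read as a weighted random choice, sketched below in Python. It is one plausible rendering under stated assumptions: capability is taken to fall with outstanding load and rise with processing speed, and the probability spaces are laid out as cumulative intervals over [0, 1); neither formula is given by the claims themselves, and allocate_processor is a hypothetical name.

    import random

    def allocate_processor(record_sums, speeds=None):
        """record_sums[i]: sum of the record counts of processor i's two data
        units (e.g. the first number plus the second number for the first
        processor); speeds[i]: optional processing-speed weight (claim 14)."""
        speeds = speeds or [1.0] * len(record_sums)
        # assumed capability formula: more outstanding work, less capability
        capability = [s / (1 + load) for s, load in zip(speeds, record_sums)]
        total = sum(capability)
        probabilities = [c / total for c in capability]  # allocation probabilities
        # probability spaces laid out as cumulative intervals over [0, 1)
        r, upper = random.random(), 0.0
        for core, p in enumerate(probabilities):
            upper += p
            if r < upper:
                return core        # processor selected by the random number
        return len(probabilities) - 1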
CN202010108531.8A 2020-02-21 2020-02-21 Multi-core chip and scheduling method thereof Active CN111415291B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010108531.8A CN111415291B (en) 2020-02-21 2020-02-21 Multi-core chip and scheduling method thereof
PCT/CN2021/075196 WO2021164560A1 (en) 2020-02-21 2021-02-04 Multi-core chip and scheduling method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010108531.8A CN111415291B (en) 2020-02-21 2020-02-21 Multi-core chip and scheduling method thereof

Publications (2)

Publication Number Publication Date
CN111415291A CN111415291A (en) 2020-07-14
CN111415291B true CN111415291B (en) 2021-09-21

Family

ID=71494207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010108531.8A Active CN111415291B (en) 2020-02-21 2020-02-21 Multi-core chip and scheduling method thereof

Country Status (2)

Country Link
CN (1) CN111415291B (en)
WO (1) WO2021164560A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415291B (en) * 2020-02-21 2021-09-21 华为技术有限公司 Multi-core chip and scheduling method thereof
CN113835866B (en) * 2021-10-09 2024-02-20 南方电网数字电网研究院有限公司 Multithreading task scheduling optimization method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101118478A (en) * 2007-09-10 2008-02-06 杭州华三通信技术有限公司 Caching management system
CN102340545A (en) * 2011-10-31 2012-02-01 深圳市五巨科技有限公司 Server and data processing method thereof
CN109840216A (en) * 2017-11-28 2019-06-04 华为技术有限公司 Data processing method and related elements, equipment, system for cache

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107102966B (en) * 2016-02-22 2020-03-13 龙芯中科技术有限公司 Multi-core processor chip, interrupt control method and controller
US10452287B2 (en) * 2016-06-24 2019-10-22 Futurewei Technologies, Inc. System and method for shared memory ownership using context
US11068399B2 (en) * 2017-09-29 2021-07-20 Intel Corporation Technologies for enforcing coherence ordering in consumer polling interactions by receiving snoop request by controller and update value of cache line
CN111415291B (en) * 2020-02-21 2021-09-21 华为技术有限公司 Multi-core chip and scheduling method thereof

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101118478A (en) * 2007-09-10 2008-02-06 杭州华三通信技术有限公司 Caching management system
CN102340545A (en) * 2011-10-31 2012-02-01 深圳市五巨科技有限公司 Server and data processing method thereof
CN109840216A (en) * 2017-11-28 2019-06-04 华为技术有限公司 Data processing method and related elements, equipment, system for cache

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"面向机器学习的高性能SIMT处理器存储系统设计与实现";孙哲等;《微电子学与计算机》;20190831;第36卷(第8期);第72-76页 *

Also Published As

Publication number Publication date
WO2021164560A1 (en) 2021-08-26
CN111415291A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN114020470B (en) Resource allocation method and device, readable medium and electronic equipment
CN111415291B (en) Multi-core chip and scheduling method thereof
CN113485962B (en) Log file storage method, device, equipment and storage medium
CN113032766A (en) Application authority management method and device
CN112965879A (en) Data processing method and device, electronic equipment and readable storage medium
CN112685148A (en) Asynchronous communication method and device of mass terminals, computer equipment and storage medium
US20090013023A1 (en) Process Management Apparatus, Computer Systems, Distributed Processing Method, and Computer Program
CN111625422B (en) Thread monitoring method, thread monitoring device, electronic equipment and computer readable storage medium
CN111352863A (en) Memory management method, device, equipment and storage medium
CN113220366A (en) Sub-application starting method and device, terminal and server
CN111813541B (en) Task scheduling method, device, medium and equipment
CN113204425A (en) Method and device for process management internal thread, electronic equipment and storage medium
US20220269622A1 (en) Data processing methods, apparatuses, electronic devices and computer-readable storage media
CN115061743A (en) Interface calling method and device, computer readable medium and electronic equipment
CN108805741B (en) Fusion method, device and system of power quality data
CN116737330B (en) Task processing method and electronic equipment
CN116661584B (en) Resource scheduling method and related equipment
CN117724852B (en) Cloud computer computing resource allocation method and device
CN111831655B (en) Data processing method, device, medium and electronic equipment
CN112948108B (en) Request processing method and device and electronic equipment
CN113342837B (en) Data transmission method, device, electronic equipment and computer readable medium
CN117519968A (en) Method, device and equipment for executing data processing tasks of cluster hybrid deployment scene
CN118260173A (en) Time-consuming information determining method and device, electronic equipment and storage medium
CN116955714A (en) Graph structure processing method and graph structure processing system
CN114816796A (en) Inter-core communication method and related device of heterogeneous multi-core processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant