WO2021164560A1 - Multi-core chip and scheduling method thereof - Google Patents

Multi-core chip and scheduling method thereof

Info

Publication number
WO2021164560A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
request
information
processing
chip
Prior art date
Application number
PCT/CN2021/075196
Other languages
English (en)
French (fr)
Inventor
周一兰
吴庆丰
杨帆
吴志勇
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021164560A1 publication Critical patent/WO2021164560A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Definitions

  • This application relates to the field of communications, and in particular to a multi-core chip and a scheduling method thereof.
  • the scheduling capability of existing multi-core chips is relatively weak.
  • the existing scheduling method cannot make full use of the processing capability of the chip, and cannot satisfy the concurrent scenario, resulting in low processing efficiency.
  • the technical problem to be solved by the embodiments of the present invention is to provide a device, a service scheduling method, and a system that improve the efficiency of service processing.
  • an embodiment of the present application provides a device, which includes a first processor and a second processor, and a first controller and a second controller, where the first controller is connected to the first processor and the second controller is connected to the second processor. The first controller is configured to: store a first request and a second request, the first request and the second request being requests allocated to the first processor; send the first request to the first processor for processing; receive a first processing message from the first processor; and send the second request to the first processor for processing. The second controller is configured to: store a third request and a fourth request, the third request and the fourth request being requests allocated to the second processor; send the third request to the second processor for processing; receive a second processing message from the second processor; and send the fourth request to the second processor for processing.
  • the device can reduce the idle time of the processor and improve the service processing efficiency of the processor.
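The buffering-and-dispatch behavior described above can be sketched as follows. This is an illustrative Python model under assumed names (the application does not specify an implementation): a controller buffers the requests allocated to its processor and sends the next one as soon as a processing message arrives, so the processor rarely sits idle.

```python
from collections import deque


class Controller:
    """Minimal sketch of a per-processor controller: it stores allocated
    requests and keeps its processor busy by dispatching the next stored
    request as soon as a processing message is received."""

    def __init__(self, processor):
        self.processor = processor
        self.queue = deque()          # stored requests, e.g. first and second request

    def store(self, request):
        self.queue.append(request)

    def dispatch_next(self):
        # send the next stored request to the processor, if any remains
        if self.queue:
            self.processor.process(self.queue.popleft())

    def on_processing_message(self, message):
        # a processing message frees the processor for the next request
        self.dispatch_next()


class FakeProcessor:
    """Stand-in for a chip processor; it records what it has processed."""

    def __init__(self):
        self.done = []

    def process(self, request):
        self.done.append(request)


proc = FakeProcessor()
ctrl = Controller(proc)
ctrl.store("req1")
ctrl.store("req2")
ctrl.dispatch_next()                 # processor starts on req1
ctrl.on_processing_message("msg")    # req1 done, so req2 is dispatched at once
```

Because the second request is already buffered at the controller, the gap between finishing one request and starting the next is a single local dispatch rather than a round trip to the first device.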
  • the first controller stores a fifth request and a sixth request, and the first controller is also used to merge the fifth request and the sixth request into a seventh request and send the seventh request to the first processor for processing.
  • the processor can combine multiple requests into one for processing, which further improves the service processing efficiency of the processor.
  • the first controller receives a first message from the first device, where the first message includes the first request and first information, and the first information corresponds to the first processor.
  • the request is allocated to the processor according to the first information.
  • the above-mentioned first processing message contains a first processing result, where the first processing result is the result of processing the first request by the first processor; the first controller sends a second message to the first device, and the second message contains the first processing result and the first information.
  • the first device can match the request with the processing result according to the first information.
  • the above-mentioned first processing message is a third message, the third message includes the first processing result and the first information, and the first controller sends the third message to the first device.
  • the controller only needs to forward the processing result message.
  • when the first controller sends the first request to the first processor, it also sends the first information; the above-mentioned first processing message is a first processing end message, and the first processing end message indicates that the processing of the first request by the first processor has ended; the first processor sends a fourth message to the first device, and the fourth message contains the first processing result and the first information.
  • the processor can directly return the processing result to the first device when the processing ends.
  • the device further includes a third controller, which is connected to the first controller and the second controller respectively; the third controller is used to receive the first request, the second request, the third request, and the fourth request from the first device, send the first request and the second request to the first controller, and send the third request and the fourth request to the second controller.
  • the device can allocate requests, simplify the connection structure between the first device and the device, and reduce the burden on the first device.
  • the third controller receives a fifth message from the first device, where the fifth message contains the first request and the first information, and the first information corresponds to the first processor; the third controller sends the first request and the first information to the first controller according to the first information.
  • the device can allocate requests according to the instructions of the first device, so that the first device can effectively manage the processors.
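The third controller's allocation step can be sketched as a simple routing function; the dict layout and names are illustrative assumptions, not structures from the application.

```python
def route_message(message, controllers):
    """Sketch of the third controller's dispatch: the first information
    inside a message names the target processor via its serial number, so
    the request and its information are forwarded to that processor's
    controller (here, one inbox list per controller)."""
    info = message["info"]
    inbox = controllers[info["processor_serial"]]
    inbox.append((message["request"], info))


controllers = {0: [], 1: []}            # one inbox per per-processor controller
msg = {"request": "first request", "info": {"processor_serial": 1}}
route_message(msg, controllers)         # lands in controller 1's inbox
```

Because routing is decided inside the device, the first device needs only a single connection to the third controller rather than one per processor.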
  • the above-mentioned first processing message includes the first processing result; the third controller is further configured to receive the first processing result and the first information from the first controller, and to send a sixth message to the first device, where the sixth message contains the first processing result and the first information.
  • the first device can uniformly receive the processing result from the third controller, which simplifies the connection structure between the first device and the device.
  • the above-mentioned first processing message includes a first processing result; the third controller is further configured to receive a seventh message from the first controller, where the seventh message includes the first processing result and the first information, and the third controller sends the seventh message to the first device.
  • the controller generates the processing result message.
  • the above-mentioned first processing message is an eighth message, and the eighth message contains the first processing result and the first information; the third controller is further configured to receive the eighth message from the first controller and send the eighth message to the first device.
  • the processor generates the processing result message.
  • the first information includes a first processor serial number, a timestamp, and a thread number, where the first processor serial number is allocated by the first device for the first request, and the timestamp and the thread number correspond to the first request.
  • the first device can effectively manage the allocation and processing of requests.
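The first information can be modeled as a small record; the field types below are assumptions, but the three fields are the ones named in the application. Because the same information is echoed back with the processing result, the first device can match each result to its outstanding request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstInformation:
    """The three fields named in the application: a processor serial number
    allocated by the first device, plus a timestamp and thread number that
    correspond to the request (field types are assumptions)."""
    processor_serial: int
    timestamp: float
    thread_number: int


# The information travels out with the request and returns with the result,
# so the first device can match results to pending requests by lookup.
pending = {FirstInformation(0, 1000.0, 7): "request-1"}
echoed = FirstInformation(0, 1000.0, 7)   # information returned with the result
matched = pending.pop(echoed)
```

Making the record hashable (frozen) is what allows it to serve directly as the lookup key for outstanding requests.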
  • the foregoing device may be a multi-core chip; the foregoing first processor and second processor may be chip processors; the foregoing first device may be a server or a processor in a server, that is, a CPU; the foregoing first message and fifth message may be processing request messages; the above-mentioned second, third, fourth, sixth, seventh, and eighth messages may be processing result messages.
  • an embodiment of the present application provides a service processing method, which includes: a first device sends a first message to a second device, where the second device includes a first processor and a second processor, and the first message includes a first request and first information, the first information corresponding to the first processor; the second device sends the first request to the first processor for processing according to the first information; the second device sends a second message to the first device, where the second message contains a first processing result and the first information, the first processing result being the result of the first processor processing the first request. The first information is generated after the first device receives the first request and determines the processor that processes the first request; the first information includes a first processor serial number, a timestamp, and a thread number, where the first processor serial number is allocated by the first device for the first request, and the timestamp and the thread number correspond to the first request.
  • the first device may specifically be a server.
  • the second device may be a multi-core chip.
  • the second device can allocate processors according to the first information sent by the first device.
  • the first device stores a first table, and the first table corresponds to the second device. The first table includes a first array and a second array; the first array corresponds to the first processor, and the second array corresponds to the second processor. The first array includes first data and second data: the first data includes a first record used to store one or more pieces of information, the quantity of which is the first quantity; the second data includes a second record used to store one or more pieces of information, the quantity of which is the second quantity. The second array includes third data and fourth data: the third data includes a third record used to store one or more pieces of information, the quantity of which is the third quantity; the fourth data includes a fourth record used to store one or more pieces of information, the quantity of which is the fourth quantity.
  • after the first device generates the first information for the first request, it saves the first information in the first record; the first data and the second data exchange data at regular intervals. After the first device receives the second message from the second device, it queries the first record and/or the second record according to the first information in the second message; if the first information is found in the first record, it removes the first information from the first record; if the first information is found in the second record, it removes the first information from the second record.
  • the first device can obtain the processing request status of each processor of the second device.
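One array of the first table can be sketched as a pair of records that exchange roles periodically. The exact exchange rule below is an assumption (the application says only that the first data and second data exchange data at regular intervals); the point is that outstanding information ages from the new record into the old one while remaining queryable.

```python
class ProcessorArray:
    """Sketch of one array of the first table: a first record (new data)
    and a second record (old data) that exchange at regular intervals."""

    def __init__(self):
        self.new = set()    # first record: information issued this interval
        self.old = set()    # second record: information from the last interval

    def add(self, info):
        self.new.add(info)

    def exchange(self):
        # the periodic data exchange between the first data and second data
        self.new, self.old = self.old, self.new

    def remove(self, info):
        # a returned result removes its information wherever it is found
        self.new.discard(info)
        self.old.discard(info)

    def outstanding(self):
        # first quantity + second quantity: this processor's pending load
        return len(self.new) + len(self.old)


arr = ProcessorArray()
arr.add("info-a")
arr.exchange()              # "info-a" ages into the second record
arr.add("info-b")
arr.remove("info-a")        # the result for "info-a" arrives and clears it
```

The sum of the two record quantities gives the first device a per-processor count of requests still in flight, which later drives allocation.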
  • the first device detects that no processing result is received from the first processor within a first time period; the first device then generates a first test request and information corresponding to the first test request, where the information corresponding to the first test request corresponds to the first processor. The first device detects whether a processing result is received from the first processor within a third time period: if a processing result is received, it clears the second record of the second data before the next data exchange between the first data and the second data; if no processing result is received, it saves the one or more pieces of information in the second record of the second data into the first record of the first data before the next data exchange, and clears the second record.
  • the first device can obtain changes in the states of the processors of the second device and adaptively adjust and maintain the values, which can improve processing efficiency.
  • before generating the first test request, the method further includes: the first device detects whether the first processor has received a request within a second time period, and if the first device determines that the first processor has not received any request within the second time period, it generates the first test request.
  • the first device can perform flow compensation after determining that the abnormal processor has not received the request.
  • if the first device determines that no processing result is received from the first processor within the third time period, then after the first data and the second data exchange data, the first device generates a second test request.
  • the first device can continue to detect the state change of the processor after determining that the processor is in an abnormal state.
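The compensation step after a test request can be sketched as follows; the function and field names are illustrative. If a result comes back within the window, the stale second record is cleared before the next exchange; if not, its entries are moved into the first record so the silent processor still appears loaded and attracts few new requests.

```python
class ArrayState:
    """Bare two-record state used by the probe sketch below."""

    def __init__(self, new, old):
        self.new, self.old = set(new), set(old)


def after_test_request(state, result_received):
    """Adjust the table after a first test request was sent to a silent
    processor (an illustrative sketch of the behavior in the application)."""
    if result_received:
        # processor recovered: drop the stale entries in the second record
        state.old.clear()
    else:
        # still silent: carry the stale entries into the first record so
        # the load estimate for this processor stays high
        state.new |= state.old
        state.old.clear()


recovered = ArrayState({"a"}, {"b", "c"})
after_test_request(recovered, result_received=True)

silent = ArrayState({"a"}, {"b"})
after_test_request(silent, result_received=False)
```

Either way the second record ends empty before the next exchange; only the first record's contents differ, which is exactly the signal the allocation step consumes.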
  • the first device determines the processor that processes the first request according to the first table, which specifically includes: determining a first capability and a second capability according to the sum of the first quantity and the second quantity and the sum of the third quantity and the fourth quantity, where the first capability indicates the capability of the first processor to process new requests and the second capability indicates the capability of the second processor to process new requests; determining a first allocation probability and a second allocation probability according to the first capability and the second capability, and thereby determining a first probability space and a second probability space, where the first allocation probability and the first probability space correspond to the first processor and the second allocation probability and the second probability space correspond to the second processor; and taking a random number and determining the allocated processor according to the random number.
  • the first device can allocate processors according to the processing request of each processor, which can achieve load balancing and improve processing efficiency.
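The probability-space selection can be sketched as follows. The capability formula is an assumption (the application does not give one); what matters is the shape: capability falls with outstanding count and, per the refinement, rises with processing speed; capabilities are normalised into allocation probabilities laid side by side on [0, 1), and a random number picks the processor whose probability space it falls into.

```python
import random


def pick_processor(outstanding, speeds=None, rng=random):
    """Illustrative probability-space allocation over per-processor
    outstanding-information counts (and optional processing speeds)."""
    speeds = speeds or [1.0] * len(outstanding)
    # assumed capability: fewer pending entries / faster processor -> more capacity
    capabilities = [s / (1 + n) for n, s in zip(outstanding, speeds)]
    total = sum(capabilities)
    r = rng.random()
    upper = 0.0
    for i, cap in enumerate(capabilities):
        upper += cap / total       # upper bound of processor i's probability space
        if r < upper:
            return i
    return len(outstanding) - 1    # guard against rounding when r is near 1.0


class FixedRng:
    """Deterministic stand-in for random, for the demonstration below."""

    def __init__(self, value):
        self.value = value

    def random(self):
        return self.value


# processor 0 is idle, processor 1 has 9 outstanding entries: processor 0's
# probability space is about 0.91 wide, so most draws land on it
lightly_loaded = pick_processor([0, 9], rng=FixedRng(0.5))
heavily_loaded = pick_processor([0, 9], rng=FixedRng(0.95))
```

Because allocation is probabilistic rather than strictly greedy, a briefly stale load count cannot starve a processor completely, yet load still balances in expectation.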
  • the first device determines the first capability and the second capability according to the sum of the first quantity and the second quantity, the sum of the third quantity and the fourth quantity, and the processing speeds of the first processor and the second processor.
  • the first device can more accurately determine the new request processing capability of each processor.
  • the first device stores a first table, where the first table includes a fifth array and a sixth array; the fifth array corresponds to the first processor, and the sixth array corresponds to the second processor. The fifth array includes a fifth record used to store one or more pieces of information, the quantity of which is the fifth quantity; the sixth array includes a sixth record used to store one or more pieces of information, the quantity of which is the sixth quantity. After the first device generates the first information for the first request, it saves the first information in the fifth record; after receiving the second message from the second device, the first device queries the fifth record according to the first information in the second message, and if the first information is found in the fifth record, removes it from the fifth record.
  • the first device can obtain the processing request status of each processor of the second device.
  • the first device detects that no processing result is received from the first processor within a fourth time period; the first device then generates a third test request and information corresponding to the third test request, where the information corresponding to the third test request corresponds to the first processor. The first device detects whether a processing result is received from the first processor within a fifth time period, and if a processing result is received, it clears the fifth record.
  • the first device can obtain changes in the states of the processors of the second device and adaptively adjust and maintain the values, which can improve processing efficiency.
  • the first device determines the processor that processes the first request according to the first table, which specifically includes: determining a third capability and a fourth capability according to the fifth quantity and the sixth quantity, where the third capability represents the capability of the first processor to process new requests and the fourth capability represents the capability of the second processor to process new requests; determining a third allocation probability and a fourth allocation probability according to the third capability and the fourth capability, and thereby determining a third probability space and a fourth probability space, where the third allocation probability and the third probability space correspond to the first processor and the fourth allocation probability and the fourth probability space correspond to the second processor; and taking a random number and determining the allocated processor according to the random number.
  • the first device can allocate processors according to the processing request of each processor, which can achieve load balancing and improve processing efficiency.
  • the first device determines the third capability and the fourth capability according to the fifth quantity, the sixth quantity, and the processing speed of the first processor and the second processor.
  • the first device can more accurately determine the new request processing capability of each processor.
  • the above-mentioned first table may be a data index table; the above-mentioned first data and third data may be new data units; the above-mentioned second data and fourth data may be old data units; the above-mentioned first information may be an index; the above-mentioned first, second, third, fourth, fifth, and sixth records may be index records; the above-mentioned first, second, third, fourth, fifth, and sixth quantities may be index quantities; and the above-mentioned first, second, third, and fourth capabilities may be new request processing capabilities.
  • an embodiment of the present application provides a computer-readable medium for storing one or more programs, wherein the one or more programs are configured to be executed by one or more processors; the one or more programs include instructions, and the instructions are used to execute any possible implementation manner of the second aspect.
  • an embodiment of the present application provides a service processing system, which includes a first device and a second device, where the second device includes a first processor and a second processor. The first device sends a first message to the second device, where the first message contains a first request and first information, the first information corresponding to the first processor; the second device sends the first request to the first processor for processing according to the first information; the second device sends a second message to the first device, where the second message contains a first processing result and the first information, the first processing result being the result of the first processor processing the first request. The first information is generated after the first device receives the first request and determines the processor that processes the first request; the first information includes a first processor serial number, a timestamp, and a thread number, where the first processor serial number is allocated by the first device for the first request, and the timestamp and the thread number correspond to the first request.
  • the first device may be a server, and the second device may be a multi-core chip.
  • the second device can allocate processors according to the first information sent by the first device.
  • the first device stores a first table, and the first table corresponds to the second device. The first table includes a first array and a second array; the first array corresponds to the first processor, and the second array corresponds to the second processor. The first array includes first data and second data: the first data includes a first record used to store one or more pieces of information, the quantity of which is the first quantity; the second data includes a second record used to store one or more pieces of information, the quantity of which is the second quantity. The second array includes third data and fourth data: the third data includes a third record used to store one or more pieces of information, the quantity of which is the third quantity; the fourth data includes a fourth record used to store one or more pieces of information, the quantity of which is the fourth quantity.
  • after the first device generates the first information for the first request, it saves the first information in the first record; the first data and the second data exchange data at regular intervals. After the first device receives the second message from the second device, it queries the first record and/or the second record according to the first information in the second message; if the first information is found in the first record, it removes the first information from the first record; if the first information is found in the second record, it removes the first information from the second record.
  • the first device can obtain the processing request status of each processor of the second device.
  • the first device detects that no processing result is received from the first processor within a first time period; the first device then generates a first test request and information corresponding to the first test request, where the information corresponding to the first test request corresponds to the first processor. The first device detects whether a processing result is received from the first processor within a third time period: if a processing result is received, it clears the second record of the second data before the next data exchange between the first data and the second data; if no processing result is received, it saves the one or more pieces of information in the second record of the second data into the first record of the first data before the next data exchange, and clears the second record.
  • the first device can obtain changes in the states of the processors of the second device and adaptively adjust and maintain the values, which can improve processing efficiency.
  • before generating the first test request, the method further includes: the first device detects whether the first processor has received a request within a second time period, and if the first device determines that the first processor has not received any request within the second time period, it generates the first test request.
  • the first device can perform flow compensation after determining that the abnormal processor has not received the request.
  • if the first device determines that no processing result is received from the first processor within the third time period, then after the first data and the second data exchange data, the first device generates a second test request.
  • the first device can continue to detect the state change of the processor after determining that the processor is in an abnormal state.
  • the first device determines the processor that processes the first request according to the first table, which specifically includes: determining a first capability and a second capability according to the sum of the first quantity and the second quantity and the sum of the third quantity and the fourth quantity, where the first capability indicates the capability of the first processor to process new requests and the second capability indicates the capability of the second processor to process new requests; determining a first allocation probability and a second allocation probability according to the first capability and the second capability, and thereby determining a first probability space and a second probability space, where the first allocation probability and the first probability space correspond to the first processor and the second allocation probability and the second probability space correspond to the second processor; and taking a random number and determining the allocated processor according to the random number.
  • the first device can allocate processors according to the processing request of each processor, which can achieve load balancing and improve processing efficiency.
  • the first device determines the first capability and the second capability according to the sum of the first quantity and the second quantity, the sum of the third quantity and the fourth quantity, and the processing speeds of the first processor and the second processor.
  • the first device can more accurately determine the new request processing capability of each processor.
  • the first device stores a first table, where the first table includes a fifth array and a sixth array; the fifth array corresponds to the first processor, and the sixth array corresponds to the second processor. The fifth array includes a fifth record used to store one or more pieces of information, the quantity of which is the fifth quantity; the sixth array includes a sixth record used to store one or more pieces of information, the quantity of which is the sixth quantity. After the first device generates the first information for the first request, it saves the first information in the fifth record; after receiving the second message from the second device, the first device queries the fifth record according to the first information in the second message, and if the first information is found in the fifth record, removes it from the fifth record.
  • the first device can obtain the processing request status of each processor of the second device.
  • the first device detects that no processing result is received from the first processor within the second time period; the first device then generates a third test request and information corresponding to the third test request, where the information corresponding to the third test request corresponds to the first processor. The first device detects whether a processing result is received from the first processor within the third time period, and if a processing result is received, it clears the fifth record.
  • the first device can obtain changes in the states of the processors of the second device and adaptively adjust and maintain the values, which can improve processing efficiency.
  • the first device determines the processor that processes the first request according to the first table, which specifically includes: determining a third capability and a fourth capability according to the fifth quantity and the sixth quantity, where the third capability represents the capability of the first processor to process new requests and the fourth capability represents the capability of the second processor to process new requests; determining a third allocation probability and a fourth allocation probability according to the third capability and the fourth capability, and thereby determining a third probability space and a fourth probability space, where the third allocation probability and the third probability space correspond to the first processor and the fourth allocation probability and the fourth probability space correspond to the second processor; and taking a random number and determining the allocated processor according to the random number.
  • the first device can allocate processors according to the processing request of each processor, which can achieve load balancing and improve processing efficiency.
  • the first device determines the third capability and the fourth capability according to the fifth quantity, the sixth quantity, and the processing speed of the first processor and the second processor.
  • the first device can more accurately determine the new request processing capability of each processor.
  • FIGS. 1A and 1B are schematic diagrams of the architecture of a service processing system provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a server calling a multi-core chip according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of another server calling a multi-core chip according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of another server calling a multi-core chip according to an embodiment of the present invention.
  • FIGS. 6A-6E are schematic diagrams of the structure of a multi-core chip provided by an embodiment of the present invention.
  • FIGS. 7A-7B are schematic diagrams of a processing flow of a multi-core chip provided by an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a request processing flow according to an embodiment of the present invention.
  • FIG. 9 is a schematic flowchart of a method for selecting a chip processor according to an embodiment of the present invention.
  • FIG. 10 is a schematic flowchart of a load maintenance method provided by an embodiment of the present invention.
  • FIG. 11 is a schematic flowchart of another load maintenance method provided by an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of a processing flow of another multi-core chip provided by an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of another processing flow of a multi-core chip provided by an embodiment of the present invention.
  • a component may be, but is not limited to: a process running on a processor, a processor, an object, an executable file, an executing thread, a program, and/or a computer.
  • both an application running on a computing device and the computing device itself may be components.
  • One or more components may exist in an executing process and/or thread, and the components may be located in one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures thereon.
  • These components may communicate by way of local and/or remote processes, such as in accordance with a signal having one or more data packets (for example, data from one component interacting with another component in a local system or a distributed system, and/or interacting with other systems across a network such as the Internet by way of the signal).
  • FIG. 1A and FIG. 1B exemplarily show the architecture of a service processing system related to an embodiment of the present application, and the service processing system is used to process a service initiated by an electronic device.
  • This service includes, but is not limited to, image processing services such as image recognition and image classification, and voice processing services such as voice recognition and speech synthesis.
  • the service processing system mainly includes one or more electronic devices 100 and a server 101.
  • One or more electronic devices 100 communicate with the server 101 to initiate requests related to various image processing services and/or voice processing services, and the request may be an HTTP message.
  • the server can communicate with the electronic device through a local area network or a wide area network.
  • the server 101 can call the multi-core chip 102 to complete the request initiated by the electronic device 100 through a software and hardware interface.
  • the hardware interface may include PCI, PCIe, or USB and other types of interfaces
  • the software interface may include an application program interface (API), such as an API interface SRC (Source) module and a DST (Destination) module encapsulated by software.
  • the SRC module is an application program interface module for the server 101 to send data to the chip 102
  • the DST module is an application program interface module for the server 101 to receive data from the chip 102.
  • After the multi-core chip 102 completes the processing of the request, it sends the processing result to the server 101, and the server 101 sends the processing result to one or more electronic devices 100.
  • the process of performing image processing and/or voice services by the electronic device and the server may include the following methods:
  • User 1 opens an application installed on the mobile phone, such as "Gallery", which can classify images.
  • the mobile phone uploads one or more pictures to the application server, and the application server returns the result of the image classification to the mobile phone after completing the image classification; or
  • User 2 opens a browser, accesses a web page used for image recognition, and the terminal uploads one or more pictures to the network server.
  • the network server returns the result of the image recognition to the terminal after completing the image recognition.
  • the user 3 sends a request to a smart home device (such as a smart speaker) through natural language, and the request may be to play a song, query the weather, or customize a reminder.
  • a smart home device collects the user's voice and sends the collected voice message to the server.
  • the server analyzes and returns the corresponding requested content to the smart home device.
  • the smart speaker can play the song requested by the user, broadcast the weather, or set a reminder requested by the user.
  • the electronic terminal 100 sends a request to the server 101, and the request includes image or voice data.
  • the request may also include the request type.
  • the request may include information indicating the type of image classification request.
  • After the server 101 receives the request, it calls the multi-core chip 102 to process the request.
  • the multi-core chip 102 sends the processing result to the server 101 after completing the processing of the request.
  • the server 101 returns the processing result to the electronic terminal 100. Specifically, the server 101 may return the processing result to the electronic terminal 100 in the form of an HTTP message.
  • the processing results returned by the multi-core chip to the server can include the following situations:
  • If the multi-core chip completes the processing, the chip returns the corresponding processing result to the server, and the server returns the processing result to the electronic terminal. For example, for an image classification request, the multi-core chip returns the processing result to the server after the processing is completed.
  • the processing result can be a specific classification of the image, such as "landscape", "sports", and so on.
  • the corresponding processing result may be a specific result of image recognition, such as whether face recognition is successful, or the name of a person, the names of animals and plants, etc.
  • If the processing of the multi-core chip fails, the chip returns a processing failure message to the server, and the server returns the processing failure message to the electronic terminal.
  • If the multi-core chip can neither process the request nor return a message to the server, the server returns a processing failure message to the electronic terminal after the waiting times out.
  • the electronic device 100 may be a portable electronic device, such as a mobile phone, a tablet computer, a laptop computer (Laptop), a wearable electronic device (such as a smart watch), and so on.
  • the above-mentioned electronic device 100 may also be a desktop computer or a vehicle-mounted device.
  • FIG. 2 shows a schematic diagram of the structure of the electronic device 100.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, and an antenna 2.
  • mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, buttons 190, motor 191, indicator 192, camera 193, display screen 194, subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than those shown in the figure, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the multi-core chip used in the embodiment of the present invention contains multiple chip processors inside. As shown in FIG. 1, the multi-core chip 102 includes four chip processors 1021-1024. Multiple chip processors can run in parallel, enabling the chip to handle multiple tasks at the same time.
  • the above-mentioned multi-core chip may be applied to a neural network chip for image/video processing or voice processing, which is also called an AI chip or an AI accelerator card.
  • the above-mentioned multi-core chip may also be applied to other systems. This application does not impose any restrictions on this.
  • the multi-core chip used in the embodiment of the present invention is mainly deployed in the server.
  • the chip can be plugged into the server (such as through PCI or PCIe slots), or connected to the server as an external device (such as through PCI, PCIe, or USB).
  • the embodiment of the present invention provides a method in which a server uses a single thread to call a multi-core chip, and a corresponding multi-core chip.
  • the Host side represents the server side
  • the Device side represents the multi-core chip side.
  • the server calls the multi-core chip to process a request. After the multi-core chip completes the processing of the request and returns the result to the server, the server calls the multi-core chip to process the next request.
  • one request may include multiple sub-requests.
  • the server can configure a chip processor included in a multi-core chip for each sub-request. After the server sends the request to the multi-core chip, each chip processor in the chip can process the corresponding sub-request at the same time.
  • the number of sub-requests included in the request may be less than or equal to the number of chip processors included in the chip.
  • the chip can uniformly return the processing result to the server after all chip processors have completed the processing of the corresponding sub-requests.
  • the server can specify the chip processor that completes the request. When the server sends a request to the multi-core chip, the request carries the serial number of the specified chip processor.
  • the embodiment of the present invention provides a method in which a server uses multiple threads to call a multi-core chip.
  • the server when the server receives a new request, the server creates a corresponding thread for the request. For each request, the server allocates a chip processor in the multi-core chip to process the request. Among them, the number of threads created can be at most the number of chip processors included in the chip, and each thread corresponds to a corresponding chip processor in the chip. For example, if the multi-core chip shown in FIG. 4 includes 4 chip processors, the number of threads created in the server is 4 at most. Therefore, in this scheduling mode, the server can process 4 requests at the same time. If there are already 4 requests being processed at the same time, the server will wait for one or more of the 4 requests to be processed before receiving a new request.
  • When the server sends a request to the chip, it can carry the serial number information of the chip processor in the request. Likewise, when the chip returns the processing result to the server, it can carry the serial number information of the chip processor in the processing result, so that when the server receives the processing result, it can return the result to the corresponding thread for processing according to the serial number information of the chip processor.
  • the server can support concurrent operations to a certain extent and fully utilize the processing capabilities of the multiple chip processors in the multi-core chip, improving the processing efficiency of the multi-core chip.
  • the embodiment of the present invention provides a multi-core chip having multiple chip processors.
  • a corresponding controller is configured for each chip processor, and the controller is used to cache one or more requests of the corresponding chip processor.
  • a cache unit may be included in the controller, and one or more requests of the corresponding chip processor are cached in the cache unit.
  • Each controller is connected with the corresponding chip processor to send the request.
  • a request is sent from the SRC module to the multi-core chip, and processed in the chip processor of the multi-core chip.
  • the DST module receives the processing result sent by the multi-core chip
  • the next request is sent from the SRC module to the multi-core chip.
  • During the time when a request is being sent from the SRC module to the multi-core chip, and when the processing result is being returned from the chip processor to the DST module, the chip processor that handles the request is in an idle state.
  • Because the controller has a cache capability, when the server sends a request to the chip, it does not need to wait for one or more chip processors in the chip to finish processing the current request before sending the next request.
  • the request is cached in the controller corresponding to the chip processor and processed by the chip processor.
  • the chip processor can immediately obtain the next request from the controller for processing. Therefore, the multi-core chip provided in this embodiment can further improve the processing efficiency of the multi-core chip.
  • When the server receives a new request, it does not need to be limited by the number of chip processors included in the multi-core chip, which enhances the concurrency capability.
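The caching behavior described above can be sketched with a simple queue-based controller (names and structure are illustrative assumptions, not the chip's actual hardware design):

```python
from collections import deque

class Controller:
    """Caches requests for one chip processor so the server never has to
    wait for the previous request to finish before sending the next one."""

    def __init__(self):
        self._queue = deque()

    def cache_request(self, request):
        # The server can call this at any time, regardless of whether
        # the corresponding chip processor is busy.
        self._queue.append(request)

    def next_request(self):
        # The chip processor calls this as soon as it finishes the previous
        # request; no round trip back to the server is needed.
        return self._queue.popleft() if self._queue else None
```

Because the queue decouples sending from processing, the processor can pull its next request immediately after finishing the current one.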
  • the controller is also used to obtain multiple requests from the cached requests during each processing, merge the multiple requests into one request, and send the merged request to the corresponding chip processor for processing.
  • the input of a request may be a high-dimensional array; for example, the high-dimensional array may be an array of 1*1024*256.
  • When the controller merges, it merges two 1*1024*256 requests into one 2*1024*256 request. In some other embodiments, the controller may merge three or more requests into one request each time it merges requests.
  • the controller may determine whether to combine multiple requests according to the requested data volume and/or model matching. For example, if a request has a large amount of data, it may not be merged. Correspondingly, if both requests have a small amount of data, the two requests may be merged. For another example, if three requests use the same model and the processing process is the same, the three requests can be combined for processing. In some other embodiments, the controller may combine multiple requests according to the order of the requests.
  • the controller can combine multiple requests into one request and complete processing at one time, making full use of the processing capacity of each chip processor, so that the processing efficiency of the multi-core chip is further improved.
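The merging of two 1*1024*256 requests into one 2*1024*256 request can be pictured with NumPy (an illustrative sketch; the array contents are dummies):

```python
import numpy as np

def merge_requests(requests):
    """Concatenate several 1*H*W request inputs along the batch axis so the
    chip processor can handle them in a single pass."""
    return np.concatenate(requests, axis=0)

# Two requests of the shape mentioned in the text
req_a = np.zeros((1, 1024, 256), dtype=np.float32)
req_b = np.ones((1, 1024, 256), dtype=np.float32)

merged = merge_requests([req_a, req_b])  # shape (2, 1024, 256)
```

After processing, the batched result can be split back per request along the same axis, which is how the per-request results described below are recovered.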
  • FIGS. 6A-6E exemplarily show the structure of the multi-core chip applied in this embodiment. The details are as follows:
  • FIG. 6A exemplarily shows a structure of the multi-core chip applied in this embodiment.
  • the multi-core chip 102 includes four chip processors 1021-1024 and four corresponding controllers 1031-1034. Each chip processor is connected to its corresponding controller; for example, the chip processor 1021 is connected to the corresponding controller 1031.
  • the controllers 1031-1034 are respectively connected to the input 104 and the output 105 of the chip 102 to receive requests from the processor 107 in the server 101 and send processing results to the processor 107.
  • Based on the multi-core chip 102 shown in FIG. 6A, the processing flow of a request in the chip is shown in FIG. 7A:
  • the controller 1031-1034 receives a request from the processor 107.
  • the processor 107 determines the chip processor corresponding to each request and sends the request to the corresponding controller 1031-1034 according to the serial number information of the chip processor corresponding to the request; the request that the controller 1031-1034 receives from the processor 107 is thus the request assigned to the chip processor corresponding to that controller.
  • S202 The controller 1031-1034 caches one or more requests.
  • the controller 1031-1034 obtains a request from the cached requests and sends the request to the chip processor.
  • the controller 1031-1034 can obtain one request from the cached requests and send it to the chip processor for processing, or obtain multiple requests from the cached requests, merge the multiple requests into one request, and send the merged request to the corresponding chip processor for processing.
  • the controller 1031-1034 receives the processing result from the chip processor, sends the processing result to the processor 107, and returns to step S203.
  • the processing results received by the controller 1031-1034 from the chip processor include processing results corresponding to the combined multiple requests. For example, request A is to perform image recognition on picture A, and request B is to perform image recognition on picture B. After the controller merges request A and request B, the combined request is to perform image recognition on picture A and picture B respectively. Recognition. After the chip processor processes the combined request, the processing result is the image recognition result of picture A and the image recognition result of image B.
  • when the controller 1031-1034 sends the processing result to the processor 107, it sends the serial number information of the chip processor together with it, or it packages the processing result and the serial number information of the chip processor and then sends them to the processor 107, so that the processor 107 can find the corresponding thread according to the serial number information.
  • FIG. 6B exemplarily shows another structure of the multi-core chip applied in this embodiment.
  • the chip processor 1021-1024 is connected to the output terminal 105 of the chip.
  • the chip processor directly returns the processing result to the processor 107 after completing the processing of the request.
  • the chip processor may send a processing end message to its corresponding controller 1031-1034, where the processing end message is used to indicate the end of its processing.
  • the controller 1031-1034 receives the processing end message, it sends the next request to the chip processor.
  • FIG. 6C exemplarily shows another structure of the multi-core chip applied in this embodiment.
  • the multi-core chip includes a controller 106, where the controller 106 is used to allocate requests corresponding to each chip processor.
  • the controller 106 is connected with the controllers 1031-1034 of each chip processor, and the controllers 1031-1034 are respectively connected with the corresponding chip processors 1021-1024.
  • the controller 106 is respectively connected to the input 104 and the output 105 of the chip to receive requests from the processor 107 and send processing results to the processor 107.
  • the controller 106 receives a request from the processor 107.
  • the request carries the serial number information of the chip processor.
  • what the controller 106 receives from the processor 107 may be a packaged message containing the request and the serial number information of the chip processor.
  • the controller 106 sends the request to the controller 1031-1034 of the corresponding chip processor according to the serial number information of the chip processor. In some embodiments, the controller 106 decapsulates the received packaging information to obtain the serial number information of the chip processor.
  • the controller 1031-1034 caches one or more requests.
  • S304 The controller 1031-1034 obtains the request from the requests in the cache and sends the request to the chip processor. For details of this step, refer to step S203, which will not be repeated here.
  • the controller 1031-1034 receives the processing result from the chip processor, and sends the processing result to the controller 106.
  • the processing result that the controller 1031-1034 receives from the chip processor carries the serial number information of the chip processor, and the controller forwards it to the controller 106.
  • the controller 1031-1034 receives the processing result from the chip processor, and the controller 1031-1034 sends the processing result and the serial number information of the chip processor to the controller 106.
  • the controller 106 sends the processing result to the processor 107.
  • the processing result can carry the serial number information of the chip processor, so that the server can search for the corresponding thread according to the serial number information.
  • the controller 106 may send to the processor 107 a packaged message containing the processing result and the serial number information of the chip processor.
  • the packaging can be done in the chip processor, the controller 1031-1034 or the controller 106.
  • FIG. 6D and 6E exemplarily show another structure of the multi-core chip applied in this embodiment.
  • the controller 1031-1034 is connected to the output terminal 105 of the chip 102.
  • the chip processors 1021-1024 are connected to the output terminal 105 of the chip 102.
  • the controller 1031-1034 or the chip processor respectively returns the processing result to the server.
  • controllers 1031-1034 may be included in corresponding chip processors.
  • the input terminal 104 and the output terminal 105 in FIGS. 6A-6E can be combined into one input/output port.
  • the embodiment of the present invention provides a method for maintaining the load of a multi-core chip.
  • the load may include the number of requests that have been sent to the chip processor for processing but for which results have not yet been returned.
  • the server maintains a data index table, which is used to maintain the load status of each chip processor in the multi-core chip.
  • the server allocates the corresponding chip processor to the request according to the load condition of each chip processor in the data index table.
  • the structure of the data index table is shown in Table 1.
  • the data index table includes a plurality of arrays, and the number of the arrays corresponds to the number of chip processors included in the multi-core chip.
  • the multi-core chip 102 shown in FIG. 1 includes 4 chip processors 1021-1024.
  • the data index table may include 4 arrays, which correspond to the chip processors 1021-1024 respectively.
  • the number of chip processors included in the multi-core chip may be preset in the server, or may be configuration information obtained by the server from the multi-core chip.
  • the i-th array corresponds to the chip processor 102i in the chip 102, and is used to maintain the load condition of the chip processor 102i.
  • the i-th array may be named after the processor serial number Core i of the chip processor 102i, and the array may include an index record Index Record.
  • the index record Index Record stores the index Index corresponding to the request sent to the chip processor 102i, where the server generates a corresponding index for each request.
  • the number of indexes can be regarded as the load value of the chip processor.
  • the initial state of Index Record can be empty.
  • the i-th array may also include the number of indexes Num_Index.
  • the number of indexes Num_Index is the number of indexes stored in the index record. Its initial value can be 0.
  • Table 1 only exemplarily shows the structure of the data index table, and the specific names of each field are not limited in the present invention.
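The structure of the data index table can be mirrored in code (a sketch; the field names follow Table 1 as described, and the helper name is invented):

```python
def make_data_index_table(num_processors):
    """Build one array per chip processor, each holding an index record
    Index Record (initially empty) and an index count Num_Index
    (initially 0)."""
    return {
        f"Core {i}": {"Index Record": [], "Num_Index": 0}
        for i in range(1, num_processors + 1)
    }

# e.g. for the four chip processors 1021-1024
table = make_data_index_table(4)
```

The number of processors could be preset in the server or read from the chip's configuration, as the description notes.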
  • After the server 101 receives the new request, it determines the chip processor that processes the request.
  • the index is generated after the server 101 determines the chip processor that processes the request. In other words, each request corresponds to an index.
  • the index may include parameters such as the processor serial number Core ID, the thread number Thread ID, and the timestamp Timestamp.
  • the processor serial number Core ID is the ID of the chip processor determined by the server 101 to process the request;
  • the thread number Thread ID is allocated by the operating system for the thread corresponding to the request, and can be obtained through the std::this_thread::get_id() function when the index is generated.
  • After the server receives the request, the operating system creates a thread for each request.
  • the thread number is the unique identifier by which the operating system identifies the thread. Recording the thread ID corresponding to the request in the server enables the server to track the processing of the request. In addition, because a thread number may be reused at different times within the system, in this embodiment the timestamp Timestamp is also recorded when the index is generated; it can be obtained through the std::time_t getTimeStamp() function when the index is generated.
  • the server can uniquely identify the request, and the processor serial number Core ID in the index establishes the mapping between the request and the corresponding chip processor.
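The index described above might be generated as follows; Python's `threading.get_ident()` and `time.time()` stand in for the C++ `std::this_thread::get_id()` and timestamp functions named in the text, and the `Index` type is an illustrative assumption:

```python
import threading
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Index:
    core_id: int      # chip processor chosen to handle the request
    thread_id: int    # thread serving the request (ids may be reused later)
    timestamp: float  # disambiguates a reused thread id

def generate_index(core_id):
    """Generate the index for one request after the server has chosen
    the chip processor (Core ID)."""
    return Index(core_id=core_id,
                 thread_id=threading.get_ident(),
                 timestamp=time.time())
```

Together, Thread ID and Timestamp uniquely identify the request, while Core ID maps it to its chip processor, exactly as the paragraph above explains.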
  • the processing message carries an index.
  • the multi-core chip can allocate the request to the corresponding chip processor based on the index, and the server can match the processing results based on the index and effectively monitor the processing of the request.
  • After the server 101 generates the index, it encapsulates the request and the corresponding index into a processing request message and sends the encapsulated processing request message to the multi-core chip 102. After the chip 102 decapsulates the processing request message, it obtains the request and the corresponding index, obtains from the index the processor serial number Core ID of the chip processor corresponding to the request, and sends the request to the chip processor corresponding to the Core ID, which processes the service corresponding to the request. In some other embodiments, the server 101 directly sends the packaged processing request message to the chip processor corresponding to the Core ID in the chip 102 according to the processor serial number Core ID in the index, and that chip processor processes the request.
  • After the chip 102 completes the processing of the request, it encapsulates the processing result and the corresponding index into a processing result message, which is sent to the server 101. After the server 101 receives the processing result message, it decapsulates it to obtain the processing result and the corresponding index. The server 101 can match the corresponding request through the information in the index, and sends the processing result to the corresponding electronic terminal 100.
  • Throughout the entire processing flow, from when the request is sent to the multi-core chip for processing until the multi-core chip finishes processing and returns the processing result to the server, the processing messages carry the index, which also enables the server to perform load management of the chip processors based on the index.
  • After the server 101 generates the index, it stores the index in the index record Index Record of the chip processor corresponding to the Core ID in the data index table according to the processor serial number Core ID in the index, and the corresponding index number Num_Index increases. When the server 101 receives the processing result and the corresponding index, the server finds, according to the processor serial number Core ID in the index, the index whose thread number Thread ID and timestamp Timestamp match in the index record Index Record of the chip processor corresponding to the Core ID, removes that index from the index record Index Record, and reduces the corresponding index number Num_Index.
  • the index is stored in the index record Index Record of the chip processor corresponding to the Core ID in the data index table according to the processor serial number Core ID in the index; when the server 101 receives the processing result and the corresponding index, the server finds, according to the processor serial number Core ID in the index, the index that matches the thread number Thread ID and the timestamp Timestamp in the index record Index Record of the chip processor corresponding to the Core ID, and removes that index from the Index Record.
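The bookkeeping in the two paragraphs above — store the index and increase Num_Index when a request is sent, then match by Thread ID and Timestamp and remove the index when the result returns — can be sketched as follows (all names are illustrative):

```python
def record_send(table, index):
    """Called when a processing request message is sent to the chip."""
    entry = table[index["core_id"]]
    entry["record"].append(index)       # store in the Index Record
    entry["num_index"] += 1             # Num_Index increases

def record_result(table, index):
    """Called when the processing result message comes back."""
    entry = table[index["core_id"]]     # locate by Core ID
    for stored in entry["record"]:
        # match by Thread ID and Timestamp
        if (stored["thread_id"] == index["thread_id"]
                and stored["timestamp"] == index["timestamp"]):
            entry["record"].remove(stored)
            entry["num_index"] -= 1     # Num_Index decreases
            break

# one entry per chip processor; index fields are dummy values
table = {i: {"record": [], "num_index": 0} for i in range(1, 5)}
idx = {"core_id": 3, "thread_id": 140100, "timestamp": 1700000000.0}
record_send(table, idx)
record_result(table, idx)
```

After the round trip, the entry for processor 3 is back to an empty record and a count of 0, so Num_Index always reflects the number of outstanding requests.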
  • S401 The electronic terminal 100 sends a request to the server 101;
  • the server 101 receives the request, and selects the chip processor that executes the request according to the load value Load[] of each chip processor.
  • the index number Num_Index of the chip processor in the data index table can be used as the load value Load[] of the chip processor, or the number of indexes saved in the index record corresponding to the chip processor in the data index table can be counted and used as the load value Load[] of the chip processor.
  • the higher the load value of the chip processor, the more requests it is currently executing, and the weaker its ability to handle new requests. Conversely, the lower the load value, the stronger its ability to handle new requests.
  • the distribution probability is determined according to the current load value of each chip processor; the higher a chip processor's load value, the lower its corresponding distribution probability.
  • the server selecting the chip processor to execute the processing request according to the load value of each chip processor includes the following steps:
  • S4021 The server 101 determines the new request processing capability of each chip processor according to the load value of each chip processor in the data index table. The higher the load value, the weaker the corresponding new request processing capability.
  • the relationship between the new request processing capability AoE[i] of the chip processor 102i and the load value Load[i] may be: AoE[i] = 1/Load[i]
  • AoE[i] represents the new request processing capability of the chip processor 102i
  • Load[i] represents the load value of the chip processor 102i.
  • the number of indexes corresponding to the index records of the chip processor 102i in the data index table may be regarded as its load value Load[i].
  • the server can compensate the load value of each chip processor included in the multi-core chip, and obtain the new request processing capability of each chip processor according to the compensated load value.
  • the above compensation is to add one to the load value of each chip processor.
  • the load value of each chip processor is compensated before the new request processing capacity of each chip processor is determined.
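Combining the inverse-load relation with the add-one compensation gives a capability formula that stays defined even for an idle processor with Load = 0 (a sketch; the exact formula is an assumption consistent with the description):

```python
def new_request_capability(loads):
    """AoE[i] = 1 / (Load[i] + 1): the higher the load, the weaker the
    capability. The +1 compensation keeps the value defined when a
    processor currently has no outstanding requests (Load = 0)."""
    return [1.0 / (load + 1) for load in loads]

# e.g. raw loads {0, 1, 2, 1} -> capabilities {1, 1/2, 1/3, 1/2}
aoe = new_request_capability([0, 1, 2, 1])
```

The resulting capability vector feeds directly into the allocation-probability computation of step S4022.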
  • step S4021 may also be: the server 101 determines the new request processing capability of each chip processor according to the load value and processing speed of each chip processor. The higher the load value and the lower the processing speed, the weaker the corresponding new request processing capability; conversely, the lower the load value and the faster the processing speed, the stronger the corresponding new request processing capability.
  • the relationship between the new request processing capability AoE[i] of the chip processor 102i, the load value Load[i], and the processing speed SoE[i] may be: AoE[i] = SoE[i]/Load[i], where AoE[i] represents the new request processing capability of the chip processor 102i, Load[i] represents the load value of the chip processor 102i, and SoE[i] represents the processing speed of the chip processor 102i.
  • the processing speed of each chip processor in the multi-core chip is its attribute information, which may be pre-stored in the server, or may be configuration information obtained from the multi-core chip.
  • a multi-core chip includes 4 chip processors
  • its new request processing capability may be AoE[4] = {1, 1/2, 1/3, 1/2}.
  • S4022 Calculate the allocation probability p[i] of each chip processor according to the new request processing capability AoE[i] of each chip processor, and determine the probability space of each chip processor.
  • the allocation probability indicates the probability that a new processing request is allocated to the chip processor. The stronger a chip processor's new request processing capability, the greater its allocation probability.
  • the allocation probability p[i] of the chip processor may be: p[i] = AoE[i]/(AoE[1] + AoE[2] + ... + AoE[n]), where AoE[i] is the new request processing capability of the chip processor 102i, the denominator is the sum of the new request processing capabilities of all chip processors, and n is the number of chip processors included in the multi-core chip. The sum of the allocation probabilities of all chip processors is 1.
  • this step may specifically be: generate a random number within the (0,1] interval, and determine that the chip processor whose probability space contains the random number is the chip processor that executes the request. For example, based on the probability spaces of the 4 chip processors obtained in step S4022, if the generated random number is 0.5, then since 3/7 < 0.5 < 9/14, the random number falls in the probability space corresponding to the second chip processor, and the new request is therefore allocated to the second chip processor for processing.
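The selection flow of steps S4021-S4023 can be sketched as follows. This is a minimal illustration assuming the reciprocal relationship AoE[i] = 1/(Load[i]+1) after the add-one compensation described above; the function names and the example loads {0, 1, 2, 1} are illustrative assumptions, chosen so the numbers match the 3/7 and 9/14 probability-space boundaries in the example.

```python
import random

def new_request_capability(loads):
    # S4021: compensate each load by adding one (avoids division by zero),
    # then take the reciprocal: higher load -> weaker capability.
    return [1.0 / (load + 1) for load in loads]

def allocation_probabilities(aoe):
    # S4022: normalize capabilities so the allocation probabilities sum to 1.
    total = sum(aoe)
    return [a / total for a in aoe]

def select_processor(probabilities, rand=None):
    # S4023: draw a random number in (0, 1] and find which processor's
    # cumulative probability space contains it.
    r = rand if rand is not None else random.uniform(0.0, 1.0)
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if r <= cumulative:
            return i
    return len(probabilities) - 1  # guard against floating-point rounding

# Example matching the patent's numbers: loads {0, 1, 2, 1} give
# AoE = {1, 1/2, 1/3, 1/2} and p = {3/7, 3/14, 1/7, 3/14}.
loads = [0, 1, 2, 1]
p = allocation_probabilities(new_request_capability(loads))
chosen = select_processor(p, rand=0.5)  # 3/7 < 0.5 <= 9/14 -> index 1
```

With these assumed loads, the random number 0.5 lands in the second processor's probability space, reproducing the worked example above.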
  • the chip processor can be selected according to the load value of the current chip processor.
  • the higher the load, the lower the probability that the chip processor is selected, which helps realize load balance among the chip processors included in the multi-core chip and make full use of hardware resources.
  • the server 101 generates an index corresponding to the request. Specifically, the server 101 generates an index according to the processor serial number Core ID allocated to the request, the acquired thread number Thread ID of the request, and the timestamp Timestamp.
  • S404 The server 101 saves the index in the data index table, encapsulates the index and the request into a processing request message, and sends it to the multi-core chip 102 for processing.
  • the server 101 saves the index in the data index table, which may specifically include the following steps:
  • the server 101 saves the index in the index record of the chip processor corresponding to the processor serial number Core ID in the data index table.
  • the number of indexes of the chip processor increases accordingly.
  • the increase may specifically be that for each additional index in the index record, the corresponding index number is increased by one.
  • the server 101 encapsulates the index and the request into a processing request message, and sends it to the multi-core chip 102 for processing, which specifically includes the following steps:
  • the processor 107 in the server 101 sends the packaged processing request message to the chip processor corresponding to the processor number Core ID in the multi-core chip 102 for processing according to the processor serial number Core ID in the index.
  • the processor 107 in the server 101 sends the encapsulated processing request message to the multi-core chip 102; the chip 102 decapsulates it to obtain the processing request and the corresponding index, obtains the processor serial number Core ID from the index, and sends the request and index to the chip processor corresponding to that Core ID for processing.
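The encapsulation and decapsulation described above can be illustrated with a trivial message format. JSON is used here purely for illustration; the patent does not specify the wire format, and the field names are assumptions.

```python
import json

def encapsulate(index, request_payload):
    # Processing request message = index (Core ID, Thread ID, Timestamp) + request.
    core_id, thread_id, timestamp = index
    return json.dumps({
        "core_id": core_id,
        "thread_id": thread_id,
        "timestamp": timestamp,
        "request": request_payload,
    })

def decapsulate(message):
    # The chip side recovers the index to route the request to the right core.
    body = json.loads(message)
    index = (body["core_id"], body["thread_id"], body["timestamp"])
    return index, body["request"]

msg = encapsulate((2, 17, 1700000000.0), "recognize image #42")
index, req = decapsulate(msg)
```

The same pairing works in reverse for the processing result message, which carries the result together with the original index.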
  • before the server 101 encapsulates the request and the corresponding index and sends them to the multi-core chip 102, the server 101 creates a condition wait variable Condition corresponding to the index, and saves the condition wait variable in the entry for that index in the index record of the corresponding chip processor in the data index table. The server then waits on the condition wait variable Condition.
  • a condition wait variable, also called a condition variable, is a variable associated with a condition. Its value is usually "True" or "False"; when the condition changes from unsatisfied to satisfied, its value changes. For example, when the condition is not met, the value of the condition wait variable is "False", and when the condition is met, its value changes to "True".
  • condition wait variables are generally used to manage threads. For example, thread A waits on a condition wait variable and is suspended until the condition is met and the condition wait variable is notified, at which point thread A is awakened.
  • after the server 101 creates the condition wait variable Condition, it sets the condition to be the receipt of the processing result. After the server 101 sends the processing request message to the multi-core chip 102, the condition wait variable Condition is False, and the request's thread in the server is suspended, waiting for the processing result returned from the multi-core chip 102.
  • after the server 101 receives the processing result message from the multi-core chip 102, it obtains the index from the processing result message, obtains the corresponding condition wait variable from the data index table according to the index, and sets the value of the condition wait variable Condition to True. The request's thread is awakened, and the server 101 returns the processing result to the electronic terminal 100.
  • the server 101 may also set a time condition for the condition wait variable Condition. That is, if the processing result is not received within a predetermined time, the wait times out.
  • upon timeout, the value of the condition wait variable Condition is set to True, the server 101 wakes up the request's thread, and the server 101 returns a processing failure message to the electronic terminal 100.
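The condition-wait mechanism described above (create the condition wait variable, suspend the request thread, wake it when the result arrives or the wait times out) can be sketched with Python's standard threading.Condition. The class and method names here are illustrative, not from the patent.

```python
import threading

class PendingRequest:
    """Holds the condition wait variable and result slot for one index."""
    def __init__(self):
        self.cond = threading.Condition()
        self.done = False          # the "False"/"True" condition value
        self.result = None

    def wait_for_result(self, timeout):
        # Request thread: suspend until notified, or until the wait times out.
        with self.cond:
            self.cond.wait_for(lambda: self.done, timeout=timeout)
            return self.result     # None means the wait timed out

    def deliver(self, result):
        # Result-handling thread: set the condition to True and wake the waiter.
        with self.cond:
            self.result = result
            self.done = True
            self.cond.notify()

pending = PendingRequest()
worker = threading.Thread(target=pending.deliver, args=("processing result",))
worker.start()
outcome = pending.wait_for_result(timeout=5.0)
worker.join()
```

If `deliver` never runs, `wait_for_result` returns None after the timeout, which corresponds to the server returning a processing failure message to the electronic terminal.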
  • the server 101 receives the processing result message from the multi-core chip 102.
  • the processing result message may be the encapsulated processing result and index.
  • the index was generated by the server 101 for the request corresponding to the processing result and sent to the multi-core chip 102; after execution, the chip 102 encapsulates the processing result and the corresponding index and returns them to the server 101.
  • the processing result message received by the server 101 from the multi-core chip 102 includes an index, and the index is used to find the corresponding index in the index record of the chip processor corresponding to the processor serial number Core ID in the data index table, so as to remove the index from the index record.
  • the number of indexes of the chip processor is correspondingly reduced.
  • the reduction may specifically be: each time an index is removed from the Index Record, the corresponding index number Num_Index is reduced by one.
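Steps S403-S406 together describe a simple index lifecycle: generate an index from the processor serial number Core ID, the thread number Thread ID, and a timestamp; save it in the chosen processor's index record (its index number grows by one); and remove it when the processing result returns (its index number shrinks by one). A minimal sketch, with a dict of sets standing in for the data index table (the class and method names are illustrative):

```python
import time

class DataIndexTable:
    def __init__(self, num_processors):
        # One index record per chip processor; Num_Index is derived from its size.
        self.records = {i: set() for i in range(num_processors)}

    @staticmethod
    def make_index(core_id, thread_id):
        # S403: index = (processor serial number, thread number, timestamp)
        return (core_id, thread_id, time.time())

    def save(self, index):
        # S404: store under the processor named in the index; Num_Index grows by one.
        self.records[index[0]].add(index)

    def remove(self, index):
        # S406: the result returned; drop the index, Num_Index shrinks by one.
        self.records[index[0]].discard(index)

    def num_index(self, core_id):
        return len(self.records[core_id])

table = DataIndexTable(num_processors=4)
idx = table.make_index(core_id=2, thread_id=17)
table.save(idx)
count_after_save = table.num_index(2)
table.remove(idx)
count_after_remove = table.num_index(2)
```

Because Num_Index feeds directly into the load value, this save/remove pairing is what keeps the allocation probabilities of steps S4021-S4023 current.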
  • S407 The server 101 sends the processing result to the electronic device 100.
  • the server 101 decapsulates the processing result message to obtain the processing result and the corresponding index.
  • the server 101 queries the data index table according to the index to obtain a matching index and its corresponding condition waiting variable.
  • the server 101 sets the value of the condition wait variable from "False" to "True".
  • the change of the condition wait variable's value causes the previously suspended request thread to be awakened, and the server 101 returns the processing result to the electronic device 100.
  • after receiving the processing result message, the server 101 fills the processing result into the entry corresponding to the index in the data index table, and then returns the processing result to the electronic device 100.
  • if the wait times out, the value of the condition wait variable is set from "False" to "True"; after the server 101 wakes up the request's thread, the server 101 returns processing failure information to the electronic device 100.
  • the server 101 may include a front-end interface module and a load maintenance module.
  • the front-end interface module is used to complete the following method steps:
  • S501 Receive a request from the electronic terminal 100.
  • S502: Select the chip processor that executes the request according to the load value Load[] of each chip processor. For this step, refer to step S402, which will not be repeated here.
  • S503: Generate an index corresponding to the request. For this step, refer to step S403, which will not be repeated here.
  • S504 Save the index in the data index table, encapsulate the index and the request into a processing request message, and send it to the multi-core chip 102.
  • S505 Wait for the processing result update message sent by the load maintenance module, or wait for a timeout.
  • a conditional wait variable can be used to wait. For details, refer to step S404, which will not be repeated here.
  • the load maintenance module may send the processing result and index to the front-end interface module.
  • if the front-end interface module obtains the processing result message from the load maintenance module, it returns the processing result in the message to the electronic terminal 100. If waiting on the condition wait variable corresponding to the request times out, the front-end interface module returns a processing failure message to the electronic terminal 100.
  • the load maintenance module is used to maintain the data index table and complete the following method steps:
  • S601 Receive an index from the front-end interface module, and save the index in a corresponding index record in the data index table.
  • the load maintenance module saves the index in the index record of the chip processor corresponding to the processor serial number CoreID in the data index table.
  • the number of indexes of the chip processor increases accordingly.
  • the increase may specifically be that for each additional index in the index record, the corresponding index number is increased by one.
  • step S602 Receive the processing result message from the multi-core chip 102, query the data index table according to the index in the processing result message, and update the data index table. For this step, refer to step S406, which will not be repeated here.
  • the load maintenance module saves the processing result in the received processing result message to the entry corresponding to the index, and then executes step S603.
  • S603 Notify the front-end interface module that the processing is complete.
  • the load maintenance module can send the index and processing result in the received processing result message to the front-end interface module, or send the index and processing result stored in the data index table to the front-end interface module, to indicate that the corresponding request has been processed.
  • the server 101 creates an index for the request when it receives a new request, and the index is carried through the entire processing flow, so that the server 101 can indicate the allocated chip processor through the index and the multi-core chip can allocate the request to the corresponding chip processor according to the index. The server 101 can also maintain the load of each chip processor in the multi-core chip 102 through the index, which is further used for chip processor selection and improves the processing efficiency of the multi-core chip.
  • if a chip processor is abnormal, requests cannot be processed and processing results cannot be returned to the server; then, after a period of time, the index record corresponding to the abnormal chip processor in the server's data index table will accumulate a large number of indexes for requests that have not returned results. A higher number of indexes indicates a higher load value for that chip processor. It can be seen from steps S4021-S4023 that a chip processor with a high load value has a lower allocation probability. Thus this selection method allocates requests according to the load of each chip processor and avoids allocating a large number of requests to abnormal chip processors.
  • the embodiment of the present invention provides a load maintenance method for the server 101 to maintain a data index table.
  • the main steps included in this method are shown in Figure 10:
  • S701: the server 101 detects whether each chip processor included in the multi-core chip 102 has returned a processing result message within the first time period T1, and determines a chip processor that returns no processing result message during T1 to be an abnormal chip processor. Specifically, this includes the following steps: after receiving a processing result message from the multi-core chip 102, the server 101 obtains the processing result and the corresponding index by decapsulation, and checks the processor serial number Core ID in the indexes contained in the processing result messages received within the first time period T1, so as to determine which chip processors have returned no processing result message.
  • the first duration T1 may be set to be much greater than one service processing period (usually at the millisecond level or second level), specifically, the first duration T1 may be set to 30 seconds.
  • if the server 101 detects that a certain chip processor has returned a processing result message within the first time period T1, it indicates that the chip processor is in a normal state, and the server returns to step S701 to continue detection. If the server 101 does not receive a processing result from a chip processor within the first time period T1, it indicates that the chip processor may be abnormal and cannot return the processing result to the server 101, and the server 101 executes step S703;
  • S703 The server 101 triggers flow compensation to determine the state of the abnormal chip processor.
  • although the server 101 did not receive a processing result message from the abnormal chip processor, during this period the server 101 still allocated processing requests to each chip processor of the multi-core chip 102 according to the data index table it maintains. For the abnormal chip processor, since no processing result is returned, a large number of indexes accumulate in its index record in the data index table, and its index number grows. It can be seen from steps S4021-S4023 that the allocation probability of the abnormal chip processor gradually decreases. When the allocation probability is too low, the server 101 no longer allocates requests to the abnormal chip processor; at that point, even if the abnormal chip processor returns to normal, its allocation probability may be too low for it to receive requests.
  • the server 101 may generate a test request for testing the state of the abnormal chip processor, and designate the abnormal chip processor to process it, so as to confirm the state of the abnormal chip processor.
  • when the server 101 generates the index corresponding to the test request, it can set the processor serial number Core ID to that of the abnormal chip processor, so that the test request is allocated to the abnormal chip processor.
  • the test request may be a processing task preset in the server 101, such as an image recognition request for a picture pre-stored in the server, or a processing task copied from a request currently received by the server 101.
  • the server 101 obtains the thread number and timestamp of the test request, and generates an index in combination with the processor serial number Core ID of the specified abnormal chip processor, encapsulates the test request and the index, and sends the test request and the index to the multi-core chip 102 for processing.
  • the server 101 saves the index in the index record of the corresponding chip processor in the data index table, and the number of indexes increases accordingly.
  • the server 101 may also perform step S702 before performing step S703:
  • Step S702: determine whether the abnormal chip processor has received a request within the second time period T2. When the abnormal chip processor received no processing result within the first time period T1 and no request within the second time period T2, step S703 is executed. If the abnormal chip processor received a request within the second time period T2, step S704 is executed. Specifically, the server 101 may determine whether the abnormal chip processor received a request within the second time period T2 by checking the processor serial numbers and timestamps in the indexes recorded in the data index table. If the abnormal chip processor received a request within the second time period T2, the server 101 can determine the state of the abnormal chip processor by checking the processing result of that request.
  • the second duration T2 may be set to be close to one service processing period, such as 1 second.
  • the time period corresponding to the second duration may be a part of the time period corresponding to the first duration. For example, the server detects whether the abnormal chip processor received a request within the last 1 s of the 30 s first time period.
  • S704: the server 101 detects whether the processing result is received from the abnormal chip processor within the third time period T3. Specifically, the server 101 may detect whether the processing result is received from the abnormal chip processor by checking the processor serial number Core ID in the indexes included in the received processing result messages.
  • the third duration T3 can be measured from the server 101 performing traffic compensation in step S703 and the time stamp obtained when generating the index for the test request, or can be measured from the step S702 after it is determined that the request is received within the second duration T2. In some embodiments, the third duration T3 may be set to be much longer than one service period, for example, 30 seconds.
  • if the server 101 receives the processing result from the abnormal chip processor within the third time period T3, the server 101 may execute step S705.
  • if the server 101 does not receive the processing result from the abnormal chip processor within the third time period T3, it indicates that the abnormal chip processor is still in the abnormal state. At this time, the server 101 returns to step S701 to continue monitoring.
  • the server 101 maintains the high load and low allocation probability status of the abnormal chip processor in the data index table, so that newly received requests are allocated to the abnormal chip processor with a low probability.
  • in some embodiments, if the server 101 does not receive the processing result from the abnormal chip processor within the third time period T3, indicating that the abnormal chip processor is still in the abnormal state, the server 101 jumps to step S703 to again determine the state of the abnormal chip processor.
  • S705: Remove the indexes accumulated in the index record of the chip processor in the data index table.
  • the server 101 will return to step S701 for monitoring.
  • if the server 101 receives the processing result from the abnormal chip processor within the third time period T3, it indicates that the abnormal chip processor has returned to normal.
  • the index record of the chip processor still holds a large number of indexes for requests that were not processed, or failed, during the abnormal state, so its index number is relatively high. It can be seen from steps S4021-S4023 that a chip processor with a higher index number corresponds to a lower allocation probability. Therefore, removing the indexes accumulated in the index record of the chip processor in the data index table can increase the probability that the chip processor receives requests and improve the processing efficiency of the multi-core chip.
  • the server 101 clears the number of indexes.
  • the server 101 can obtain the change of the chip processor state, and can adaptively process the index records and the number of indexes in the data index table according to the state change, so that the processing efficiency of the multi-core chip 102 is improved.
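The detection logic of steps S701-S703 can be condensed into a small decision routine. This sketch assumes the server tracks, per chip processor, the time of the last returned result and the time of the last allocated request; the helper name and the string return values are illustrative, not from the patent.

```python
T1 = 30.0   # first duration: no result within T1 -> suspected abnormal
T2 = 1.0    # second duration: did the processor receive any request recently?

def check_processor(now, last_result_time, last_request_time):
    """Return the next action for one chip processor, per steps S701-S702."""
    if now - last_result_time <= T1:
        return "normal"                 # S701: a result was seen within T1
    if now - last_request_time <= T2:
        return "check_pending_result"   # S702: a request is in flight; watch it (S704)
    return "send_test_request"          # S703: trigger flow compensation

# A processor that returned a result 5 s ago is considered normal.
action_a = check_processor(now=100.0, last_result_time=95.0, last_request_time=99.5)
# No result for 40 s, but a request was allocated 0.5 s ago: watch that request.
action_b = check_processor(now=100.0, last_result_time=60.0, last_request_time=99.5)
# No result for 40 s and no request in the last T2: send a test request.
action_c = check_processor(now=100.0, last_result_time=60.0, last_request_time=90.0)
```

Step S704 then watches for the test request's result within T3 and either clears the accumulated indexes (S705) or keeps the processor's low allocation probability in place.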
  • the embodiment of the present invention provides another load maintenance method.
  • the data structure of the data index table maintained in the server 101 may be as shown in Table 2.
  • the data index table includes a plurality of arrays, and the number of the arrays corresponds to the number of chip processors included in the multi-core chip.
  • the data index table may include 4 arrays, which correspond to the chip processors 1021-1024, respectively.
  • the i-th array corresponds to the i-th chip processor in the chip, and is used to maintain the load condition of the chip processor.
  • the i-th array is named after the processor serial number Core i of the i-th chip processor and includes two data units, namely the new data unit New[] and the old data unit Old[]. The new data unit New[] has the same data structure as the old data unit Old[], and the data structure may include the index record Index Record corresponding to the chip processor.
  • the index record Index Record stores the index related to the processing request processed by the corresponding chip processor.
  • the index generated by the server 101 for the newly received request is stored in the index record Index Record of the new data unit New[].
  • the new data unit New[] and the old data unit Old[] periodically exchange data; that is, the index record of the new data unit New[] and the index record of the old data unit Old[] are exchanged.
  • the initial state of the index record Index Record can be empty.
  • the data structure of the new data unit New[] and the old data unit Old[] may further include the number of indexes Num_Index.
  • the number of indexes Num_Index is the number of indexes stored in the corresponding index record IndexRecord. Its initial value can be 0.
  • if the index record Index Record of the new data unit stores N indexes, its index number Num_Index is N; if the index record Index Record of the old data unit stores M indexes, its index number Num_Index is M.
  • the sum of the index numbers of the new data unit and the old data unit in a chip processor may be considered as the load value of the chip processor.
  • Table 2 only exemplarily shows the structure of the data index table, and the specific names and storage methods of each field are not limited in the present invention.
  • the server 101 may record the time T of the last data exchange between the new data unit and the old data unit, and compare the timestamp in an index with the time T; if the timestamp of the index is greater than the time T, the index is stored in the new data unit of the corresponding chip processor.
  • the new data unit and the old data unit of all chip processors exchange data at the same time.
  • the new data unit and the old data unit of each chip processor can exchange data at different times, and the server can record the time of data exchange between the new data unit and the old data unit of each chip processor.
  • the data exchange period of each chip processor may be different.
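The per-processor structure of Table 2 — a New[]/Old[] data-unit pair that is periodically exchanged, with the load value taken as the sum of the two index numbers — might be sketched like this (field and method names are illustrative; the swap trigger is simplified to an explicit call):

```python
class CoreLoadEntry:
    """Per-processor array from Table 2: a new and an old data unit."""
    def __init__(self):
        self.new_unit = set()   # New[]: indexes of recently received requests
        self.old_unit = set()   # Old[]: indexes carried over from the last period

    def add_index(self, index):
        # Fresh indexes always go into the new data unit.
        self.new_unit.add(index)

    def remove_index(self, index):
        # A returned result may match either unit.
        self.new_unit.discard(index)
        self.old_unit.discard(index)

    def swap(self):
        # Periodic exchange: New[] and Old[] trade contents, so anything
        # still sitting in Old[] afterwards is at least one period old.
        self.new_unit, self.old_unit = self.old_unit, self.new_unit

    def load(self):
        # Load value = Num_Index of New[] plus Num_Index of Old[].
        return len(self.new_unit) + len(self.old_unit)

entry = CoreLoadEntry()
entry.add_index("idx-a")
entry.swap()                 # "idx-a" moves to the old data unit
entry.add_index("idx-b")
load_before = entry.load()   # both indexes still outstanding
entry.remove_index("idx-a")  # the result for "idx-a" finally returns
load_after = entry.load()
```

Because the exchange period is much longer than one service period, indexes left in Old[] after a swap correspond to requests that timed out, which is why the recovery steps later clear the old unit to restore the processor's allocation probability.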
  • the method for invoking the multi-core chip 102 by the server 101 and the method for the server 101 to maintain the load status of the multi-core chip 102 will be described below in conjunction with the processing flow of the request. This method is based on the data index table shown in Table 2.
  • S801 The electronic terminal 100 sends a request to the server 101;
  • S802: the server 101 receives the request, and selects the chip processor that executes the request according to the load value Load[i] of each chip processor.
  • the load value Load[i] of the chip processor is the sum of the index number of the new data unit New[] of the chip processor and the index number of the old data unit Old[] in the data index table.
  • S803 The server 101 generates an index according to the processor serial number Core ID allocated to the request, the obtained thread number of the request, and the timestamp.
  • S804: the server 101 saves the index in the data index table, encapsulates the index and the request into a processing request message, and sends the processing request message to the multi-core chip 102 for processing.
  • the server 101 saves the index in the data index table, which specifically includes the following steps:
  • the server 101 saves the index in the index record Index Record of the new data unit New[] of the chip processor corresponding to the processor serial number Core ID in the data index table.
  • the index number Num_Index of the new data unit New[] increases accordingly.
  • the increase may specifically be that each time an index is added in the index record Index Record, the corresponding index number Num_Index increases by one.
  • for how the server 101 uses the condition wait variable to manage the request thread, refer to step S404, which will not be repeated here.
  • S805: the server 101 receives a processing result message from the multi-core chip 102, and the processing result message includes the encapsulated processing result and index.
  • S806 The server 101 maintains a data index table according to the index in the processing result message.
  • the processing result message received by the server 101 from the multi-core chip 102 includes an index; through this index, the corresponding index is searched for in the index records Index Record of the new data unit New[] and the old data unit Old[] of the chip processor corresponding to the processor serial number Core ID in the data index table, and the index is removed from the index record Index Record.
  • the number of indexes Num_Index corresponding to the data unit is correspondingly reduced. In some cases, the reduction may specifically be: each time an index is removed from the Index Record, the index number Num_Index of the corresponding data unit is reduced by one.
  • S807 The server 101 sends the processing result to the electronic device 100.
  • for the process of the server 101 using the condition wait variable to perform thread management according to the processing result, refer to step S407, which will not be repeated here.
  • the server creates an index for the request when it receives a new request and carries the index through the entire processing flow, so that the server can indicate the allocated chip processor through the index, and the server can calculate and maintain the chip load through the index in order to balance the load.
  • the embodiment of the present invention provides another load maintenance method.
  • the load maintenance method is based on the data index table provided in Table 2 and is used to maintain the load when an abnormal chip processor occurs. As shown in FIG. 11, the method is specifically Including the following steps:
  • S901: the server 101 detects whether each chip processor included in the multi-core chip 102 has returned a processing result message within the fourth time period T4, and determines a chip processor that returns no processing result message within T4 to be an abnormal chip processor. For the specific operation of this step, refer to step S701, which will not be repeated here.
  • the fourth time period T4 may start timing after the new data unit and the old data unit of the chip processor in the data index table exchange data.
  • the fourth duration T4 can be set to be much longer than one service processing period (usually in the millisecond level or in the second level), specifically, it can be set to 30 seconds.
  • if the server 101 detects that a certain chip processor has returned a processing result message within the fourth time period T4, it indicates that the chip processor is in a normal state, and the server may execute step S905; in some other embodiments, if the server 101 detects that a processing result message of a certain chip processor is returned within the fourth time period T4, it may return to step S901 for the next detection.
  • if the server 101 does not receive a processing result from a chip processor within the fourth time period T4, it indicates that the chip processor may be abnormal and cannot return the processing result to the server 101, and the server 101 executes step S903;
  • S903: The server 101 triggers flow compensation to determine the state of the abnormal chip processor, and then step S904 is executed. For the specific operation of this step, refer to step S703, which will not be repeated here.
  • the server 101 saves the index generated for the test request in the index record of the new data unit of the corresponding chip processor in the data index table, and the number of indexes of the new data unit can be increased accordingly.
  • in some embodiments, before performing step S903, the server 101 further performs step S902:
  • S902: determine whether the abnormal chip processor has received a request within the fifth time period T5. When the abnormal chip processor received no processing result within the fourth time period T4 and no request within the fifth time period T5, the server 101 executes step S903. When the abnormal chip processor received a request within the fifth time period T5, the server 101 may execute step S904. For the specific operation of this step, refer to step S702, which will not be repeated here.
  • S904: The server 101 detects whether a processing result is received from the abnormal chip processor within the sixth time period T6; for the specific operation of this step, refer to step S704, which will not be repeated here. If the server 101 receives a processing result from the abnormal chip processor within the sixth time period T6, it indicates that the abnormal chip processor has returned to normal and can process requests normally, and the server 101 may execute step S905. If the server 101 does not receive a processing result from the abnormal chip processor within the sixth time period T6, it indicates that the abnormal chip processor is still in the abnormal state, and the server 101 may execute step S906.
  • the aforementioned sixth duration T6 can be set to a value much greater than one service period, such as 30 seconds, or can be set to the length of time from the timestamp obtained when the server generates the index for the test request in step S903 to the next exchange of the new data unit and the old data unit.
  • the server 101 may remove the indexes accumulated in the index record of the old data unit of the chip processor in the data index table, and correspondingly clear the index number of the old data unit.
  • the server 101 executes step S907.
  • If the server 101 detects in step S901 that a chip processor returned a processing result within the fourth time period T4, it indicates that the chip processor is in a normal working state.
  • When the new data unit and the old data unit exchange data, because the exchange period is much longer than the service processing period, the requests corresponding to the indexes stored in the index record of the old data unit should all have been processed. Under normal circumstances, the index record of the old data unit should be empty, and the corresponding index count should be 0.
  • The chip processor may experience occasional or sudden exceptions while processing requests, and it returns to normal after the exception.
  • The number of indexes in the data index table affects the chip processor's allocation probability. Therefore, when the chip processor is in a normal state, clearing the index record of the old data unit increases the probability that the chip processor receives requests, which helps make full use of the processing capability of the multi-core chip and improves processing efficiency.
  • The server 101 may first determine whether the index count of the old data unit is 0 and/or whether its index record is empty. If the index count is not 0 and/or the index record is not empty, the server 101 clears the index record of the old data unit before the chip processor's new data unit and old data unit next exchange data.
  • the server 101 may clear the index number of the old data unit to zero.
  • If the server 101 receives the processing result within the sixth time period T6 in step S904, it indicates that the abnormal chip processor has recovered from the abnormal state, and the indexes corresponding to the timed-out requests need to be removed: the server removes all indexes saved in the index record of the old data unit of the chip processor in the data index table, which reduces the chip processor's load value and increases the probability of the chip processor receiving new requests. Meanwhile, the old data unit after the exchange holds the indexes of the original new data unit, so that when the server 101 receives the related processing results and indexes, it can update the data index table accordingly.
  • Before the next data exchange between the new data unit and the old data unit of the chip processor, the server 101 saves the indexes in the old data unit of the chip processor in the data index table into the new data unit, and then clears the index record of the old data unit.
  • the server 101 executes step S907.
  • The server 101 may add the index count of the old data unit to the index count of the new data unit, use the sum as the index count of the new data unit, and then clear the index count of the old data unit to zero.
  • If the server 101 does not receive the processing result within the sixth time period T6 in step S904, it means that the chip processor is still in an abnormal state. The indexes and the index count of the abnormal chip processor's old data unit are added to the new data unit, so that when the new data unit and the old data unit exchange data, the chip processor remains in a high-load state in the exchanged data index table, and the probability of new requests being allocated to the abnormal chip processor remains low.
  • S907: Exchange the index records and index counts of the new data unit and the old data unit of the chip processor in the data index table.
  • the server 101 may return to step S901 for monitoring.
  • If the server 101 does not receive the processing result from the abnormal chip processor within the sixth time period T6, it indicates that the abnormal chip processor is still in the abnormal state; after the new data unit and the old data unit exchange data, the server 101 can jump to step S903 to continue determining the state of the abnormal chip processor.
  • Through the above method, the server can track changes in the chip processor's state, adaptively process the data maintained in the data index table according to those state changes, and handle the index records and index counts in the data units more accurately, which improves the processing efficiency of the multi-core chip.
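The new/old data-unit bookkeeping described above can be sketched as follows. This is a minimal illustration of the three operations (periodic exchange, clearing on recovery, carrying indexes forward while abnormal); the struct and function names are ours, not from the patent, and indexes are simplified to strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Per-processor pair of data units, each holding an index record.
struct CoreUnits {
    std::vector<std::string> newRecord;  // new data unit's index record
    std::vector<std::string> oldRecord;  // old data unit's index record
};

// Periodic exchange (step S907): swap the records of the two units.
void exchange(CoreUnits& c) { std::swap(c.newRecord, c.oldRecord); }

// Processor recovered (result received within T6): drop the timed-out
// indexes accumulated in the old data unit, lowering the load value.
void onRecovered(CoreUnits& c) { c.oldRecord.clear(); }

// Processor still abnormal (no result within T6): carry the old unit's
// indexes into the new unit so the load stays high after the next swap.
void onStillAbnormal(CoreUnits& c) {
    c.newRecord.insert(c.newRecord.end(),
                       c.oldRecord.begin(), c.oldRecord.end());
    c.oldRecord.clear();
}
```

The index count maintained alongside each record would simply be the size of the corresponding vector here.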
  • the embodiment of the present invention also provides a task processing method. As shown in Figure 12, the method mainly includes the following steps:
  • the controller 1031-1034 receives a processing request message from the server 101, and the processing request message includes a request and a corresponding index.
  • What the controllers 1031-1034 receive from the server 101 are the processing request messages allocated to the chip processors corresponding to the controllers 1031-1034.
  • the controller 1031-1034 decapsulates the processing request message to obtain the request and the corresponding index.
  • The controllers 1031-1034 cache one or more requests and the corresponding indexes.
  • step S1004 The controller 1031-1034 obtains the request from the cached request and sends the request to the chip processor. For the specific operation of this step, please refer to step S203, which will not be repeated here.
  • When merging requests, the controllers 1031-1034 obtain one or more requests and the corresponding indexes from the cached requests, and send the request, or the merged request, together with the corresponding indexes to the corresponding chip processor.
  • the controller 1031-1034 receives the processing result from the chip processor, encapsulates the processing result and the corresponding index to obtain the processing result message, and sends the processing result message to the server 101.
  • the processing results received by the controller 1031-1034 from the chip processor include processing results corresponding to the combined multiple requests.
  • The controllers 1031-1034 can encapsulate the obtained multiple processing results with the corresponding indexes in sequence to obtain multiple processing result messages.
  • the controller 1031-1034 may receive the packaged processing result message from the chip processor, and send the processing result message to the server 101.
  • the embodiment of the present invention also provides another task processing method. As shown in Figure 13, the method mainly includes the following steps:
  • the controller 106 receives the encapsulated processing request message from the server 101, where the processing request message includes the request and the corresponding index.
  • S1102 The controller 106 decapsulates the processing request message to obtain the request and the corresponding index.
  • the controller 106 obtains the processor serial number Core ID from the index, and sends the request and the corresponding index to the controller 1031-1034 connected to the corresponding chip processor.
  • S1104: The controllers 1031-1034 cache one or more requests and the corresponding indexes.
  • step S1105 The controller 1031-1034 obtains the request from the cache and sends the request to the chip processor. For the specific operation of this step, please refer to step S1004, which will not be repeated here.
  • the controller 1031-1034 receives the processing result from the chip processor, encapsulates the processing result and the corresponding index to obtain the processing result message, and sends the processing result message to the controller 106.
  • the processing results received by the controller 1031-1034 from the chip processor include processing results corresponding to the combined multiple requests.
  • The controllers 1031-1034 may encapsulate the obtained multiple processing results with the corresponding indexes in sequence to obtain multiple processing result messages.
  • the controller 1031-1034 receives the packaged processing result message from the chip processor, and sends the processing result message to the controller 106.
  • S1107 The controller 106 sends the processing result message to the server 101.
  • the controller 1031-1034 may directly send the encapsulated processing result message to the server 101. In some other embodiments, the controllers 1031-1034 may also send the processing result and the corresponding index to the controller 106, and the controller 106 encapsulates the processing result message, and sends the processing result message to the server 101.
  • each step of the above method can be completed by an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the steps of the method disclosed in combination with the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • The software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • The sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).


Abstract

The present invention provides a multi-core chip and a service processing method and system. The multi-core chip configures a controller for each processor, enabling the multi-core chip to cache requests and to merge multiple requests for processing. In addition, the server carries an index throughout the processing flow of a request, maintains a data index table according to the index, and selects the processor that processes a request according to the data index table. The present invention can achieve load balancing among processors, can determine the states of the multiple processors included in the multi-core chip, and adaptively adjusts the data in the data index table according to the state of each processor, improving the processing efficiency of the multi-core chip.

Description

Multi-Core Chip and Scheduling Method Thereof
This application claims priority to Chinese Patent Application No. 202010108531.8, filed with the China National Intellectual Property Administration on February 21, 2020 and entitled "Multi-Core Chip and Scheduling Method Thereof", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the communications field, and in particular, to a multi-core chip and a scheduling method thereof.
Background
At present, services on various terminals, such as games, high-definition video playback, and image editing, require a large amount of image processing or speech processing. With the development of terminal technology, many electronic terminals are equipped with processors dedicated to such services, such as a graphics processing unit (GPU) for supporting image processing services. However, compared with the computing power of a server, the processing capability of an electronic terminal cannot meet users' needs. Therefore, many services require the electronic terminal to call a server, and the server to perform data processing by calling a chip.
However, the scheduling capability of existing multi-core chips themselves is weak. Existing scheduling methods cannot make full use of the chip's processing capability and cannot handle concurrent scenarios, resulting in low processing efficiency.
Summary
The technical problem to be solved by the embodiments of the present invention is to provide an apparatus as well as a service scheduling method and system, so as to improve the efficiency of service processing.
The above and other objectives are achieved by the features in the independent claims. Further implementations are embodied in the dependent claims, the specification, and the drawings.
In a first aspect, an embodiment of this application provides an apparatus, including: a first processor and a second processor; and a first controller and a second controller, where the first controller is connected to the first processor, and the second controller is connected to the second processor. The first controller is configured to: store a first request and a second request, where the first request and the second request are requests allocated to the first processor; send the first request to the first processor for processing; receive a first processing message from the first processor; and send the second request to the first processor for processing. The second controller is configured to: store a third request and a fourth request, where the third request and the fourth request are requests allocated to the second processor; send the third request to the second processor for processing; receive a second processing message from the second processor; and send the fourth request to the second processor for processing.
Through the above implementation, the apparatus can reduce the idle time of the processors and improve the service processing efficiency of the processors.
In a possible implementation, the first controller stores a fifth request and a sixth request, and the first controller is further configured to merge the fifth request and the sixth request into a seventh request and send the seventh request to the first processor for processing.
Through the above implementation, the processor can process multiple requests merged into one, further improving the service processing efficiency of the processor.
In a possible implementation, the first controller receives a first message from a first apparatus, where the first message includes the first request and first information, and the first information corresponds to the first processor.
Through the above implementation, requests are allocated to processors according to the first information.
In a possible implementation, the first processing message includes a first processing result, where the first processing result is a result of the first processor processing the first request; and the first controller sends a second message to the first apparatus, where the second message includes the first processing result and the first information.
Through the above implementation, the first apparatus can match requests with processing results according to the first information.
In a possible implementation, the first processing message is a third message, where the third message includes the first processing result and the first information, and the first controller sends the third message to the first apparatus.
Through the above implementation, the controller only needs to forward the processing result message.
In a possible implementation, when sending the first request to the first processor, the first controller also sends the first information; the first processing message is a first processing-complete message, which indicates that the first processor has finished processing the first request; and the first processor sends a fourth message to the first apparatus, where the fourth message includes the first processing result and the first information.
Through the above implementation, the processor can return the processing result directly to the first apparatus when processing ends.
In a possible implementation, the apparatus further includes a third controller, where the third controller is connected to the first controller and the second controller respectively; the third controller is configured to receive the first request, the second request, the third request, and the fourth request from the first apparatus, send the first request and the second request to the first controller, and send the third request and the fourth request to the second controller.
Through the above implementation, the apparatus can allocate requests itself, which simplifies the connection structure between the first apparatus and the apparatus and reduces the burden on the first apparatus.
In a possible implementation, the third controller receives a fifth message from the first apparatus, where the fifth message includes the first request and the first information, and the first information corresponds to the first processor; the third controller sends the first request and the first information to the first controller according to the first information.
Through the above implementation, the apparatus can allocate requests as directed by the first apparatus, enabling the first apparatus to effectively manage each processor.
In a possible implementation, the first processing message includes the first processing result; the third controller is further configured to receive the first processing result and the first information from the first controller, and the third controller sends a sixth message to the first apparatus, where the sixth message includes the first processing result and the first information.
Through the above implementation, the first apparatus can receive processing results uniformly from the third controller, which simplifies the connection structure between the first apparatus and the apparatus.
In a possible implementation, the first processing message includes the first processing result; the third controller is further configured to receive a seventh message from the first controller, where the seventh message includes the first processing result and the first information, and the third controller sends the seventh message to the electronic device.
Through the above implementation, the controller generates the processing result message.
In a possible implementation, the first processing message is an eighth message, where the eighth message includes the first processing result and the first information; the third controller is further configured to receive the eighth message from the first controller, and the third controller sends the eighth message to the first apparatus.
Through the above implementation, the processor generates the processing result message.
In a possible implementation, the first information includes a first processor serial number, a timestamp, and a thread number, where the first processor serial number is allocated by the first apparatus for the first request, and the timestamp and the thread number correspond to the first request.
Through the above implementation, the first apparatus can effectively manage the allocation and processing of requests.
It can be understood that the above apparatus may be a multi-core chip; the above first processor and second processor may be chip processors; the above first apparatus may be a server or a processor in a server, that is, a CPU; the above first message and fifth message may be processing request messages; and the above second message, third message, fourth message, sixth message, seventh message, and eighth message may be processing result messages.
In a second aspect, an embodiment of this application provides a service processing method, including: a first apparatus sends a first message to a second apparatus, where the second apparatus includes a first processor and a second processor; the first message includes a first request and first information, and the first information corresponds to the first processor; the second apparatus sends the first request to the first processor for processing according to the first information; the second apparatus sends a second message to the first apparatus, where the second message includes a first processing result and the first information, and the first processing result is the result of the first processor processing the first request; the first information is generated after the first apparatus receives the first request and determines the processor that will process the first request; the first information includes a first processor serial number, a timestamp, and a thread number, where the first processor serial number is allocated by the first apparatus for the first request, and the timestamp and the thread number correspond to the first request. The first apparatus may specifically be a server. The second apparatus may be a multi-core chip.
通过上述实现方式,第二装置能够根据第一装置发送的第一信息进行处理器的分配。
在一种可能的实现方式中,第一装置存储第一表格,第一表格与第二装置相对应,其中,第一表格包括第一数组与第二数组,第一数组对应于第一处理器,第二数组对应于第二处理器;第一数组包括第一数据和第二数据,第一数据包括第一记录,第一记录用于保存一条或多条信息,该信息的数量为第一数量;第二数据包括第二记录,第二记录用于保存一条或多条信息,该信息的数量为第二数量;第二数组包括第三数据和第四数据,第三数据包括第三记录,第三记录用于保存一条或多条信息,该信息的数量为第三数量;第四数据包括第四记录,第四记录用于保存一条或多条信息,该信息的数量为第四数量;其中,第一装置为第一请求生成第一信息后,将第一信息保存在第一记录中;第一数据和第二数据每隔一定时长相互交换数据;第一装置从第二装置接收第二消息后,根据第二消息中的第一信息,在第一记录和/或第二记录中查询;若在第一记录中查询到第一信息,从第一记录中移除第一信息;若在第二记录中查询到第一信息,从第二记录中移除第一信息。
通过上述实现方式,第一装置能够获得第二装置的各处理器处理请求的情况。
在一种可能的实现方式中,若第一装置检测到在第一时长内没有从第一处理器接收到处理结果,第一装置生成第一测试请求以及与第一测试请求对应的信息,第一测试请求对应的信息与第一处理器相对应;第一装置检测在第三时长内是否从第一处理器接收到处理结果,若收到处理结果,在下次第一数据和第二数据交换数据之前,将第二数据的第二记录清空;若没收到处理结果,在下次第一数据和第二数据数据交换之前,将第二数据的第二记录中的一条或多条信息保存到第一数据的第一记录中,将第二记录清空。
通过上述实现方式,第一装置能够获得第二装置各处理器状态的变化并适应性的调整维护的数值,能够提高处理效率。
在一种可能的实现方式中,第一装置在生成第一测试请求之前,还包括,第一装置检测第一处理器在第二时长内是否收到请求,若服务器确定第一处理器在第二时长内没有收到请求,生成第一测试请求。
通过上述实现方式,第一装置能够在确定异常处理器没有收到请求后进行流量补偿。
在一种可能的实现方式中,若第一装置确定在第三时长内没有从第一处理器接收到处理结果,在第一数据和第二数据交换数据后,第一装置生成第二测试请求。
通过上述实现方式,第一装置能够在确定处理器处于异常状态之后可以继续检测处理器的状态变化。
在一种可能的实现方式中,第一装置根据第一表格,确定处理第一请求的处理器,具体包括:根据第一数量和第二数量之和以及第三数量和第四数量之和确定第一能力和第二能力,第一能力表示第一处理器处理新请求的能力,第二能力表示第二处理器处理新请求的能力;根据第一能力和第二能力确定第一分配概率和第二分配概率,确定第一概率空间和第二概率空间;第一分配概率和第一概率空间对应于第一处理器,第二分配概率和第二概率空间对应于第二处理器;取随机数,根据随机数确定分配的处理器。
通过上述实现方式,第一装置能够根据各处理器处理请求的情况来分配处理器,能够实现负载均衡,提高处理效率。
在一种可能的实现方式中,第一装置根据第一数量和第二数量之和、第三数量和第四数量之和以及第一处理器和第二处理器的处理速度确定第一能力和第二能力。
通过上述实现方式,第一装置能够更加准确的确定各处理器的新请求处理能力。
在一种可能的实现方式中,第一装置存储第一表格,其中,第一表格包括第五数组与第六数组,第五数组对应于第一处理器,第六数据对应于第二处理器;第五数组包括第五记录,第五记录用于保存一条或多条信息,该信息的数量为第五数量;第六数组包括第六记录,第六记录用于保存一条或多条信息,该信息的数量为第六数量;其中,第一装置为第一请求生成第一信息后,将第一信息保存 在第五记录中;第一装置从第二装置接收第二消息后,根据第二消息中的第一信息,在第五记录中查询;若在第五记录中查询到第一信息,从第五记录中移除第一信息。
通过上述实现方式,第一装置能够获得第二装置的各处理器处理请求的情况。
在一种可能的实现方式中,若第一装置检测到在第四时长内没有从第一处理器接收到处理结果,第一装置生成第三测试请求以及与第三测试请求对应的信息,该第三测试请求对应的信息与第一处理器相对应;第一装置检测在第五时长内是否从第一处理器接收到处理结果,若收到处理结果,将第五记录清空。
通过上述实现方式,第一装置能够获得第二装置各处理器状态的变化并适应性的调整维护的数值,能够提高处理效率。
在一种可能的实现方式中,第一装置根据第一表格,确定处理第一请求的处理器,具体包括:根据第五数量以及第六数量确定第三能力和第四能力,第三能力表示第一处理器处理新请求的能力,第四能力表示第二处理器处理新请求的能力;根据第三能力和第四能力确定第三分配概率和第四分配概率,确定第三概率空间和第四概率空间;第三分配概率和第三概率空间对应于第一处理器,第四分配概率和第四概率空间对应于第二处理器;取随机数,根据随机数确定分配的处理器。
通过上述实现方式,第一装置能够根据各处理器处理请求的情况来分配处理器,能够实现负载均衡,提高处理效率。
在一种可能的实现方式中,第一装置根据第五数量、第六数量以及第一处理器和第二处理器的处理速度确定第三能力和第四能力。
通过上述实现方式,第一装置能够更加准确的确定各处理器的新请求处理能力。
可以理解的是,上述第一表格可以为数据索引表;上述第一数据、第三数据可以为新数据单元;上述第二数据、第四数据可以为旧数据单元;上述第一信息可以为索引;上述第一记录、第二记录、第三记录、第四记录、第五记录、第六记录可以为索引记录;上述第一数量、第二数量、第三数量、第四数量、第五数量、第六数量可以为索引数量;上述第一能力、第二能力、第三能力、第四能力可以为新请求处理能力。
In a third aspect, an embodiment of this application provides a computer-readable medium configured to store one or more programs, where the one or more programs are configured to be executed by one or more processors, and the one or more programs include instructions for performing any possible implementation of the second aspect.
In a fourth aspect, an embodiment of this application provides a service processing system, including: a first apparatus; and a second apparatus, where the second apparatus includes a first processor and a second processor. The first apparatus sends a first message to the second apparatus, where the first message includes a first request and first information, and the first information corresponds to the first processor; the second apparatus sends the first request to the first processor for processing according to the first information; the second apparatus sends a second message to the first apparatus, where the second message includes a first processing result and the first information, and the first processing result is the result of the first processor processing the first request; the first information is generated after the first apparatus receives the first request and determines the processor that will process the first request; the first information includes a first processor serial number, a timestamp, and a thread number, where the first processor serial number is allocated by the first apparatus for the first request, and the timestamp and the thread number correspond to the first request. The first apparatus may be a server, and the second apparatus may be a multi-core chip.
Through the above implementation, the second apparatus can allocate processors according to the first information sent by the first apparatus.
在一种可能的实现方式中,第一装置存储第一表格,第一表格与第二装置相对应,其中,第一表格包括第一数组与第二数组,第一数组对应于第一处理器,第二数组对应于第二处理器;第一数组包括第一数据和第二数据,第一数据包括第一记录,第一记录用于保存一条或多条信息,该信息的数量为第一数量;第二数据包括第二记录,第二记录用于保存一条或多条信息,该信息的数量为第二数量;第二数组包括第三数据和第四数据,第三数据包括第三记录,第三记录用于保存一条或多条信息,该信息的数量为第三数量;第四数据包括第四记录,第四记录用于保存一条或多条信息,该信息的数量为第四数量;其中,第一装置为第一请求生成第一信息后,将第一信息保存在第一记录中;第一数据和第二数据每隔一定时长相互交换数据;第一装置从第二装置接收第二消息后,根据第二消息中的第一信息,在第一记录和/或第二记录中查询;若在第一记录中查询到第一信息,从第一记录中移除第一信息;若在第二记录中查询到第一信息,从第二记录中移除第一信息。
通过上述实现方式,第一装置能够获得第二装置的各处理器处理请求的情况。
在一种可能的实现方式中,若第一装置检测到在第一时长内没有从第一处理器接收到处理结果,第一装置生成第一测试请求以及与第一测试请求对应的信息,第一测试请求对应的信息与第一处理器相对应;第一装置检测在第三时长内是否从第一处理器接收到处理结果,若收到处理结果,在下次第一数据和第二数据交换数据之前,将第二数据的第二记录清空;若没收到处理结果,在下次第一数据和第二数据数据交换之前,将第二数据的第二记录中的一条或多条信息保存到第一数据的第一记录中,将第二记录清空。
通过上述实现方式,第一装置能够获得第二装置各处理器状态的变化并适应性的调整维护的数值,能够提高处理效率。
在一种可能的实现方式中,第一装置在生成第一测试请求之前,还包括,第一装置检测第一处理器在第二时长内是否收到请求,若服务器确定第一处理器在第二时长内没有收到请求,生成第一测试请求。
通过上述实现方式,第一装置能够在确定异常处理器没有收到请求后进行流量补偿。
在一种可能的实现方式中,若第一装置确定在第三时长内没有从第一处理器 接收到处理结果,在第一数据和第二数据交换数据后,第一装置生成第二测试请求。
通过上述实现方式,第一装置能够在确定处理器处于异常状态之后可以继续检测处理器的状态变化。
在一种可能的实现方式中,第一装置根据第一表格,确定处理第一请求的处理器,具体包括:根据第一数量和第二数量之和以及第三数量和第四数量之和确定第一能力和第二能力,第一能力表示第一处理器处理新请求的能力,第二能力表示第二处理器处理新请求的能力;根据第一能力和第二能力确定第一分配概率和第二分配概率,确定第一概率空间和第二概率空间;第一分配概率和第一概率空间对应于第一处理器,第二分配概率和第二概率空间对应于第二处理器;取随机数,根据随机数确定分配的处理器。
通过上述实现方式,第一装置能够根据各处理器处理请求的情况来分配处理器,能够实现负载均衡,提高处理效率。
在一种可能的实现方式中,第一装置根据第一数量和第二数量之和、第三数量和第四数量之和以及第一处理器和第二处理器的处理速度确定第一能力和第二能力。
通过上述实现方式,第一装置能够更加准确的确定各处理器的新请求处理能力。
在一种可能的实现方式中,第一装置存储第一表格,其中,第一表格包括第五数组与第六数组,第五数组对应于第一处理器,第六数据对应于第二处理器;第五数组包括第五记录,第五记录用于保存一条或多条信息,该信息的数量为第五数量;第六数组包括第六记录,第六记录用于保存一条或多条信息,该信息的数量为第六数量;其中,第一装置为第一请求生成第一信息后,将第一信息保存在第五记录中;第一装置从第二装置接收第二消息后,根据第二消息中的第一信息,在第五记录中查询;若在第五记录中查询到第一信息,从第五记录中移除第一信息。
通过上述实现方式,第一装置能够获得第二装置的各处理器处理请求的情况。
在一种可能的实现方式中,若第一装置检测到在第二时长内没有从第一处理器接收到处理结果,第一装置生成第三测试请求以及与第三测试请求对应的信息,该第三测试请求对应的信息与第一处理器相对应;第一装置检测在第三时长内是否从第一处理器接收到处理结果,若收到处理结果,将第五记录清空。
通过上述实现方式,第一装置能够获得第二装置各处理器状态的变化并适应性的调整维护的数值,能够提高处理效率。
在一种可能的实现方式中,第一装置根据第一表格,确定处理第一请求的处理器,具体包括:根据第五数量以及第六数量确定第三能力和第四能力,第三能力表示第一处理器处理新请求的能力,第四能力表示第二处理器处理新请求的能力;根据第三能力和第四能力确定第三分配概率和第四分配概率,确定第三概率空间和第四概率空间;第三分配概率和第三概率空间对应于第一处理器,第四分 配概率和第四概率空间对应于第二处理器;取随机数,根据随机数确定分配的处理器。
通过上述实现方式,第一装置能够根据各处理器处理请求的情况来分配处理器,能够实现负载均衡,提高处理效率。
在一种可能的实现方式中,第一装置根据第五数量、第六数量以及第一处理器和第二处理器的处理速度确定第三能力和第四能力。
通过上述实现方式,第一装置能够更加准确的确定各处理器的新请求处理能力。
应当理解的是,说明书中对技术特征、技术方案、优点或类似语言的描述并不是暗示在任意的单个实施例中可以实现所有的特点和优点。相反,可以理解的是对于特征或优点的描述意味着在至少一个实施例中包括特定的技术特征、技术方案或优点。因此,本说明书中对于技术特征、技术方案或优点的描述并不一定是指相同的实施例。进而,还可以任何适当的方式组合以下各个实施例中所描述的技术特征、技术方案和优点。本领域技术人员将会理解,无需特定实施例的一个或多个特定的技术特征、技术方案或优点即可实现实施例。在其他实施例中,还可在没有体现所有实施例的特定实施例中识别出额外的技术特征和优点。
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the following describes the accompanying drawings used in the embodiments of this application.
FIG. 1A and FIG. 1B are schematic architectural diagrams of a service processing system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a server calling a multi-core chip according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another server calling a multi-core chip according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of yet another server calling a multi-core chip according to an embodiment of the present invention;
FIG. 6A-6E are schematic structural diagrams of multi-core chips according to embodiments of the present invention;
FIG. 7A-7B are schematic diagrams of processing flows of a multi-core chip according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a request processing flow according to an embodiment of the present invention;
FIG. 9 is a schematic flowchart of a chip processor selection method according to an embodiment of the present invention;
FIG. 10 is a schematic flowchart of a load maintenance method according to an embodiment of the present invention;
FIG. 11 is a schematic flowchart of another load maintenance method according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of another processing flow of a multi-core chip according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of yet another processing flow of a multi-core chip according to an embodiment of the present invention.
具体实施方式
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明实施方式作进一步地详细描述。
如本申请所使用的,术语“组件”、“模块”、“系统”等等旨在指代计算机相关实体,该计算机相关实体可以是硬件、固件、硬件和软件的结合、软件或者运行中的软件。例如,组件可以是,但不限于是:在处理器上运行的处理、处理器、对象、可执行文件、执行中的线程、程序和/或计算机。作为示例,在计算设备上运行的应用和该计算设备都可以是组件。一个或多个组件可以存在于执行中的过程和/或线程中,并且组件可以位于一个计算机中以及/或者分布在两个或更多个计算机之间。此外,这些组件能够从在其上具有各种数据结构的各种计算机可读介质中执行。这些组件可以通过诸如根据具有一个或多个数据分组(例如,来自一个组件的数据,该组件与本地系统、分布式系统中的另一个组件进行交互和/或以信号的方式通过诸如互联网之类的网络与其它系统进行交互)的信号,以本地和/或远程过程的方式进行通信。
图1A和图1B示例性示出了本申请实施例涉及的一种业务处理系统的架构,该业务处理系统用于处理电子设备发起的业务。该业务包括但不限于图像识别、图像分类等图像处理业务以及语音识别、语音合成等语音处理业务。该业务处理系统主要包括一个或多个电子设备100和服务器101。一个或多个电子设备100与服务器101通信,发起有关各种图像处理业务和/或语音处理业务的请求,该请求可以是HTTP消息。所述服务器可以通过局域网或广域网与电子设备通信。服务器101可以通过软硬件接口调用多核芯片102完成电子设备100发起的请求。其中,硬件接口可以包括PCI、PCIe或USB等类型的接口,软件接口可以包括应用程序接口(API),例如使用软件封装的API接口SRC(Source)模块和DST(Destination)模块,其中,SRC模块是用于从服务器101向芯片102发送数据的应用程序接口模块,DST模块用于服务器101从芯片102接收数据的应用程序接口模块。多核芯片102完成了请求的处理之后,将处理结果发送至服务器101,服务器101将处理结果发送至一个或多个电子设备100。
举例说明,电子设备和服务器执行图像处理和/或语音业务的过程可以包括以下方式:
用户1打开手机上安装的应用程序如“图库”,该应用程序能够进行图像分类。手机向应用服务器上传一张或多张图片,应用服务器在完成图像分类之后向手机返回图像分类结果。或者
用户2打开浏览器,访问用于图像识别的网页,终端向网络服务器上传一张或多张图片。网络服务器在完成图像识别之后向终端返回图像识别结果。或者
用户3通过自然语言向智能家居设备(如智能音箱)发出请求,该请求可以为播放歌曲、查询天气、或者定制提醒等。智能家居设备收集用户的语音,向服务器发送收集的语音消息,服务器进行分析之后向智能家居设备返回相应请求的内容,智能音箱可以播放用户请求的歌曲、播报天气情况、或设置用户请求的提醒。
电子终端100向服务器101发送请求,并在该请求中包括图像或语音数据。在其他一些实施例中,该请求中还可以包括请求类型。例如,对于用户发起的图 像分类处理业务,请求中可以包括表示图像分类请求类型的信息。服务器101接收到请求后,调用多核芯片102对该请求进行处理。多核芯片102完成请求的处理后将处理结果发送给服务器101。服务器101将处理结果返回给电子终端100。具体来说,服务器101可以以HTTP消息的形式将处理结果返回给电子终端100。
多核芯片向服务器返回处理结果可以包括以下几种情况:
若多核芯片完成处理,该芯片向服务器返回相应的处理结果,服务器将处理结果返回给电子终端。例如,对于一个图像分类的请求,多核芯片完成了处理,则向服务器返回处理结果,该处理结果可以为图像的具体分类,如“风景”、“运动”等。对于一个图像识别的请求,其对应的处理结果可以为图像识别的具体结果,如人脸识别是否成功,或者是人的姓名、动植物的名称等。
若多核芯片处理失败,该芯片向服务器返回处理失败消息,服务器将处理失败消息返回给电子终端。
若多核芯片无法处理请求,也无法向服务器返回消息,服务器在等待超时后向电子终端返回处理失败消息。
其中,电子设备100可以是便携式电子设备,如手机、平板电脑、膝上型计算机(Laptop)、可穿戴电子设备(如智能手表)等。在其他一些实施例中,上述电子设备100也可以是台式计算机或车载设备等设备。
图2示出了电子设备100的结构示意图。
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本发明实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
The multi-core chip used in the embodiments of the present invention contains multiple chip processors. As shown in FIG. 1, the multi-core chip 102 contains four chip processors 1021-1024. The multiple chip processors can run in parallel, enabling the chip to process multiple tasks at the same time. In some embodiments, the above multi-core chip may be a neural-network chip used for image/video processing or speech processing, also called an AI chip or AI accelerator card; the multi-core chip may also be used in other systems. This application does not impose any limitation on this.
The multi-core chip used in the embodiments of the present invention is mainly deployed in a server. The chip may be plugged into the server (for example, through a PCI or PCIe slot), or may be connected to the server as an external device (for example, through PCI, PCIe, or USB).
An embodiment of the present invention provides a method in which a server calls a multi-core chip using a single thread.
In this embodiment, as shown in FIG. 3, the Host side represents the server side and the Device side represents the multi-core chip side. At any given time, the server calls the multi-core chip to process one request. After the multi-core chip finishes processing the request and returns the result to the server, the server calls the multi-core chip to process the next request. In some other embodiments, one request may include multiple sub-requests. In this case, the server may assign one of the chip processors contained in the multi-core chip to each sub-request; after the server sends the request to the multi-core chip, the chip processors in the chip can process their corresponding sub-requests simultaneously. So that the chip can complete the request in one pass, the number of sub-requests contained in the request may be less than or equal to the number of chip processors contained in the chip. The chip may return the processing result to the server once all chip processors have finished processing their corresponding sub-requests. In some other embodiments, the server may designate the chip processors that complete the request: when sending the request to the multi-core chip, the server carries the serial numbers of the designated chip processors in the request.
An embodiment of the present invention provides a method in which a server calls a multi-core chip using multiple threads.
In this embodiment, when the server receives a new request, the server creates a corresponding thread for the request. For each request, the server assigns one chip processor in the multi-core chip to process it. The number of threads created is at most the number of chip processors contained in the chip, and each thread corresponds to a chip processor in the chip. For example, the multi-core chip shown in FIG. 4 includes four chip processors, so at most four threads are created in the server. Therefore, in this scheduling mode, the server can process four requests at the same time. If four requests are already being processed simultaneously, the server can accept a new request only after one or more of those four requests finish. When sending a request to the chip, the server may carry the serial number of the chip processor in the request; likewise, when returning a processing result to the server, the chip may carry the serial number of the chip processor in the processing result, so that when the server receives the processing result it can return it to the corresponding thread for handling according to the chip processor's serial number.
In this scheduling mode, because multi-threaded scheduling is used, the server can support concurrent operations to some extent, makes fairly full use of the processing capabilities of the multiple chip processors in the multi-core chip, and improves the processing efficiency of the multi-core chip.
An embodiment of the present invention provides a multi-core chip with multiple chip processors. As shown in FIG. 5, in this multi-core chip, a corresponding controller is configured for each chip processor, and the controller is used to cache one or more requests for the corresponding chip processor. Specifically, the controller may internally include a cache unit in which one or more requests for the corresponding chip processor are cached. Each controller is connected to its corresponding chip processor so that requests can be sent to it.
In the system shown in FIG. 4, a request is sent from the SRC module to the multi-core chip and processed in a chip processor of the multi-core chip; only after the DST module receives the processing result sent by the multi-core chip is the next request sent from the SRC module to the multi-core chip. During this period, the chip processor handling the request is idle both while the request travels from the SRC module to the multi-core chip and while the processing result travels from the chip processor back to the DST module. In the multi-core chip system shown in FIG. 5, because the controllers have caching capability, the server does not need to wait for one or more chip processors in the chip to finish processing before sending the next request. After the chip receives a request, the request is cached in the controller corresponding to the chip processor, and the chip processor processes it. After finishing one request, a chip processor can immediately obtain the next request from its controller. Therefore, the multi-core chip provided by this embodiment can further improve the processing efficiency of the multi-core chip. In addition, when accepting new requests, the server is no longer limited by the number of chip processors contained in the multi-core chip, which enhances concurrency.
In some other embodiments, the controller is further configured, at each processing pass, to obtain multiple requests from the cached requests, merge the multiple requests into one request, and send the merged request to the corresponding chip processor for processing.
Specifically, the input of a request may be a high-dimensional array; for example, the array may be a 1*1024*256 array. When merging, the controller merges two 1*1024*256 requests into one 2*1024*256 request. In some other embodiments, each time the controller merges requests, it may merge three or more requests into one request.
Specifically, the controller may determine whether to merge multiple requests according to the data volume of the requests and/or model matching. For example, if one request has a large data volume, it may not be merged; correspondingly, if two requests both have small data volumes, the two requests may be merged. As another example, if three requests use the same model and the same processing procedure, the three requests may be merged. In some other embodiments, the controller may merge multiple requests in the order of the requests.
Through this embodiment, the controller can merge multiple requests into one request that is completed in a single processing pass, making full use of each chip processor's processing capability and further improving the processing efficiency of the multi-core chip.
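The merging described above amounts to concatenating inputs along the leading (batch) dimension: two 1*1024*256 inputs become one 2*1024*256 input. A minimal sketch follows; the `Request`, `mergeRequests`, and `shouldMerge` names are illustrative, and the size-based policy stands in for the data-volume check mentioned in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One request payload: a flat buffer holding a 1 x 1024 x 256 input tensor.
struct Request {
    std::vector<float> data;  // size = 1024 * 256 for a single-item batch
};

// Merge several single-item requests into one batched request by
// concatenating along the leading dimension: k inputs of shape
// 1 x 1024 x 256 become one input of shape k x 1024 x 256.
Request mergeRequests(const std::vector<Request>& requests) {
    Request merged;
    for (const Request& r : requests) {
        merged.data.insert(merged.data.end(), r.data.begin(), r.data.end());
    }
    return merged;
}

// A simple merge policy: batch two requests only if their combined data
// volume stays small, as the text suggests for the data-volume criterion.
bool shouldMerge(const Request& a, const Request& b, std::size_t maxElems) {
    return a.data.size() + b.data.size() <= maxElems;
}
```

A model-matching criterion would additionally require the two requests to target the same model, so the chip processor runs a single forward pass over the batch.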
图6A-6E示例性示出了本实施例中应用的多核芯片的结构。具体如下:
图6A示例性示出了本实施例中应用的多核芯片的一种结构。如图6A所示,多核芯片102包括4个芯片处理器1021-1024以及对应的4个控制器1031-1034,每个芯片处理器与其对应的控制器1031-1034相连接,如芯片处理器1021与其对应的控制器1031相连接。此外,控制器1031-1034分别与芯片102的输入端104和输出端105连接,以便从服务器101中的处理器107接收请求以及向处理器107发送处理结果。
基于如图6A所示的多核芯片102,请求在该芯片中的处理流程如图7A所示:
S201:控制器1031-1034从处理器107接收请求。
具体来说,处理器107确定了每个请求对应的芯片处理器,根据请求对应的芯片处理器的序号信息将请求发送至对应的控制器1031-1034,控制器1031-1034从处理器107接收的是分配给与该控制器1031-1034相对应的芯片处理器的请求。
S202:控制器1031-1034缓存一个或多个请求。
S203:控制器1031-1034从缓存的请求中获取请求并将请求发送给芯片处理器。
具体来说,每次处理时,控制器1031-1034可以从缓存的请求中获取一个请求发送给芯片处理器进行处理,也可以从缓存的请求中获取多个请求,将该多个 请求合并为一个请求,将该合并的请求发送至对应的芯片处理器进行处理。
S204:控制器1031-1034从芯片处理器接收处理结果,向处理器107发送该处理结果,返回至步骤S203。
若芯片处理器执行的是合并的请求,则控制器1031-1034从芯片处理器接收的处理结果中包含合并的多个请求对应的处理结果。举例说明,请求A为对图片A进行图像识别,请求B为对图片B进行图像识别,控制器将该请求A和请求B进行合并后,合并之后的请求为对图片A和图片B分别进行图像识别。芯片处理器对该合并的请求进行处理之后的处理结果为图片A的图像识别结果和图像B的图像识别结果。
在其他一些实施例中,控制器1031-1034向处理器107发送处理结果时,一并发送芯片处理器的序号信息,或者将处理结果和芯片处理器的序号信息进行封装之后发送给处理器107,以便处理器107能够根据该序号信息查找相应的线程。
图6B示例性示出了本实施例中应用的多核芯片的另一种结构。与图6A不同的是,图6B中,芯片的输出端105连接的是芯片处理器1021-1024。相对应的,基于如图6B所示的多核芯片,请求在该芯片中的处理流程中,芯片处理器在完成了请求的处理之后,直接向处理器107返回处理结果。在其他一些实施例中,芯片处理器完成了请求的处理之后,可以向其对应的控制器1031-1034发送处理结束消息,该处理结束消息用于指示其处理结束。控制器1031-1034接收到处理结束消息后,向芯片处理器发送下一个请求。
图6C示例性示出了本实施例中应用的多核芯片的另一种结构。如图6C所示,多核芯片中包含控制器106,其中,控制器106用于向各芯片处理器分配与其对应的请求。控制器106与各芯片处理器的控制器1031-1034相连接,控制器1031-1034分别与对应的芯片处理器1021-1024相连接。此外,控制器106分别与芯片的输入端104和输出端105连接,以便从处理器107接收请求以及向处理器107发送处理结果。
基于如图6C所示的多核芯片,请求在该芯片中的处理流程如图7B所示:
S301:控制器106从处理器107接收请求。其中,请求中携带芯片处理器的序号信息。在一些实施例中,控制器106从处理器107接收的可以是将请求和芯片处理器的序号信息进行封装之后的信息。
S302:控制器106根据芯片处理器的序号信息将请求发送至对应芯片处理器的控制器1031-1034。在一些实施例中,控制器106对接收的封装信息进行解封装后获得芯片处理器的序号信息。
S303:控制器1031-1034缓存一个或多个请求。
S304:控制器1031-1034缓存中的请求中获取请求并将请求发送给芯片处理器。该步骤具体可参考步骤S203,这里不再赘述。
S305:控制器1031-1034从芯片处理器接收处理结果,向控制器106发送该处理结果。在一些实施例中,控制器1031-1034从芯片处理器接收的处理结果中携带芯片处理器的序号信息,转发给控制器106。在其他一些实施例中,控制器 1031-1034从芯片处理器接收的处理结果,控制器1031-1034将处理结果以及芯片处理器的序号信息发送给控制器106。
S306:控制器106将处理结果发送给处理器107。其中,处理结果中可以携带芯片处理器的序号信息,以便服务器能够根据该序号信息查找相应的线程。在其他一些实施例中,控制器106向处理器107发送的是封装之后的处理结果和芯片处理器的序号信息。该封装可以在芯片处理器、控制器1031-1034或控制器106中完成。
图6D和图6E示例性示出了本实施例应用的多核芯片的另一种结构。与图6C不同的是,图6D中,控制器1031-1034连接该芯片102的输出端106。图6E中,芯片处理器1021-1024连接该芯片102的输出端105。相应的,在完成请求的处理后,分别由控制器1031-1034或芯片处理器向服务器返回处理结果。
在其他一些实施例中,上述控制器1031-1034可以包含在对应的芯片处理器中。
在其他一些实施例中,图6A-6E中的输入端104和输出端105可以合并为一个输入/输入端口。
An embodiment of the present invention provides a method for maintaining the load of a multi-core chip. The load may include the number of requests that have been sent to a chip processor for processing but for which no result has been returned. In this embodiment, the server maintains a data index table used to track the load of each chip processor in the multi-core chip. In addition, the server allocates a chip processor to each request according to the load of each chip processor recorded in the data index table. The structure of the data index table is shown in Table 1. The data index table includes multiple arrays, and the number of arrays corresponds to the number of chip processors included in the multi-core chip. For example, the multi-core chip 102 shown in FIG. 1 contains four chip processors 1021-1024; correspondingly, the data index table may include four arrays corresponding to the chip processors 1021-1024 respectively. In some embodiments, the number of chip processors contained in the multi-core chip may be preset in the server, or may be configuration information that the server obtains from the multi-core chip. The i-th array corresponds to chip processor 102i in the chip 102 and is used to track the load of chip processor 102i. Specifically, the i-th array may be named after the processor serial number Core i of chip processor 102i, and the array may include an index record Index Record. The index record Index Record stores the indexes Index corresponding to the requests sent to chip processor 102i, where the server generates a corresponding index for each request. For example, in the table below, in the array corresponding to Core 1, the index record Index Record stores N indexes. In some embodiments, the index count can be regarded as the load value of the chip processor. The initial state of the index record Index Record may be empty.
In some other embodiments, the i-th array may also include an index count Num_Index. The index count Num_Index is the number of indexes stored in the index record; its initial value may be 0.
It can be understood that Table 1 only shows an example structure of the data index table; the present invention does not impose any limitation on the specific names of the fields.
Table 1

    Core 1: Index Record = {Index 1, Index 2, ..., Index N}; Num_Index = N
    Core 2: Index Record = {...};                            Num_Index = ...
    ...
    Core n: Index Record = {...};                            Num_Index = ...

After receiving a new request, the server 101 determines the chip processor that will process the request. The index is generated after the server 101 determines the chip processor that will process the request; that is, each request corresponds to one index. An index may include parameters such as the processor serial number Core ID, the thread number Thread ID, and the timestamp Timestamp. The processor serial number Core ID is the ID of the chip processor, determined by the server 101, that will process the request; the thread number Thread ID is allocated by the operating system to the thread corresponding to the request, and can be obtained by the std::this_thread::get_id() function when the index is generated. After the server receives a request, the operating system creates a thread for each request; the thread number is the unique identifier by which the operating system recognizes a thread. Recording the thread number Thread ID corresponding to the request in the server enables the server to track the processing of the request. In addition, because a thread number may be reused by the system at different times, this embodiment also records the timestamp Timestamp when generating the index, which can be obtained by the std::time_t getTimeStamp() function. Through the thread number Thread ID and the timestamp Timestamp, the server can uniquely identify the request, while the processor serial number Core ID in the index establishes the mapping between the request and the corresponding chip processor.
Throughout the processing flow, from the request being sent to the multi-core chip for processing to the multi-core chip returning the processing result to the server, the index is carried in the processing messages. The multi-core chip can allocate the request to the corresponding chip processor based on the index, and the server can match processing results based on the index and effectively monitor the processing of requests.
Specifically, after the server 101 generates the index, it encapsulates the request and the corresponding index into a processing request message and sends the encapsulated processing request message to the multi-core chip 102. After decapsulating the processing request message, the chip 102 can obtain the request and the corresponding index, obtain from the index the processor serial number Core ID of the chip processor corresponding to the request, and send the request to the chip processor with that Core ID, which then processes the service corresponding to the request. In some other embodiments, the server 101 sends the encapsulated processing request message, according to the processor serial number Core ID in the index, directly to the chip processor with that Core ID in the chip 102, and that chip processor processes the request.
After the chip 102 finishes processing the request, it encapsulates the processing result and the corresponding index into a processing result message and sends it to the server 101. After receiving the processing result message, the server 101 decapsulates it to obtain the processing result and the corresponding index. Through the information in the index, the server 101 can match the corresponding request and send the processing result to the corresponding electronic terminal 100.
In addition, in this embodiment, throughout the processing flow from the request being sent to the multi-core chip for processing to the multi-core chip returning the processing result to the server, the index is carried in the processing messages, which also enables the server to perform load management of the chip according to the index.
Specifically, after the server 101 generates the index, it saves the index, according to the processor serial number Core ID in the index, into the index record Index Record of the chip processor with that Core ID in the data index table, and the corresponding index count Num_Index is increased. When the server 101 receives the processing result and the corresponding index, the server finds, according to the processor serial number Core ID in the index, the index matching the thread number Thread ID and the timestamp Timestamp in the index record Index Record of the chip processor with that Core ID, removes that index from the index record Index Record, and the corresponding index count Num_Index is decreased.
In some other embodiments, after the server 101 generates the index, it saves the index, according to the processor serial number Core ID, in the index record Index Record of the chip processor with that Core ID in the data index table; when the server 101 receives the processing result and the corresponding index, the server finds, according to the processor serial number Core ID in the index, the index matching the thread number Thread ID and the timestamp Timestamp in the index record Index Record of the chip processor with that Core ID, and removes that index from the index record Index Record.
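The index structure and the add/remove bookkeeping on the data index table described above can be sketched as follows. This is an illustrative reduction, not the patented implementation: the field types are assumptions, and `DataIndexTable` collapses the per-core array to a vector of records whose size plays the role of Num_Index.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// One index: the triple that identifies a request and its assigned core.
struct Index {
    int coreId;              // processor serial number Core ID
    unsigned long threadId;  // thread number Thread ID
    std::time_t timestamp;   // Timestamp, disambiguates reused thread IDs
};

struct DataIndexTable {
    // indexRecord[i] holds the indexes of requests sent to processor i.
    std::vector<std::vector<Index>> indexRecord;

    explicit DataIndexTable(std::size_t cores) : indexRecord(cores) {}

    // On request dispatch: save the index under its core's record.
    void add(const Index& idx) { indexRecord[idx.coreId].push_back(idx); }

    // On a processing result: remove the matching index (Thread ID and
    // Timestamp uniquely identify the request within a core's record).
    bool remove(const Index& idx) {
        auto& rec = indexRecord[idx.coreId];
        for (std::size_t i = 0; i < rec.size(); ++i) {
            if (rec[i].threadId == idx.threadId &&
                rec[i].timestamp == idx.timestamp) {
                rec.erase(rec.begin() + static_cast<long>(i));
                return true;
            }
        }
        return false;
    }

    // Num_Index for processor i, used as its load value Load[i].
    std::size_t load(std::size_t core) const {
        return indexRecord[core].size();
    }
};
```

In the real system the Thread ID would come from `std::this_thread::get_id()` and the timestamp from the server's clock at index-generation time.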
The following describes, with reference to the request processing flow shown in FIG. 8, the method by which the server calls the multi-core chip and the method by which the server maintains the load of the multi-core chip.
S401: The electronic terminal 100 sends a request to the server 101.
S402: The server 101 receives the request and selects the chip processor to execute the request according to the load value Load[] of each chip processor. The index count Num_Index of the chip processor in the data index table may be used as the chip processor's load value Load[], or the number of indexes stored in the index record corresponding to the chip processor in the data index table may be counted and used as the chip processor's load value Load[]. A higher load value indicates that the chip processor is currently executing more requests and has a weaker capability to process new requests; conversely, a lower load value indicates a stronger capability to process new requests.
In this embodiment, the allocation probability is determined according to the current load value of each chip processor; the higher a chip processor's load value, the lower its allocation probability.
Specifically, as shown in FIG. 9, the server selects the chip processor to execute the processing request according to the load values of the chip processors through the following steps:
S4021: The server 101 determines the new-request processing capability of each chip processor according to its load value in the data index table. The higher the load value, the weaker the corresponding new-request processing capability. In some embodiments, the relationship between the new-request processing capability AoE[i] of chip processor 102i and its load value Load[i] may be:

    AoE[i] = 1 / Load[i]

where AoE[i] represents the new-request processing capability of chip processor 102i, and Load[i] represents the load value of chip processor 102i. In some embodiments, the number of indexes in the index record of chip processor 102i in the data index table can be regarded as its load value Load[i].
For example, if the multi-core chip includes four chip processors and the current load values are Load[4] = {1, 2, 3, 2}, then the new-request processing capabilities are AoE[4] = {1, 1/2, 1/3, 1/2}. In some embodiments, if the load value of some chip processor is 0, the server may compensate the load value of every chip processor in the multi-core chip and obtain each chip processor's new-request processing capability from the compensated load values. In some embodiments, the compensation is to add one to the load value of each chip processor. In some other embodiments, to avoid a load value of 0, the load value of each chip processor is compensated before determining the new-request processing capabilities.
Considering the different processing speeds of the chip processors, in some other embodiments step S4021 may also be: the server 101 determines each chip processor's new-request processing capability according to its load value and processing speed. The higher the load value and the lower the processing speed, the weaker the corresponding new-request processing capability; conversely, the lower the load value and the faster the processing speed, the stronger the corresponding new-request processing capability. In some embodiments, the relationship between the new-request processing capability AoE[i] of chip processor 102i, its load value Load[i], and its processing speed SoE[i] may be:

    AoE[i] = SoE[i] / Load[i]

where AoE[i] represents the new-request processing capability of chip processor 102i, Load[i] represents the load value of chip processor 102i, and SoE[i] represents the processing speed of chip processor 102i.
The processing speed of each chip processor in the multi-core chip is attribute information of the processor; it may be stored in the server in advance, or it may be configuration information obtained from the multi-core chip.
For example, if the multi-core chip includes four chip processors, the current load values are Load[4] = {1, 2, 3, 2}, and the processing speeds are SoE[4] = {1, 2, 1, 1}, then the new-request processing capabilities are AoE[4] = {1, 1, 1/3, 1/2}.
S4022: Calculate each chip processor's allocation probability p[i] from its new-request processing capability AoE[i], and determine each chip processor's probability interval. The allocation probability represents the probability that a new processing request is allocated to that chip processor; the stronger the new-request processing capability, the larger the chip processor's allocation probability. In some embodiments, the allocation probability p[i] of a chip processor may be:

    p[i] = AoE[i] / (AoE[1] + AoE[2] + ... + AoE[n])

where AoE[i] is the new-request processing capability of chip processor 102i, and the denominator is the sum of the new-request processing capabilities of all chip processors, with n being the number of chip processors contained in the multi-core chip. The allocation probabilities of all chip processors sum to 1.
For example, with the processing capabilities AoE[4] = {1, 1/2, 1/3, 1/2} obtained in S4021, the allocation probabilities of the chip processors are P[4] = {3/7, 3/14, 1/7, 3/14}, so the probability intervals corresponding to the chip processors are (0, 3/7], (3/7, 9/14], (9/14, 11/14], and (11/14, 1].
S4023: Generate a random number, and determine the allocated chip processor from the random number. In some embodiments, this step may specifically be: generate a random number in the interval (0, 1], and the chip processor whose probability interval contains the random number is the chip processor that executes the request. For example, based on the probability intervals of the four chip processors obtained in step S4022, if the generated random number is 0.5, then since 3/7 < 0.5 < 9/14, the random number falls in the probability interval of the second chip processor, so the new request is allocated to the second chip processor for processing.
Through the above selection method, the chip processor can be selected according to the current load values; the higher a chip processor's load, the lower the probability that it is selected, which helps achieve load balancing among the chip processors contained in the multi-core chip and makes full use of hardware resources.
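Steps S4021-S4023 can be sketched as follows: capability is the reciprocal of the load, capabilities are normalized into allocation probabilities, and a random draw in (0, 1] picks the processor whose cumulative probability interval contains it. The function names are ours; load compensation and the speed-weighted variant are omitted for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// S4021 + S4022: AoE[i] = 1 / Load[i], then p[i] = AoE[i] / sum(AoE).
// Assumes every load value is nonzero (the text compensates by adding one
// to each load otherwise).
std::vector<double> allocationProbabilities(const std::vector<int>& load) {
    std::vector<double> aoe(load.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < load.size(); ++i) {
        aoe[i] = 1.0 / load[i];
        sum += aoe[i];
    }
    for (double& p : aoe) p /= sum;
    return aoe;
}

// S4023: map a random number r in (0, 1] to a processor via the
// cumulative probability intervals (0, p1], (p1, p1+p2], ...
std::size_t pickProcessor(const std::vector<double>& prob, double r) {
    double upper = 0.0;
    for (std::size_t i = 0; i < prob.size(); ++i) {
        upper += prob[i];
        if (r <= upper) return i;
    }
    return prob.size() - 1;  // guard against floating-point rounding
}
```

With the worked example Load[4] = {1, 2, 3, 2}, the probabilities come out to {3/7, 3/14, 1/7, 3/14}, and a draw of 0.5 lands in the second processor's interval, matching the text.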
S403:服务器101生成与请求相对应的索引。具体来说,服务器101根据为请求分配的处理器序号Core ID、获取的该请求的线程号Thread ID以及时间戳Timestamp生成索引。
S404:服务器101将索引保存到数据索引表中,并将索引与请求封装成为处理请求消息,发送给多核芯片102进行处理。
其中,服务器101将索引保存到数据索引表中,具体可以包括以下步骤:
服务器101将索引保存在数据索引表中对应处理器序号Core ID的芯片处理器的索引记录中。
在其他一些实施例中,该芯片处理器的索引数量相应增加。在一些情况下,所述增加具体可以为索引记录中每增加一条索引,相应索引数量加一。
服务器101将索引与请求封装成为处理请求消息,发送给多核芯片102进行处理,具体包括以下步骤:
服务器101中的处理器107根据索引中的处理器序号Core ID,将封装后的处理请求消息发送给多核芯片102中与该处理器序号Core ID对应的芯片处理器进行处理。
在其他一些实施例中,服务器101中的处理器107将封装后的处理请求消息发送给多核芯片102,由芯片102进行解封装获得处理请求和对应的索引,从索引中获得处理器序号Core ID,将请求和索引发送给与该处理器序号Core ID对应的芯片处理器进行处理。
In some other embodiments, before the server 101 encapsulates the request and its index and sends them to the multi-core chip 102, the server 101 creates a condition wait variable Condition corresponding to the index and saves it, in the data index table, under the entry corresponding to the index in the index record of the corresponding chip processor. The server then waits on the condition wait variable Condition.
A condition wait variable, also called a condition variable, is a variable with an attached condition. Its value is typically "True" or "False", and it changes when the condition goes from unsatisfied to satisfied. For example, while the condition is unsatisfied the condition wait variable's value is "False", and when the condition is satisfied the value becomes "True". Condition wait variables are commonly used to manage threads. For example, thread A waits on a condition wait variable for some condition and is suspended; once the condition is satisfied, the condition wait variable is notified and thread A is woken up.
After creating the condition wait variable Condition, the server 101 sets the condition to "a processing result has been received". After the server 101 sends the processing-request message to the multi-core chip 102, Condition is False, and the request's thread in the server is suspended, waiting for the processing result returned from the multi-core chip 102.
After the server 101 receives a processing-result message from the multi-core chip 102, it obtains the index from the message, looks up the corresponding condition wait variable in the data index table using the index, sets the value of Condition to True, wakes up the request's thread, and returns the processing result to the electronic terminal 100.
The server 101 may also set a time condition for the condition wait variable Condition. That is, if no processing result is received within a predetermined time, the wait times out, the value of Condition is set to True, the server 101 wakes up the request's thread, and the server 101 returns a processing-failure message to the electronic terminal 100.
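As a rough sketch of how a request thread can be suspended on a condition wait variable and woken either by an arriving result or by a timeout, using Python's threading.Condition (the class and method names here are illustrative, not taken from the embodiment):

```python
import threading

class PendingRequest:
    """One entry in the data index table: a result slot guarded by a condition variable."""
    def __init__(self):
        self.cond = threading.Condition()
        self.result = None
        self.done = False          # plays the role of the Condition value (False -> True)

    def wait_result(self, timeout):
        # Suspend the request thread until a result arrives or the wait times out.
        with self.cond:
            if not self.done:
                self.cond.wait(timeout)
            return self.result if self.done else None   # None signals a failed wait

    def deliver(self, result):
        # Called when the processing-result message for this index arrives.
        with self.cond:
            self.result = result
            self.done = True       # condition satisfied: False -> True
            self.cond.notify()     # wake the suspended request thread
```

A caller would invoke wait_result after dispatching the request; a None return corresponds to the timeout case, where the server returns a processing-failure message.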
S405: The server 101 receives a processing-result message from the multi-core chip 102. The processing-result message may be the encapsulated processing result and index. The index is the one the server 101 generated for the request corresponding to this processing result and sent to the multi-core chip 102; after finishing execution, the chip 102 encapsulates the processing result with the corresponding index and returns them to the server 101.
S406: Maintain the data index table according to the index in the processing-result message.
Specifically, when request processing finishes, the processing-result message the server 101 receives from the multi-core chip 102 contains the index. Using the index, the server looks up the matching index in the index record of the chip processor corresponding to the processor number Core ID in the data index table, and removes that index from the index record.
In some other embodiments, that chip processor's index count is decreased accordingly. In some embodiments, the decrease may specifically be: for each index removed from the index record Index Record, the corresponding index count Num_Index is decremented by one.
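A minimal sketch of the index-record bookkeeping of S404 and S406, assuming an index is simply a (Core ID, Thread ID, Timestamp) tuple and using the record length as the index count Num_Index (all names are illustrative):

```python
from collections import defaultdict

class DataIndexTable:
    """Per-processor index records; the record length doubles as the load value."""
    def __init__(self):
        self.records = defaultdict(dict)   # core_id -> {index: entry}

    def add(self, core_id, index, entry=None):
        # S404: a newly dispatched request's index joins the processor's record.
        self.records[core_id][index] = entry   # Num_Index grows by one

    def remove(self, core_id, index):
        # S406: the index returns in a processing-result message and is removed.
        return self.records[core_id].pop(index, None)  # Num_Index shrinks by one

    def load(self, core_id):
        return len(self.records[core_id])      # outstanding requests on this core
```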
S407: The server 101 sends the processing result to the electronic device 100.
Specifically, after receiving the processing-result message from the multi-core chip 102 in step S405, the server 101 decapsulates it to obtain the processing result and the corresponding index. Using the index, the server 101 looks up the matching index and its condition wait variable in the data index table. The server 101 then sets the condition wait variable from "False" to "True"; this change wakes up the previously suspended request thread, and the server 101 returns the processing result to the electronic device 100.
In some other embodiments, after receiving the processing-result message, the server 101 fills the processing result into the entry corresponding to the index in the data index table, and then returns the processing result to the electronic device 100.
If a chip processor fails or is damaged, cannot process requests, and cannot return a processing result to the server 101, the condition wait variable is set from "False" to "True" after the wait times out, the server 101 wakes up the request's thread, and the server 101 returns processing-failure information to the electronic device 100.
In some other embodiments, as shown in Figure 8, the server 101 may include a front-end interface module and a load maintenance module. The front-end interface module performs the following method steps:
S501: Receive a request from the electronic terminal 100.
S502: Select the chip processor to execute the request according to the load values Load[] of the chip processors. See step S402; details are not repeated here.
S503: Generate an index corresponding to the request. See step S403; details are not repeated here.
S504: Save the index into the data index table, encapsulate the index and the request into a processing-request message, and send it to the multi-core chip 102.
S505: Wait for a processing-result update message from the load maintenance module, or for the wait to time out.
Specifically, a condition wait variable may be used for the wait; see step S404, not repeated here. In some embodiments, after receiving the processing-result message and obtaining the processing result and index, the load maintenance module may send the processing result and index to the front-end interface module.
S506: Return the processing result to the electronic terminal 100.
Specifically, if the front-end interface module obtains a processing-result message from the load maintenance module, it returns the processing result in that message to the electronic terminal 100. If the request's condition wait variable times out, the front-end interface module returns a processing-failure message to the electronic terminal 100.
The load maintenance module maintains the data index table and performs the following method steps:
S601: Receive the index from the front-end interface module and save it in the corresponding index record in the data index table.
Specifically, the load maintenance module saves the index in the index record of the chip processor corresponding to the processor number Core ID in the data index table. In some other embodiments, that chip processor's index count is increased accordingly. In some cases, the increase may specifically be: for each index added to the index record, the corresponding index count is incremented by one.
S602: Receive the processing-result message from the multi-core chip 102, look up the index from the message in the data index table, and update the table. See step S406, not repeated here. In some other embodiments, the load maintenance module saves the processing result from the received processing-result message into the entry corresponding to the index, then performs step S603.
S603: Notify the front-end interface module that processing is complete.
Specifically, the load maintenance module may send the index and processing result from the received processing-result message to the front-end interface module, or send the index and processing result saved in the data index table, to indicate that the corresponding request has been processed.
Through the above method steps, the server 101 creates an index for each newly received request and carries the index through the entire processing flow. The index lets the server 101 indicate the assigned chip processor, lets the multi-core chip dispatch the request to the corresponding chip processor, and lets the server 101 use the index to maintain the load of each chip processor in the multi-core chip 102 for subsequent processor selection, improving the processing efficiency of the multi-core chip.
If one or more chip processors in the multi-core chip fail or are damaged, cannot process requests, and cannot return processing results to the server, then after a while the index record for the faulty chip processor in the server's data index table accumulates many indexes of requests with no returned results. A high index count means a high load value for that chip processor, and by steps S4021-S4023, a chip processor with a high load value gets a lower allocation probability. The selection method therefore distributes requests according to chip-processor load and avoids assigning large numbers of requests to a faulty chip processor.
An embodiment of the present invention provides a load maintenance method for the server 101 to maintain the data index table. Its main steps are shown in Figure 10:
S701: Within a first duration T1, the server 101 checks whether each chip processor in the multi-core chip 102 has returned a processing-result message, and marks any chip processor with no processing-result message within T1 as a faulty chip processor. Specifically: after receiving processing-result messages from the multi-core chip 102, the server 101 decapsulates them to obtain the processing results and corresponding index information, and inspects the processor numbers Core ID in the indexes of the processing-result messages received within T1 to determine which chip processors returned no processing-result message. The first duration T1 may be set much larger than one service processing cycle (typically milliseconds or seconds); for example, T1 may be set to 30 seconds.
If the server 101 detects a processing-result message from a chip processor within T1, that chip processor is in a normal state, and the server returns to step S701 to continue monitoring. If the server 101 receives no processing result from a chip processor within T1, the chip processor may have failed and be unable to return processing results to the server 101, and the server 101 performs step S703;
S703: The server 101 triggers traffic compensation to determine the faulty chip processor's state.
During the first duration T1 monitored in step S701, although the server 101 received no processing-result message from the faulty chip processor, it kept distributing requests to the chip processors of the multi-core chip 102 according to its data index table. Because no processing results return for the faulty chip processor, its index record accumulates many indexes, and its index count grows. By steps S4021-S4023, the faulty chip processor's allocation probability keeps falling. Once the allocation probability is too low, the server 101 effectively cannot assign requests to the faulty chip processor, so even if it recovers, it may receive no requests because its allocation probability is too low.
Specifically, the server 101 may generate a test request for probing the faulty chip processor's state and designate that chip processor to handle it, to confirm the faulty chip processor's state. When generating the index for the test request, the server 101 may set the processor number Core ID to the faulty chip processor, so that the test request is assigned to it.
Specifically, the test request may be a processing task preset in the server 101, such as an image recognition request on a picture pre-stored in the server, or a processing task copied from requests the server 101 is currently receiving.
The server 101 obtains the test request's thread number and timestamp, generates an index together with the designated faulty chip processor's processor number Core ID, encapsulates the test request and the index, and sends them to the multi-core chip 102 for processing. The server 101 also saves the index in the corresponding chip processor's index record in the data index table, and the index count increases accordingly.
In some other embodiments, before performing step S703, the server 101 may also perform step S702:
S702: Determine that the faulty chip processor received no request within a second duration T2; perform step S703 when it is determined that the faulty chip processor received no processing result within the first duration T1 and no request within the second duration T2, and perform step S704 if the faulty chip processor received a request within T2. Specifically, the server 101 may inspect the processor numbers and timestamps of the indexes recorded in the data index table to determine whether the faulty chip processor received a request within T2. If it did, the server 101 may determine the faulty chip processor's state by checking that request's processing result. The second duration T2 may be set close to one service processing cycle, e.g. 1 second. In some other embodiments, the period covered by the second duration may be part of the period covered by the first duration; for example, the server checks whether the faulty chip processor received a request during the last 1 s of a 30 s first duration.
S704: Within a third duration T3, the server 101 checks whether a processing result is received from the faulty chip processor. Specifically, the server 101 may check the processor numbers Core ID in the indexes of the received processing-result messages to detect whether a processing result came from the faulty chip processor. T3 may be timed from the timestamp obtained when the server 101 generated the index for the test request during the traffic compensation of step S703, or from the moment step S702 determined that a request was received within T2. In some embodiments, the third duration T3 may be set much larger than one service cycle, e.g. 30 seconds. If the server 101 receives a processing result from the faulty chip processor within T3, the faulty chip processor has recovered and can process new requests normally. At this point, however, the indexes accumulated in the data index table keep the chip processor's allocation probability low, hurting its processing efficiency. So that the chip processor can receive requests normally again, in this embodiment the server 101 may perform step S705.
If the server 101 receives no processing result from the faulty chip processor within T3, the chip processor is still in a faulty state. The server 101 then returns to step S701 for monitoring, keeping the faulty chip processor's high load and low allocation probability in the data index table, so that newly received requests continue to be assigned to the faulty chip processor with low probability.
In some other embodiments, if the server 101 receives no processing result from the faulty chip processor within T3, indicating it is still faulty, the server 101 jumps to S703 to determine the faulty chip processor's state.
Step S705: Remove the indexes accumulated in that chip processor's index record in the data index table. The server 101 then returns to step S701 for monitoring.
Specifically, receiving a processing result from the faulty chip processor within T3 shows that it has recovered. But the data index table in the server 101 still holds, for that chip processor, many indexes of requests that went unprocessed or failed during the fault, and the index count is high. By steps S4021-S4023, a chip processor with a high index count has a low allocation probability. Removing the accumulated indexes from that chip processor's index record therefore raises its probability of receiving requests and improves the processing efficiency of the multi-core chip.
In some other embodiments, the server 101 resets the index count to zero.
Through the above steps, the server 101 can track changes in chip-processor state and adapt the index records and index counts in the data index table accordingly, improving the processing efficiency of the multi-core chip 102.
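The monitoring loop of S701 to S705 might be sketched as follows, with the probing sub-steps S702 to S704 folded into a single callback; the names and the timing model are simplifications for illustration, not the embodiment's implementation:

```python
import time

def monitor_once(table, recent_results, send_test_request, t1_window):
    """One pass of S701-S705: find silent cores, probe them, clear records on recovery.

    table maps core_id -> its index record (a dict of outstanding indexes);
    recent_results maps core_id -> timestamp of its last processing-result message;
    send_test_request(core_id) performs the traffic compensation of S703 and
    returns True if the core answered within the probe window (S704).
    """
    now = time.time()
    for core_id in list(table.keys()):
        silent_for = now - recent_results.get(core_id, 0.0)
        if silent_for < t1_window:
            continue                      # S701: a result arrived within T1, core is normal
        if send_test_request(core_id):    # S703/S704: probe the suspect core
            table[core_id].clear()        # S705: recovered, drop the stale indexes
        # else: still faulty; keep the high load so few new requests land here
```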
An embodiment of the present invention provides another load maintenance method. In this embodiment, the data structure of the data index table maintained in the server 101 may be as shown in Table 2.
Table 2
(Table 2: one array per chip processor, named Core 1 … Core n. Each array contains a new data unit New[] and an old data unit Old[], and each data unit contains an index record Index Record and an index count Num_Index. For example, the New[] of Core 1 holds indexes Index 1 … Index N with Num_Index = N, and its Old[] holds Index 1 … Index M with Num_Index = M.)
The data index table contains multiple arrays, one per chip processor in the multi-core chip. For the multi-core chip shown in Figure 1, for example, the data index table may contain 4 arrays corresponding to chip processors 1021-1024. The i-th array corresponds to the i-th chip processor in the chip and maintains that chip processor's load. Specifically, the i-th array is named after the i-th chip processor's processor number Core i and contains two data units, a new data unit New[] and an old data unit Old[]. New[] and Old[] share the same data structure, which may include the chip processor's index record Index Record; the Index Record stores the indexes related to the processing requests handled by that chip processor. The index the server 101 generates for a newly received request is saved in the Index Record of the new data unit New[]. After a certain duration (for example, 1 minute), New[] and Old[] exchange data, i.e. the Index Record of New[] and the Index Record of Old[] are swapped. The initial state of an Index Record may be empty.
In some other embodiments, the data structures of New[] and Old[] may also include an index count Num_Index, which is the number of indexes saved in the corresponding Index Record. Its initial value may be 0.
For example, in the table above, in the array for Core 1 the Index Record of the new data unit holds N indexes, so its Num_Index is N, and the Index Record of the old data unit holds M indexes, so its Num_Index is M. In some embodiments, the sum of the index counts of a chip processor's new and old data units may be taken as that chip processor's load value. It should be understood that Table 2 only illustrates the structure of the data index table; the present invention places no limitation on the specific field names or storage method.
In some embodiments, the server 101 may record the time T at which the new and old data units exchanged data and compare the timestamp in an index with that time T; if the index's timestamp is greater than T, the index is saved in the corresponding chip processor's new data unit.
In some embodiments, the new and old data units of all chip processors exchange data at the same time. In other embodiments, the new and old data units of different chip processors may exchange data at different times, with the server recording each chip processor's exchange time. In some other embodiments, the exchange periods of the chip processors may differ.
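A minimal sketch of one per-processor array of Table 2, with the periodic exchange of the new and old data units (the names are illustrative; a Python set stands in for the Index Record, so its length plays the role of Num_Index):

```python
class CoreRecord:
    """New/old data units for one chip processor (one per-core array of Table 2)."""
    def __init__(self):
        self.new = set()   # Index Record of New[]; len() plays the role of Num_Index
        self.old = set()   # Index Record of Old[]

    def add_index(self, index):
        self.new.add(index)            # fresh requests always land in New[]

    def complete(self, index):
        self.new.discard(index)        # a returned result may match either unit
        self.old.discard(index)

    def load(self):
        return len(self.new) + len(self.old)   # load value used for selection

    def swap(self):
        # Periodic exchange of the new and old data units (e.g. every minute).
        self.new, self.old = self.old, self.new
```

After a swap, any index still sitting in Old[] belongs to a request dispatched more than one period ago, which is how stale entries become identifiable.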
The following describes, along the request processing flow, how the server 101 invokes the multi-core chip 102 and how the server 101 maintains the load of the multi-core chip 102. The method is based on the data index table shown in Table 2.
S801: The electronic terminal 100 sends a request to the server 101;
S802: The server 101 receives the request and selects the chip processor to execute it according to each chip processor's load value Load[i], where Load[i] is the sum of the index counts of that chip processor's new data unit New[] and old data unit Old[] in the data index table.
For the specific steps by which the server 101 selects the chip processor to process the request from the load values Load[i], see steps S4021-S4023, not repeated here.
S803: The server 101 generates the index from the processor number Core ID assigned to the request, the request's thread number, and a timestamp.
S804: The server 101 saves the index into the data index table, encapsulates the index and the request into a processing-request message, and sends the processing-request message to the multi-core chip 102 for processing.
Saving the index into the data index table specifically includes the following step:
The server 101 saves the index in the Index Record of the new data unit New[] of the chip processor corresponding to the processor number Core ID in the data index table. In some other embodiments, the index count Num_Index of New[] is increased accordingly. In some embodiments, the increase may specifically be: for each index added to the Index Record, the corresponding Num_Index is incremented by one.
For the server 101's management of the request thread using the condition wait variable, see step S404, not repeated here.
S805: The server 101 receives a processing-result message, containing the encapsulated processing result and index, from the multi-core chip 102.
S806: The server 101 maintains the data index table according to the index in the processing-result message.
Specifically, when request processing finishes, the processing-result message the server 101 receives from the multi-core chip 102 contains the index. Using the index, the server looks up the matching index in the Index Records of the new data unit New[] and old data unit Old[] of the chip processor corresponding to the processor number Core ID in the data index table, and removes the index from the Index Record. In some other embodiments, the corresponding data unit's index count Num_Index is decreased accordingly. In some cases, the decrease may specifically be: for each index removed from an Index Record, that data unit's Num_Index is decremented by one.
S807: The server 101 sends the processing result to the electronic device 100.
For the server 101's thread management using the condition wait variable based on the processing result, see step S407, not repeated here.
Through the above method steps, the server creates an index for each newly received request and carries the index through the entire processing flow, so that the server can indicate the assigned chip via the index and maintain chip load via the index, in order to balance the load.
An embodiment of the present invention provides another load maintenance method, based on the data index table of Table 2, for maintaining load when a faulty chip processor appears. As shown in Figure 11, the method specifically includes the following steps:
S901: Within a fourth duration T4, the server 101 checks whether each chip processor in the multi-core chip 102 has returned a processing-result message, and marks any chip processor with no processing-result message within T4 as a faulty chip processor. For the details of this step, see step S701, not repeated here. In some embodiments, the fourth duration T4 may be timed from the moment the chip processor's new and old data units in the data index table exchange data. T4 may be set much larger than one service processing cycle (typically milliseconds or seconds), e.g. 30 seconds.
If the server 101 detects a processing-result message from a chip processor within T4, that chip processor is in a normal state. The server may perform step S905; in some other embodiments, if the server 101 detects a processing-result message from a chip processor within T4, it may return to step S901 for the next check.
If the server 101 receives no processing result from a chip processor within T4, the chip processor may have failed and be unable to return processing results to the server 101, and the server 101 performs step S903;
S903: The server 101 triggers traffic compensation to determine the faulty chip processor's state, then performs step S904. For the details of this step, see step S703, not repeated here.
The server 101 saves the index generated for the test request in the Index Record of the corresponding chip processor's new data unit in the data index table, and that new data unit's index count may be increased accordingly. In some other embodiments, before performing step S903, the server 101 also performs step S902:
S902: Determine that the faulty chip processor received no request within a fifth duration T5; when it is determined that the faulty chip processor received no processing result within the fourth duration T4 and no request within the fifth duration T5, the server 101 performs step S903. When the faulty chip processor received a request within T5, the server 101 may perform step S904. For the details of this step, see step S702, not repeated here.
S904: Within a sixth duration T6, the server 101 checks whether a processing result is received from the faulty chip processor. For the details of this step, see step S704, not repeated here. If the server 101 receives a processing result from the faulty chip processor within T6, the faulty chip processor has recovered and can process requests normally; the server 101 may perform step S905. If the server 101 receives no processing result from the faulty chip processor within T6, the chip processor is still in a faulty state; the server 101 may perform step S906.
The sixth duration T6 may be set much larger than one service cycle, e.g. 30 seconds, or may be set to run from the timestamp obtained when the server generated the index for the test request in step S903 until the next exchange of the new and old data units.
S905: Before the next exchange of data between that chip processor's new and old data units, the server 101 may remove the indexes accumulated in the Index Record of the chip processor's old data unit in the data index table and correspondingly reset the old data unit's index count to zero. The server 101 then performs step S907.
Specifically, if in step S901 the server 101 detected a processing result from a chip processor within T4, the chip processor is working normally. Because the data-exchange period is much larger than the service processing cycle, by the time the new and old data units exchange data, the requests corresponding to the indexes stored in the old data unit's Index Record should all have completed; under normal conditions the old data unit's Index Record should be empty and its index count 0. However, a chip processor may suffer an occasional or transient fault while processing requests and then recover; a few requests may return no processing result during the fault, leaving a few indexes in the old data unit's Index Record and a nonzero index count. By steps S4021-S4023, the index counts in the data index table affect a chip processor's allocation probability. So when the chip processor's state is normal, emptying the old data unit's Index Record raises the chip processor's probability of receiving requests, helps fully utilize the processing capability of the multi-core chip, and improves processing efficiency.
In some other embodiments, the server 101 may first check whether the old data unit's index count is 0 and/or whether its Index Record is empty; if the index count is nonzero and/or the Index Record is non-empty, the server 101 empties the old data unit's Index Record before the next exchange of data between that chip processor's new and old data units.
In some other embodiments, the server 101 may reset the old data unit's index count to zero.
If in step S904 the server 101 received a processing result within the sixth duration T6, the faulty chip processor has recovered from the fault, and the indexes of the timed-out requests need to be removed: removing all indexes saved in the Index Record of the chip processor's old data unit in the data index table lowers the chip processor's load value, raising its probability of receiving new requests. Meanwhile, after the exchange, the old data unit keeps the indexes of the former new data unit, so that when the server 101 receives the related processing results and indexes, it can update the data index table accordingly.
S906: Before the next exchange of data between that chip processor's new and old data units, the server 101 saves the indexes of the chip processor's old data unit into the new data unit and then empties the old data unit's Index Record. The server 101 performs step S907. In some other embodiments, the server 101 may add the old data unit's index count to the new data unit's index count to form the new data unit's index count, then reset the old data unit's index count to zero.
Specifically, if in step S904 the server 101 received no processing result within T6, the chip processor is still faulty. Adding the faulty chip processor's old data unit's indexes and index count into the new data unit ensures that, when the new and old data units exchange data, the chip processor keeps a high-load state in the exchanged data index table, so that new requests continue to be assigned to the faulty chip processor with low probability. Meanwhile, after the exchange all the data sits in the old data unit, so if the chip processor recovers before the next exchange, the old data unit's data will be removed according to the steps above.
S907: Exchange the Index Records and index counts of the chip processor's new and old data units in the data index table. The server 101 may return to step S901 for monitoring.
In some other embodiments, if the server 101 receives no processing result from the faulty chip processor within the sixth duration T6, indicating that it is still faulty, the server 101 may, after the new and old data units exchange data, jump to step S903 to determine the faulty chip processor's state.
Through the above steps, the server can track changes in chip-processor state, adapt the data maintained in the data index table accordingly, and handle the Index Records and index counts in the data units more accurately, improving the processing efficiency of the multi-core chip.
In combination with the multi-core chip shown in Figure 6A and the load maintenance methods provided by the embodiments of the present invention, an embodiment of the present invention further provides a task processing method. As shown in Figure 12, the method mainly includes the following steps:
S1001: Controllers 1031-1034 receive processing-request messages from the server 101, each containing a request and its corresponding index.
Specifically, each controller 1031-1034 receives the processing-request messages assigned to the chip processor corresponding to that controller.
S1002: Controllers 1031-1034 decapsulate the processing-request messages to obtain the requests and corresponding indexes.
S1003: Controllers 1031-1034 buffer one or more requests and their corresponding indexes;
S1004: Controllers 1031-1034 take requests from the buffer and send them to the chip processor. For the details of this step, see step S203, not repeated here.
In some other embodiments, when merging requests, controllers 1031-1034 take one or more requests and their indexes from the buffer and send the request, or the merged request, together with the corresponding indexes to the corresponding chip processor.
S1005: Controllers 1031-1034 receive the processing results from the chip processor, encapsulate each processing result with its corresponding index into a processing-result message, and send the processing-result message to the server 101.
If the chip processor executed a merged request, the processing result that controllers 1031-1034 receive from the chip processor contains the results of the multiple merged requests. The controllers 1031-1034 may encapsulate the obtained results, in order, with their corresponding indexes into multiple processing-result messages.
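As an illustration of encapsulating processing results with their indexes, here is a sketch that assumes a JSON wire format (the patent does not specify one) and an index of the form (Core ID, Thread ID, Timestamp):

```python
import json

def pack_results(results, indexes):
    """Encapsulate the results of a merged request, in order, with their indexes."""
    assert len(results) == len(indexes)
    messages = []
    for result, (core_id, thread_id, timestamp) in zip(results, indexes):
        messages.append(json.dumps({
            "core_id": core_id,        # processor number the request was assigned to
            "thread_id": thread_id,    # request thread number
            "timestamp": timestamp,    # request timestamp
            "result": result,
        }))
    return messages

def unpack_result(message):
    """Decapsulate a processing-result message back into (index, result)."""
    m = json.loads(message)
    return (m["core_id"], m["thread_id"], m["timestamp"]), m["result"]
```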
In some other embodiments, controllers 1031-1034 may receive already-encapsulated processing-result messages from the chip processor and send them to the server 101.
In combination with the multi-core chip shown in Figure 6C and the load maintenance methods provided by the embodiments of the present invention, an embodiment of the present invention further provides another task processing method. As shown in Figure 13, the method mainly includes the following steps:
S1101: Controller 106 receives an encapsulated processing-request message, containing a request and its corresponding index, from the server 101.
S1102: Controller 106 decapsulates the processing-request message to obtain the request and its corresponding index.
S1103: Controller 106 obtains the processor number Core ID from the index and sends the request and its corresponding index to the controller 1031-1034 connected to the corresponding chip processor.
S1104: Controllers 1031-1034 buffer one or more requests and their corresponding indexes;
S1105: Controllers 1031-1034 take requests from the buffer and send them to the chip processor. For the details of this step, see step S1004, not repeated here.
S1106: Controllers 1031-1034 receive the processing results from the chip processor, encapsulate each processing result with its corresponding index into a processing-result message, and send the processing-result message to controller 106.
If the chip processor executed a merged request, the processing result that controllers 1031-1034 receive from the chip processor contains the results of the multiple merged requests. The controllers 1031-1034 may encapsulate the obtained results, in order, with their corresponding indexes into multiple processing-result messages.
In some other embodiments, controllers 1031-1034 receive already-encapsulated processing-result messages from the chip processor and send them to controller 106.
S1107: Controller 106 sends the processing-result message to the server 101.
In some embodiments, controllers 1031-1034 may send the encapsulated processing-result messages directly to the server 101. In some other embodiments, controllers 1031-1034 may instead send the processing results and corresponding indexes to controller 106, which encapsulates them into processing-result messages and sends them to the server 101.
In this way, the dispatch of processing-request messages to the corresponding chip processors is completed inside the multi-core chip, simplifying the interface structure between the multi-core chip and the server.
The embodiments of the present invention may be combined arbitrarily to achieve different technical effects.
In implementation, the steps of the above methods may be completed by integrated logic circuits in hardware in a processor, or by instructions in software form. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in a processor. The software module may reside in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware. To avoid repetition, details are not described again here.
It should be understood that, in the various embodiments of this application, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g. infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. DVD), or a semiconductor medium (e.g. a solid-state drive, Solid State Disk (SSD)).
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (14)

  1. An apparatus, characterized in that it comprises:
    a first processor and a second processor;
    a first controller and a second controller, the first controller being connected to the first processor, and the second controller being connected to the second processor;
    the first controller being configured to:
    store a first request and a second request, the first request and the second request being requests assigned to the first processor;
    send the first request to the first processor for processing;
    receive a first processing result from the first processor;
    send the second request to the first processor for processing;
    the second controller being configured to:
    store a third request and a fourth request, the third request and the fourth request being requests assigned to the second processor;
    send the third request to the second processor for processing;
    receive a third processing result from the second processor;
    send the fourth request to the second processor for processing.
  2. The apparatus according to claim 1, wherein the first controller stores a fifth request and a sixth request, and the first controller is further configured to merge the fifth request and the sixth request into a seventh request and send the seventh request to the first processor for processing.
  3. The apparatus according to claim 1, wherein the first controller receives a first message from a first device, the first message containing the first request and first information, and the first information corresponding to the first processor.
  4. The apparatus according to claim 3, wherein the first controller sends a second message to the first device, the second message containing the first processing result and the first information.
  5. The apparatus according to claim 1, further comprising:
    a third controller, the third controller being connected to the first controller and to the second controller respectively;
    the third controller being configured to receive a third message from a first device, the third message containing the first request and first information, and the first information corresponding to the first processor;
    the third controller sending the first request and the first information to the first controller according to the first information.
  6. The apparatus according to claim 5, wherein the third controller is further configured to receive the first processing result and the first information from the first controller,
    and the third controller sends a fourth message to the first device, the fourth message containing the first processing result and the first information.
  7. The apparatus according to any one of claims 3-6, wherein the first information contains a first processor number, a timestamp, and a thread number; the first processor number is assigned to the first request by the first device, and the timestamp and the thread number correspond to the first request.
  8. A service processing system, characterized by comprising:
    a first device;
    a second device, the second device containing a first processor and a second processor;
    the first device sending a first message to the second device, the first message containing a first request and first information, wherein the first information corresponds to the first processor;
    the second device sending, according to the first information, the first request to the first processor for processing;
    the second device sending a second message to the first device, the second message containing a first processing result and the first information, wherein the first processing result is the result of the first processor's processing of the first request;
    wherein the first information is generated after the first device receives the first request and determines the processor to process the first request;
    the first information contains a first processor number, a timestamp, and a thread number; the first processor number is assigned to the first request by the first device, and the timestamp and the thread number correspond to the first request.
  9. The system according to claim 8, further comprising:
    the first device stores a first table, the first table corresponding to the second device, wherein the first table includes a first array and a second array, the first array corresponding to the first processor and the second array corresponding to the second processor;
    the first array includes first data and second data; the first data includes a first record, the first record being used to save one or more pieces of information, and the number of pieces of information saved in the first record being a first quantity; the second data includes a second record, the second record being used to save one or more pieces of information, and the number of pieces of information saved in the second record being a second quantity;
    the second array includes third data and fourth data; the third data includes a third record, the third record being used to save one or more pieces of information, and the number of pieces of information saved in the third record being a third quantity; the fourth data includes a fourth record, the fourth record being used to save one or more pieces of information, and the number of pieces of information saved in the fourth record being a fourth quantity;
    wherein, after generating the first information for the first request, the first device saves the first information in the first record;
    the first data and the second data exchange data with each other at fixed intervals;
    after receiving the second message from the second device, the first device queries the first record and/or the second record according to the first information in the second message;
    if the first information is found in the first record, the first information is removed from the first record;
    if the first information is found in the second record, the first information is removed from the second record.
  10. The system according to claim 9, wherein, if the first device detects that no processing result has been received from the first processor within a first duration,
    the first device generates a first test request and information corresponding to the first test request, the information corresponding to the first test request also corresponding to the first processor;
    the first device detects whether a processing result is received from the first processor within a third duration; if a processing result is received, the second record of the second data is emptied before the next exchange of data between the first data and the second data;
    if no processing result is received, the one or more pieces of information in the second record of the second data are saved into the first record of the first data before the next exchange of data between the first data and the second data, and the second record is emptied.
  11. The system according to claim 10, wherein, before the first device generates the first test request, the method further comprises: the first device detects whether the first processor has received a request within a second duration, and generates the first test request if the first device determines that the first processor has received no request within the second duration.
  12. The system according to claim 10, wherein, if the first device determines that no processing result has been received from the first processor within the third duration, the first device generates a second test request after the first data and the second data exchange data.
  13. The system according to claim 9, further comprising:
    the first device determines, according to the first table, the processor to process the first request, specifically including:
    determining a first capability and a second capability according to the sum of the first quantity and the second quantity and the sum of the third quantity and the fourth quantity, the first capability representing the first processor's capability to process new requests, and the second capability representing the second processor's capability to process new requests;
    determining a first allocation probability and a second allocation probability according to the first capability and the second capability, and determining a first probability space and a second probability space; the first allocation probability and the first probability space corresponding to the first processor, and the second allocation probability and the second probability space corresponding to the second processor;
    taking a random number and determining the assigned processor according to the random number.
  14. The system according to claim 13, wherein the first device determines the first capability and the second capability according to the sum of the first quantity and the second quantity, the sum of the third quantity and the fourth quantity, and the processing speeds of the first processor and the second processor.
PCT/CN2021/075196 2020-02-21 2021-02-04 Multi-core chip and scheduling method therefor WO2021164560A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010108531.8A CN111415291B (zh) 2020-02-21 2020-02-21 Multi-core chip and scheduling method therefor
CN202010108531.8 2020-02-21

Publications (1)

Publication Number Publication Date
WO2021164560A1 true WO2021164560A1 (zh) 2021-08-26

Family

ID=71494207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075196 WO2021164560A1 (zh) 2020-02-21 2021-02-04 Multi-core chip and scheduling method therefor

Country Status (2)

Country Link
CN (1) CN111415291B (zh)
WO (1) WO2021164560A1 (zh)



