WO2023098467A1 - Voice analysis method, electronic device, readable storage medium, and chip system


Info

Publication number
WO2023098467A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal device
information
voice
application program
interface
Application number
PCT/CN2022/131980
Other languages
English (en)
Chinese (zh)
Inventor
张腾
王斌
孙峰
庄効谚
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2023098467A1


Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 21/00 - Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B 21/18 - Status alarms
    • G08B 21/24 - Reminder alarms, e.g. anti-loss alarms
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/20 - Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command

Definitions

  • The present application relates to the field of terminal technology, and in particular to a voice analysis method, an electronic device, a readable storage medium, and a chip system.
  • Terminal devices can not only perform actions based on received user-triggered operations such as clicks, but can also detect the user's voice through a voice assistant and perform actions based on that voice.
  • For example, the terminal device can detect, through the voice assistant, the voice command issued by the user, analyze the voice command in combination with the interface content displayed on the current interface of the terminal device, determine the user intention corresponding to the voice command, and then control the terminal device to perform actions that match the user intention.
  • the terminal device cannot accurately understand the user's intention in some scenarios, and may trigger wrong operations or ask the user repeatedly, resulting in low interaction efficiency between the terminal device and the user.
  • the present application provides a speech analysis method, an electronic device, a readable storage medium, and a chip system, which solve the problem of low interaction efficiency between a terminal device and a user in certain scenarios in the prior art.
  • In a first aspect, the present application provides a speech analysis method, including: acquiring a voice instruction and information sent by a running application program, where the voice instruction is used to instruct the terminal device to perform an operation, and the information sent by the application program includes reminder information for reminding the user; and determining, according to the voice instruction and the reminder information, the user intention corresponding to the voice instruction.
  • In this way, the accuracy of determining the user intention corresponding to the voice command can be improved, thereby improving the efficiency of voice interaction between the terminal device and the user.
  • Before determining the user intention corresponding to the voice command according to the voice command and the reminder information, the method further includes: acquiring a first application program list and a second application program list, where the first application program list is a list of application programs installed on the terminal device and the second application program list is a list of application programs currently running on the terminal device; and determining, according to the first application program list and the second application program list, the identifier of the application program corresponding to the voice instruction and the running state of the application program.
  • Correspondingly, determining the user intention corresponding to the voice command according to the voice command and the reminder information includes: if the running state of the application program is running in the background, determining the user intention corresponding to the voice command according to the reminder information, the voice command, and the identifier of the application program; and if the running state of the application program is running in the foreground, acquiring, according to the current interface of the application program, the interface information corresponding to the current interface, and determining the user intention corresponding to the voice instruction according to the voice command, the reminder information, and the interface information.
  • In this way, the application program corresponding to the voice command is determined according to the first application program list and the second application program list, the running state of that application program is determined, and the user intention corresponding to the voice command is then determined in different ways for different running states, which increases the flexibility of determining the user intention.
  • When the application program runs in the foreground, the interface information of the application program can be continuously obtained, so that the user intention corresponding to the voice command can be determined according to the voice command and the reminder information, combined with the acquired interface information, which can improve the accuracy of determining the user intention.
  • Acquiring the interface information corresponding to the current interface according to the current interface of the application program includes: extracting the current interface to obtain the interface content included in the current interface; and analyzing the interface content to obtain the interface information corresponding to the application program.
  • Acquiring the voice command and the information sent by the running application program includes: acquiring the voice command at a first moment; and acquiring, according to the first moment, the information sent by each application program that was running within a preset time before the first moment. In this way, the workload of obtaining the information sent by the application program can be reduced, and the efficiency, variety, and flexibility of obtaining the voice command and the information sent by the application program can be improved.
  • Alternatively, acquiring the information sent by the running application program includes: acquiring, in real time, the information sent by the running application program. In this way, the acquired information can be combined in time to determine the user intention corresponding to the voice command, thereby improving the efficiency of determining the user intention and the variety and flexibility of acquiring the information sent by the application program.
  • Acquiring the information sent by the running application program includes: acquiring, through a preset interface, the audio data broadcast by the terminal device; and converting the audio data by using automatic speech recognition (ASR) technology to obtain the information in text form sent by the application program. In this way, the audio data sent by the application program is obtained and converted into text-form information, which can improve the flexibility of obtaining the information sent by the application program.
  • Acquiring the information sent by the running application program may also include: extracting, through a preset interface, the text data sent by the application program to obtain the information in text form sent by the application program. In this way, both the efficiency and the flexibility of obtaining the information sent by the application program can be improved.
  • The method further includes: converting the voice command by using ASR technology to obtain a text command in text form. Correspondingly, determining the user intention corresponding to the voice command according to the voice command and the reminder information includes: determining the user intention corresponding to the voice command according to the text command and the reminder information.
  • Converting the voice command by using the ASR technology to obtain the text command in text form includes: denoising the voice command by using speech enhancement technology to obtain a denoised voice command; and converting the denoised voice command by using the ASR technology to obtain the text command in text form. In this way, the accuracy of the converted text command can be improved, thereby improving the accuracy of determining the user intention.
  • Before acquiring the voice command and the information sent by the running application program, the method further includes: establishing, according to multiple types of sample data, multiple association relationships between different types of sample data, where the multiple types of sample data include sample reminder information, sample interface content, sample voice instructions, and sample user intentions, and the multiple association relationships include the association relationship between the sample user intention and the sample reminder information, the association relationship between the sample user intention and the sample voice instruction, and the association relationship between the sample user intention and the sample interface content; and performing training according to the multiple association relationships to obtain a fusion model, where the fusion model is a single model or a model group composed of multiple models.
  • Correspondingly, determining the user intention corresponding to the voice command according to the voice command and the reminder information includes: determining, through the fusion model, the user intention corresponding to the voice command by combining the voice command and the reminder information.
  • In this way, the acquired voice command and reminder information are analyzed by the fusion model, and the user intention output by the fusion model that matches the voice command and the reminder information is obtained, which can improve the accuracy of determining the user intention.
  • After determining the user intention, the method further includes: invoking an intent execution interface according to the user intention, to perform an operation matching the user intention.
  • The method is applied in a multi-device scenario, where the multi-device scenario includes a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device.
  • In the multi-device scenario, acquiring the voice command and the information sent by the running application program includes: the first terminal device acquires the voice command and the information sent by the application program run by the first terminal device; the first terminal device sends an information request instruction to the second terminal device according to the voice command, where the information request instruction is used to instruct the second terminal device to obtain, and feed back to the first terminal device, the information sent by the application program running on the second terminal device; and the first terminal device receives the information, sent by the running application program, fed back by the second terminal device.
  • Any terminal device that collects voice commands in a multi-device scenario can control other devices in the multi-device scenario according to the voice commands, which can improve the flexibility of controlling terminal devices with voice commands.
  • In a second aspect, the present application provides a speech analysis apparatus, including:
  • the first acquiring module is configured to acquire voice instructions and information issued by a running application, the voice instructions are used to instruct the terminal device to perform operations, and the information issued by the application includes reminder information for reminding the user;
  • the first determining module is configured to determine the user intention corresponding to the voice command according to the voice command and the reminder information.
  • the device further includes:
  • the second acquiring module is configured to acquire a first application list and a second application list, where the first application list is a list of applications installed on the terminal device, and the second application list is a list of applications currently running on the terminal device;
  • a second determining module configured to determine the identifier of the application program corresponding to the voice instruction and the running state of the application program according to the first application program list and the second application program list;
  • the first determining module is specifically configured to: if the running state of the application program is running in the background, determine the user intention corresponding to the voice command according to the reminder information, the voice command, and the identifier of the application program; and if the running state of the application program is running in the foreground, acquire, according to the current interface of the application program, the interface information corresponding to the current interface, and determine the user intention corresponding to the voice command according to the voice command, the reminder information, and the interface information.
  • the first determining module is further specifically configured to extract the current interface to obtain the interface content included in the current interface, and analyze the interface content to obtain the interface information corresponding to the application program.
  • the first acquiring module is specifically configured to acquire the voice command at a first moment, and acquire, according to the first moment, the information sent by each application program running within a preset time before the first moment.
  • Alternatively, the first acquiring module is specifically configured to acquire, in real time, the information sent by the running application program.
  • the first acquiring module is further specifically configured to acquire, through a preset interface, the audio data broadcast by the terminal device, and convert the audio data by using automatic speech recognition (ASR) technology to obtain the information in text form sent by the application program.
  • the first acquiring module is further specifically configured to extract, through the preset interface, the text data sent by the application program, to obtain the information in text form sent by the application program.
  • the device further includes:
  • a conversion module configured to convert the voice command by using ASR technology to obtain a text command in text form;
  • the first determination module is further specifically configured to determine the user intention corresponding to the voice command according to the text command and the reminder information.
  • the conversion module is specifically configured to denoise the voice command by using speech enhancement technology to obtain a denoised voice command, and convert the denoised voice command by using the ASR technology to obtain the text command in text form.
  • the device further includes:
  • an establishing module configured to establish, according to multiple types of sample data, multiple association relationships between different types of sample data, where the multiple types of sample data include sample reminder information, sample interface content, sample voice instructions, and sample user intentions, and the multiple association relationships include the association relationship between the sample user intention and the sample reminder information, the association relationship between the sample user intention and the sample voice instruction, and the association relationship between the sample user intention and the sample interface content;
  • a training module configured to perform training according to the multiple association relationships to obtain a fusion model, where the fusion model is a single model or a model group composed of multiple models.
  • the first determining module is further specifically configured to determine, through the fusion model, the user intention corresponding to the voice command by combining the voice command and the reminder information.
  • the device further includes:
  • An executing module configured to call an intent execution interface according to the user intent, and execute an operation matching the user intent.
  • the apparatus is applied in a multi-device scenario, and the multi-device scenario includes the first terminal device and a second terminal device, the first terminal device is connected to the second terminal device;
  • the first acquiring module is further specifically configured to: acquire, by the first terminal device, the voice command and the information sent by the application programs run by the first terminal device; send an information request instruction to the second terminal device according to the voice command; and then receive the information, sent by the running application program, fed back by the second terminal device, where the information request instruction is used to instruct the second terminal device to obtain, and feed back to the first terminal device, the information sent by the application program running on the second terminal device.
  • In a third aspect, the present application provides an electronic device, including a processor, where the processor is configured to run a computer program stored in a memory, to implement the speech analysis method according to any one of the above first aspect.
  • In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the speech analysis method according to any one of the above first aspect is implemented.
  • In a fifth aspect, the present application provides a chip system, including a memory and a processor, where the processor executes the computer program stored in the memory, to implement the speech analysis method according to any one of the above first aspect.
  • FIG. 1A is a schematic interface diagram of a shopping application program provided by an embodiment of the present application.
  • FIG. 1B is a schematic interface diagram of a system setting provided by the embodiment of the present application.
  • FIG. 1C is a schematic interface diagram of a map application program provided by the embodiment of the present application.
  • FIG. 2 is a schematic diagram of a speech analysis scene involved in a speech analysis method provided by an embodiment of the present application
  • FIG. 3 is a schematic flow chart of a speech analysis method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of obtaining reminder information based on a software architecture provided by an embodiment of the present application
  • FIG. 5 is a schematic flow diagram of another method of acquiring reminder information based on the software architecture provided by the embodiment of the present application.
  • FIG. 6 is a schematic flow diagram of a multi-device voice analysis provided by an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a speech analysis device provided by an embodiment of the present application.
  • FIG. 8 is a structural block diagram of another speech analysis device provided by an embodiment of the present application.
  • FIG. 9 is a structural block diagram of another speech analysis device provided by an embodiment of the present application.
  • FIG. 10 is a structural block diagram of another speech analysis device provided by an embodiment of the present application.
  • FIG. 11 is a structural block diagram of another speech analysis device provided by the embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 13 is a software structural block diagram of an electronic device according to an embodiment of the present application.
  • During use of the terminal device, the voice assistant of the terminal device can be turned on. The terminal device can collect the voice commands issued by the user through the voice assistant, and analyze each voice command to obtain the information included in it and determine the user intention corresponding to the voice command. For example, the terminal device can turn on or off the air conditioner, stereo, and lights through the voice assistant.
  • When analyzing a voice command, the voice assistant can determine the user intention corresponding to the voice command based on the interface content displayed on the current interface. For example, referring to FIG. 1A, if the current interface of the terminal device is the interface of a shopping application program, the voice assistant can perform a shopping operation according to the voice command; referring to FIG. 1B, if the current interface of the terminal device is the system settings interface, the voice assistant can perform the operation of turning on Bluetooth according to the voice command.
  • If the voice assistant still cannot determine the user intention after combining the interface content displayed on the current interface, the voice assistant no longer controls the terminal device to perform the operation corresponding to the voice command, or the voice assistant needs to ask the user to obtain more information, so that the user intention corresponding to the voice command is determined according to the content of the user's answer.
  • For example, referring to FIG. 1C, the map application reminds the user that "a faster route is currently found, and the destination can be reached 5 minutes earlier". If the voice assistant detects the voice command "switch routes" issued by the user but cannot determine the corresponding user intention from the voice command alone, the voice assistant either no longer controls the terminal device to perform the route-switching operation in the map application, or needs to ask the user again.
  • For another example, two induction cookers are both working, and one of them reminds the user that "the fire has been on high for 20 minutes; it is recommended to lower the temperature and simmer slowly". If the voice assistant detects that the voice command issued by the user is "adjust the temperature to 200 degrees", the voice assistant may adjust both induction cookers to 200 degrees, causing a device to perform an incorrect operation.
  • In view of this, this application proposes a voice analysis method. The voice assistant obtains the various information sent by the application program and, based on the reminder information used to remind the user within that information, the voice command issued by the user, and the content of the current interface, can accurately determine the user intention corresponding to the voice command, control the terminal device to perform operations that match the user intention, and improve the efficiency of the user's interaction with the voice assistant.
  • FIG. 2 is a schematic diagram of a voice analysis scenario involved in a voice analysis method provided in an embodiment of the present application.
  • The voice analysis scenario may include at least one terminal device 210, where the terminal devices 210 may be located in the same network.
  • Each terminal device may be any of various types of devices such as a smart TV, smart speaker, router, or projector; the embodiment of the present application does not limit the type of the terminal device.
  • The terminal device 210 can turn on the voice assistant during operation, use the voice assistant to obtain the voice commands issued by the user and the various information issued by the currently running application program, and then determine, according to the voice command and the reminder information within that information, the user intention corresponding to the voice command.
  • Specifically, when the voice assistant detects that the user has issued a voice command, it can analyze the voice command and, at the same time, obtain the information sent by the application program within a preset time. According to the reminder information used to remind the user within the sent information, combined with the interface content displayed on the current interface of the terminal device, a pre-trained fusion model outputs the user intention corresponding to the voice command, so that the voice assistant can control the terminal device 210 to perform operations matching the user intention.
  • For example, after detecting the "switch routes" voice command, the voice assistant can obtain the information issued by the map application program and, based on the "found a faster route" reminder information among that information, combined with the interface currently displayed by the map application, determine that the user intention corresponding to the voice command is to switch the navigation route to the newly found faster route. The voice assistant can then control the terminal device to switch the navigation route in the map application.
  • For another example, after the voice assistant detects that the user issued the voice command "adjust the temperature to 200 degrees", it can obtain the information sent by each induction cooker. According to the "it is recommended to lower the temperature" reminder message issued by one of the induction cookers, the voice assistant can determine that the user intention corresponding to the voice command is to adjust the temperature of the induction cooker that sent the reminder message to 200 degrees, and can then control that induction cooker to lower its temperature to 200 degrees.
  • In the above process, the voice assistant first detects the voice command issued by the user, and then determines the user intention according to the reminder information in the obtained information. In practical applications, the voice assistant can also obtain, in real time, the various information including reminder information issued by the application program, and then, when it detects the voice command issued by the user, combine the reminder information in that information to determine the user intention.
  • the embodiment of the present application does not limit the sequence of obtaining the reminder information and obtaining the voice instruction.
  • the voice instruction involved in the embodiment of the present application may be an instruction for instructing the terminal device to perform an operation.
  • the voice command is used to instruct the terminal device to purchase commodities, to enable or disable a certain function, or to adjust the status of the terminal device.
  • the embodiment of the present application does not limit the operation performed by the voice command to instruct the terminal device.
  • the application program can send various types of information during running, which may include reminder information for reminding the user.
  • the voice assistant can obtain various information sent by the application program in various ways.
  • For example, the voice assistant can intercept the information sent by the application program through a preset interface; it can also receive the various information that the application program actively sends to the voice assistant, and it can obtain the information sent by the application program in other ways as well. The embodiment of the present application does not limit the type of information sent by the application program or the method of obtaining it.
  • It should be noted that the speech analysis scenario including multiple devices is described for convenience of explanation. In practical applications, the speech analysis method can be applied in many different speech analysis scenarios, such as controlling smart home devices through the voice assistant, controlling vehicle-mounted devices through the voice assistant, and other scenarios in which the terminal device is controlled by the voice assistant; the embodiment of the present application does not limit the voice analysis scenario.
  • Fig. 3 is a schematic flowchart of a speech analysis method provided by the embodiment of the present application. As an example but not a limitation, the method can be applied to the above-mentioned terminal device. Referring to Fig. 3, the method includes:
  • Step 301: Perform training according to various sample data to obtain a fusion model.
  • The fusion model may be a single model, or a model group composed of multiple models in which the models work together.
  • The embodiment of the present application does not limit the number of models in the fusion model.
  • When the fusion model is a model group, the first model in the group can determine the domain to which the voice command belongs based on the received voice command, reminder information, interface content, and other information; afterwards, the voice command, reminder information, interface content, and other information can be input into the model corresponding to that domain, and the user intention output by that model can be obtained.
  • In the embodiment of the present application, the terminal device can perform voice interaction with the user through the voice assistant, and the voice assistant can analyze the voice command issued by the user through the pre-trained fusion model and determine the user intention corresponding to the voice command, so as to control the terminal device to perform operations matching the user intention. Therefore, before the terminal device performs voice interaction with the user, it can perform training according to various sample data to obtain the fusion model.
  • the various sample data may include: sample reminder information, sample interface content, sample voice instructions, and sample user intentions.
  • The sample reminder information can be the information with which the application program reminds the user.
  • the sample interface content can be the content corresponding to each interface displayed by the terminal device when the application program is started.
  • the sample voice command can be the voice issued by the user to the terminal device.
  • The sample user intentions can be the different operations performed by the terminal device according to the voice instructions, interface content, and reminder information.
  • During training, the terminal device may establish association relationships between different types of sample data, and then input the sample reminder information, sample interface content, sample voice commands, and other sample data in the various association relationships into a preset initial model to obtain the initial user intention output by the initial model. Afterwards, the terminal device can adjust the initial model according to the initial user intention output by the initial model, combined with the sample user intention corresponding to the sample data input into the initial model, and repeat the above process until the trained model can accurately output the sample user intentions corresponding to sample data such as sample reminder information, sample interface content, and sample voice commands.
  • Specifically, the terminal device can establish association relationships between different types of sample data based on the various types of sample data, combined with detected user-triggered operations; that is, the terminal device can establish an association relationship between any sample user intention and certain sample data of the other types.
  • During establishment, the terminal device may establish, for each sample user intention, association relationships between that sample user intention and the other three types of sample data. That is, for each sample user intention, an association relationship between the sample user intention and at least one piece of sample reminder information may be established, an association relationship between the sample user intention and at least one sample voice instruction may be established, or an association relationship between the sample user intention and at least one sample interface content may be established, so as to obtain the association relationships between the various types of sample data.
  • For example, the terminal device can associate the sample user intention "switch to a faster route" with the sample reminder information "currently found a faster route, and can arrive at the destination 5 minutes earlier", or associate the sample user intention "switch to a faster route" with the sample voice instruction "switch route"; the sample user intention "switch to a faster route" can also be associated with the interface corresponding to the map application in the sample interface content.
  • After the terminal device establishes the various association relationships based on part of the sample data (such as 80%), it can use the multiple association relationships as label data, input a large amount of sample data (such as sample reminder information, sample interface content, and sample voice instructions) into the preset initial model to obtain the initial user intention output by the initial model, and then, according to the association relationships serving as label data, compare the difference between the sample user intention corresponding to the sample data and the initial user intention, so as to adjust the parameters of the initial model. After the initial model has been trained multiple times in this way, so that it can accurately output the sample user intentions corresponding to the various sample data, the training of the initial model is complete and the fusion model is obtained.
  • In addition, the terminal device can use the remaining sample data (such as 20%) to test the fusion model, determining the degree of similarity between the user intention output by the fusion model and the sample user intention, so that it can decide, according to the obtained similarity, whether to train the fusion model further.
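To make the training procedure above concrete, the following is a minimal, illustrative sketch only: it stands in for the fusion model with a single text classifier over concatenated sample reminder information, sample interface content, and sample voice instructions, holding out roughly 20% of the associations for testing as described. The sample data, intent labels, and choice of scikit-learn are all assumptions of the example, not the patent's actual model.

```python
# Illustrative stand-in for the fusion model: a single text classifier mapping
# (reminder information, interface content, voice instruction) to a user intent.
# All sample data and intent labels below are invented for demonstration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Each tuple is one association relationship:
# (sample reminder info, sample interface content, sample voice instruction, sample user intention)
samples = [
    ("found a faster route", "map navigation", "switch route", "switch_route"),
    ("found a faster route, arrive earlier", "map navigation", "change to the new route", "switch_route"),
    ("suggest lowering the temperature", "cooker status panel", "adjust temperature to 200 degrees", "set_cooker_temperature"),
    ("high heat for 20 minutes", "cooker status panel", "turn it down", "set_cooker_temperature"),
    ("", "system settings", "turn on bluetooth", "enable_bluetooth"),
    ("", "system settings", "open bluetooth", "enable_bluetooth"),
    ("heavy traffic ahead", "map navigation", "change destination", "change_destination"),
    ("traffic jam on section A", "map navigation", "go somewhere else", "change_destination"),
]

texts = [" | ".join(s[:3]) for s in samples]   # concatenate the three inputs
intents = [s[3] for s in samples]              # the sample user intention is the label

# Hold out ~20% of the associations for testing, as described above.
x_train, x_test, y_train, y_test = train_test_split(texts, intents, test_size=0.2, random_state=0)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(x_train, y_train)
print("held-out accuracy:", model.score(x_test, y_test))
print(model.predict(["found a faster route | map navigation | switch route"]))
```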
  • It should be noted that the fusion model can not only be obtained through training on the terminal device based on a large amount of sample data, but can also be obtained through training on a server based on the sample data; the embodiment of the present application does not limit this.
  • sample data may be obtained through manual collection, and of course may also be collected through other methods, and the embodiment of the present application does not limit the collection method of sample data.
  • the reminder information used to remind the user in the application program may be intercepted manually.
  • After the training is completed, the terminal device can add the fusion model to the preset voice assistant, so that the voice assistant can determine the user intention through the fusion model; the terminal device can also set an interface corresponding to the voice assistant, so that the voice assistant can invoke the fusion model through this interface. The embodiment of the present application does not limit the manner in which the voice assistant and the fusion model work together.
  • Step 302: Acquire the information sent by the currently running application program.
  • the application program may send various types of information during the running process, and the various types of information sent by the application program may include reminder information, and the reminder information is information for reminding the user.
  • The reminder information may be obtained by converting the voice broadcast by the application program, or from the text information of the application program; the embodiment of the present application does not limit the form of the reminder information.
  • The terminal device can start different applications during operation, and each application can send out various information including reminder information. The voice assistant can obtain the various information sent by the applications, so that in subsequent steps it can determine the user intention corresponding to the voice command according to the reminder information in that information, combined with the voice command issued by the user.
  • It should be noted that the fusion model trained in step 301 is trained based on a large number of samples of reminder information, which were collected manually. Therefore, the fusion model can recognize the reminder information among the various information of the application program acquired by the voice assistant, so that in subsequent steps the user intention can be output according to the reminder information. The fusion model does not recognize the other information besides the reminder information; when the voice assistant inputs such other information into the fusion model, the fusion model cannot output a user intention and instead outputs an abnormal result (for example, an empty output, or an output of "other").
  • In the embodiment of the present application, the terminal device can start the voice assistant after startup, and the voice assistant can monitor the various running applications and obtain the various information they send (for example, by intercepting the information sent by the applications).
  • According to the constantly changing scene in which the terminal device is located, the application program can announce, by voice broadcast, the reminder information corresponding to the current scene.
  • Correspondingly, the voice assistant can obtain the audio data of the reminder information through the preset interface when the application program broadcasts the reminder information, and then use automatic speech recognition (ASR) technology to convert the audio data into text, thereby obtaining the application program's reminder information.
  • For example, referring to FIG. 4, in the process of using the map application for navigation, the map application at the APP layer can use the media player at the system layer to broadcast the audio data of "found a faster route" according to the received traffic information and the current location of the terminal device. The voice assistant can obtain the audio data from the player through the preset interface and convert it to obtain the corresponding text "found a faster route", that is, the reminder information of the map application.
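As a rough sketch of this ASR conversion step, the following assumes the audio intercepted through the preset interface has been saved to a WAV file and uses the third-party `speech_recognition` package as a stand-in recognizer; the embodiment does not name a concrete ASR engine or interface.

```python
# Sketch: convert intercepted broadcast audio into text reminder information.
# Assumes the audio captured through the preset interface was written to
# "broadcast.wav"; the real interface and ASR engine are not specified.
import speech_recognition as sr

def broadcast_audio_to_text(wav_path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the entire file
    # Any ASR backend could be substituted for this call.
    return recognizer.recognize_google(audio, language="zh-CN")

reminder_text = broadcast_audio_to_text("broadcast.wav")
print(reminder_text)  # e.g. the text form of "found a faster route"
```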
  • In a possible case, the application program has not enabled voice broadcast, or does not have a voice broadcast function, and instead reminds the user by displaying text information. Correspondingly, the voice assistant can obtain the application program's reminder information in text form through the preset interface.
  • the voice assistant can also obtain reminder information in other ways.
  • the voice assistant can obtain the text information corresponding to the broadcast audio data in the application program through a preset interface, so that the obtained text information can be used as the reminder information of the application program.
  • In another possible case, when the application detects the need to remind the user, it can also actively send the audio data or text information to be broadcast to the voice assistant, so that the voice assistant does not need to obtain the audio data or text information itself. That is, referring to FIG. 5, the application program can complete the action of sending reminder information to the voice assistant at the APP layer, and the voice assistant can obtain the reminder information without going through the player at the system layer.
  • It should be noted that, while actively sending information to the voice assistant, the application program can continue to broadcast audio data or display text information to the user, or the audio data or text information can be broadcast through the voice assistant; the embodiment of the present application does not limit this.
  • the terminal device may also pre-train a recognition model for recognizing reminder information based on a large number of samples of reminder information collected manually, combined with other information of the application program. Afterwards, the terminal device can use the recognition model to recognize various information obtained by the voice assistant, so as to obtain reminder information.
  • the voice assistant can also determine the reminder information sent by the application program in other ways, and the embodiment of the present application does not limit the method of obtaining the reminder information.
  • Step 303: Obtain the voice command issued by the user, and convert the voice command to obtain a text command.
  • the voice instruction is an instruction for instructing the terminal device to perform an operation.
  • the user may issue a voice command according to the reminder information, instructing the terminal device or the application program to perform related operations.
  • the terminal device can collect the voice commands issued by the user through the voice assistant, and convert them into text commands.
  • In the embodiment of the present application, the preset voice assistant can be turned on first, so that the voice commands issued by the user can be continuously collected through the voice assistant, the terminal device can be controlled to perform corresponding operations according to the voice commands, and the efficiency of voice interaction between the terminal device and the user can be improved.
  • the voice assistant can call the microphone of the terminal device, collect audio data through the microphone, and obtain voice instructions from the user.
  • the voice assistant can use voice enhancement technology to filter the noise in the voice command to obtain the filtered voice command, that is, the voice uttered by the user.
  • the voice assistant can use ASR to perform text conversion on the filtered voice commands to obtain text commands.
  • the voice assistant can collect voice instructions by means of a wake-up word.
  • the voice assistant can continuously collect audio data through the microphone.
  • the voice assistant can use the audio data within a preset time after the wake-up word as a voice command.
  • For example, if the wake-up word is "Little E" and the voice assistant detects it, the voice assistant can take the audio data "switch route" collected within 10 seconds after the wake-up word as the voice command, filter it through voice enhancement, and then convert it into text through ASR technology to obtain the text command "switch route".
  • the voice assistant can also obtain voice instructions in combination with the end word. After the voice assistant detects the wake-up word, the audio data after the wake-up word can be used as voice instructions until the voice assistant detects the end word.
  • For example, if the wake-up word is "Little E", the ending word is "goodbye", and the audio data collected by the voice assistant is "Little E, switch routes, goodbye", then after detecting the wake-up word "Little E", the voice assistant can take the subsequently collected audio data "switch routes" as the voice command; once the voice assistant detects the ending word "goodbye", it no longer treats the collected audio data as part of the voice command.
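The two segmentation strategies just described (a fixed window after the wake-up word, or collecting until an ending word) can be sketched as follows over a stream of timestamped text chunks; the wake-up word, ending word, and 10-second window are the example values from above.

```python
# Sketch of both command-segmentation strategies over (timestamp, text) chunks
# from continuous capture: stop at the ending word if one arrives, otherwise
# fall back to a fixed window after the wake-up word.
from typing import Iterable, Optional

def extract_command(chunks: Iterable[tuple[float, str]],
                    wake_word: str = "Little E",
                    end_word: Optional[str] = "goodbye",
                    window_s: float = 10.0) -> str:
    command_parts: list[str] = []
    wake_time = None
    for ts, text in chunks:
        if wake_time is None:
            if wake_word in text:
                wake_time = ts           # start collecting after the wake-up word
            continue
        if end_word is not None and end_word in text:
            break                        # the ending word closes the command
        if ts - wake_time > window_s:
            break                        # fixed-window fallback
        command_parts.append(text)
    return " ".join(command_parts)

print(extract_command([(0.0, "Little E"), (1.2, "switch routes"), (2.5, "goodbye")]))
# -> "switch routes"
```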
  • It should be noted that, as an alternative to performing step 302 first, the terminal device can also execute step 303 first and then execute step 302; that is, the voice assistant first collects the voice command issued by the user, and then obtains the information issued by the application programs within a preset time, so that the reminder information in that information can be combined with the voice command to determine the user intention corresponding to the voice command.
  • In this case, the voice assistant does not obtain the information sent by the application programs in real time; instead, it first collects the voice command issued by the user and, according to the time at which the voice command was collected, selects the information sent by the application programs within a preset time before that moment, thereby obtaining the reminder information sent by the applications.
  • For example, suppose the preset time is 2 minutes. If the voice assistant of the terminal device collects the user's voice command at 11:10, the terminal device can traverse the applications that ran between 11:08 and 11:10 and obtain the information sent by each of those applications during that period.
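A minimal sketch of this time-window selection, assuming application messages carry timestamps (an assumption of the example; the embodiment does not specify how messages are stored):

```python
# Sketch: keep only the application messages sent within the preset time
# (2 minutes in the example above) before the voice command was collected.
from datetime import datetime, timedelta

PRESET_WINDOW = timedelta(minutes=2)

def recent_app_messages(messages: list[tuple[datetime, str, str]],
                        command_time: datetime) -> list[tuple[str, str]]:
    """messages are (timestamp, app_id, text); returns (app_id, text) in window."""
    return [(app, text) for ts, app, text in messages
            if command_time - PRESET_WINDOW <= ts <= command_time]

command_time = datetime(2022, 1, 1, 11, 10)
messages = [
    (datetime(2022, 1, 1, 11, 9), "map_app", "found a faster route"),
    (datetime(2022, 1, 1, 11, 0), "music_app", "playlist updated"),
]
print(recent_app_messages(messages, command_time))
# only the 11:09 map reminder falls inside the 11:08-11:10 window
```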
  • the terminal device may also only perform step 303 without performing step 302.
  • the embodiment of the present application does not limit the sequence of steps 302 and 303, nor does it limit whether the terminal device performs step 302.
  • the user can also issue an instruction according to the current scene, and the terminal device can obtain the voice instruction issued by the user.
  • For example, even if the map application has not sent a "shorter route found" reminder message, when the user finds that there is a traffic jam ahead or needs to change the destination, the user can issue a "switch route" or "change destination" voice command to the voice assistant.
  • the voice assistant can also continuously obtain information corresponding to the current scene through the terminal device or the application program, so as to continuously determine the current environment of the user.
  • If the voice assistant detects that the user's current environment does not match the status of the terminal device or the application program, the voice assistant can remind the user, so as to determine, according to the voice command fed back by the user, whether the status of the terminal device or the application program needs to be adjusted.
  • the map application may send the traffic jam information to the voice assistant as a reminder.
  • the voice assistant can ask the user "There is a traffic jam on section A, do you want to switch the navigation route?" If the voice assistant detects that the user answers "switch the route", the voice assistant can instruct the map application to switch the navigation route.
  • Step 304: Determine, according to the text command corresponding to the voice command, the application program corresponding to the text command.
  • After acquiring the text command, the terminal device can search for the application program corresponding to it. Both an application running in the foreground of the terminal device and an application running in the background may send reminder information. Therefore, the terminal device can first determine the application program corresponding to the text command, so that in subsequent steps it can obtain the interface of that application, or determine the user intention according to the text command and the reminder information.
  • Specifically, the terminal device may acquire a first application program list and a second application program list, where the first application program list is a list of the application programs installed on the terminal device, and the second application program list is a list of the application programs currently running on the terminal device, including applications running in the foreground and applications running in the background.
  • Afterwards, the terminal device can search, according to the text command corresponding to the voice command, for the application program matching the text command in the first application program list and the second application program list, so as to determine the running state of the matching application program, where the running state indicates whether the terminal device is running the application program, and whether it is running in the foreground or in the background.
  • If the running state indicates that the application program corresponding to the text command is running in the foreground of the terminal device, the terminal device may perform step 305 to obtain the interface information of the application program. If the running state indicates that the application program is running in the background, the terminal device can obtain the identifier of the application program, skip step 305, and execute step 306, determining the user intention corresponding to the voice command through the fusion model in combination with the determined application identifier (see the sketch following the examples below).
  • For example, if the text command is to navigate somewhere, the application program related to the text command can be an application program with a navigation function; if the text command is to play audio or video, the related application program can be one with the function of playing audio and video data; and if the text command is "open application B", the related application program can be the application whose name is the same as or similar to the application name in the text command.
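The sketch below ties step 304's pieces together: it matches a text command against hypothetical capability keywords of installed applications and reports the running state that decides whether step 305 runs. The keyword table and application names are invented for illustration.

```python
# Sketch: resolve a text command to an application and its running state
# using the installed-app and running-app lists from step 304. The capability
# keywords and application names are invented for illustration.
from enum import Enum

class RunState(Enum):
    NOT_RUNNING = "not running"
    FOREGROUND = "foreground"   # -> perform step 305 (read the interface)
    BACKGROUND = "background"   # -> skip to step 306 with the app identifier

CAPABILITIES = {  # hypothetical keyword -> app index over installed apps
    "navigate": "map_app",
    "route": "map_app",
    "play": "video_app",
}

def resolve_app(text_cmd: str, installed: set[str],
                foreground: set[str], background: set[str]):
    for keyword, app in CAPABILITIES.items():
        if keyword in text_cmd and app in installed:
            if app in foreground:
                return app, RunState.FOREGROUND
            if app in background:
                return app, RunState.BACKGROUND
            return app, RunState.NOT_RUNNING
    return None, RunState.NOT_RUNNING

print(resolve_app("switch route", installed={"map_app", "video_app"},
                  foreground=set(), background={"map_app"}))
# -> ('map_app', <RunState.BACKGROUND: 'background'>)
```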
  • Step 305: If the application program corresponding to the text command is running in the foreground of the terminal device, acquire the interface information according to the interface content of the current interface of the terminal device.
  • the current interface is an interface currently displayed by the terminal device.
  • the interface content may include various elements displayed in the current interface.
  • the interface content may include elements such as text, images, and controls.
  • The interface content may also include: the text position corresponding to each piece of text, the image position corresponding to each image, and, for each control, information such as its category and location.
  • The interface information is information determined according to the interface content, and is used to indicate the scene in which the terminal device is currently located and the actions associated with the current scene (such as the action currently being performed by the terminal device, and/or the actions that the terminal device may perform).
  • For example, for a map navigation interface, the interface information corresponding to the current interface may include: the scene in which the terminal device is currently located is a map navigation scene; the action currently being performed by the terminal device is navigating to the destination; and the actions that the terminal device may perform include changing the route, changing the destination, and stopping navigation.
  • In practical applications, the terminal device can obtain its current interface (that is, the interface corresponding to the application program) through system services, identify each element in the interface content, and then parse the elements to obtain the interface information of the application program.
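As an illustration of deriving interface information from the extracted interface content, the following sketch applies invented rules to a list of hypothetical UI elements; the real extraction goes through system services whose structure the embodiment does not detail.

```python
# Sketch: turn extracted interface elements into "interface information"
# (current scene, current action, candidate actions). Element structure and
# rules are invented stand-ins for the system-service extraction.
def parse_interface(elements: list[dict]) -> dict:
    texts = [e["text"] for e in elements if e.get("type") == "text"]
    controls = [e["text"] for e in elements if e.get("type") == "control"]
    info = {"scene": "unknown", "current_action": None,
            "candidate_actions": controls}  # controls suggest possible actions
    if any("navigation" in t for t in texts):
        info["scene"] = "map navigation"
        info["current_action"] = "navigating to destination"
    return info

current_interface = [
    {"type": "text", "text": "navigation in progress, 12 min remaining"},
    {"type": "control", "text": "change route"},
    {"type": "control", "text": "stop navigation"},
]
print(parse_interface(current_interface))
# {'scene': 'map navigation', 'current_action': 'navigating to destination',
#  'candidate_actions': ['change route', 'stop navigation']}
```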
  • Step 306: Determine the user intention corresponding to the voice command according to the information sent by the application program and the text command corresponding to the voice command.
  • The user intention may be obtained as the output of the fusion model trained in step 301.
  • If the application program corresponding to the voice command is running in the background, the voice assistant of the terminal device can input the information issued by the application program, the text command corresponding to the voice command, and the identifier of the application program into the fusion model, and obtain the user intention output by the fusion model.
  • Alternatively, if the application program is running in the foreground, the voice assistant can input the information sent by the application program, the text command corresponding to the voice command, and the interface information corresponding to the application program into the fusion model, and obtain the user intention output by the fusion model.
  • Specifically, after the terminal device obtains the interface information of the application program, it can input the text command, the interface information, and the obtained information issued by the application program into the fusion model at the same time. The fusion model analyzes the text command, the interface information, and the information sent by the application program, and outputs the operation matching them, that is, the user intention corresponding to the voice command, so that the terminal device can perform the matching operation according to the user intention in subsequent steps.
  • When the fusion model is a model group, the terminal device can first input each piece of information into the first model in the model group, analyze the information through the first model, and output the domain to which the voice command belongs; then, from the multiple models in the group, determine the model matching that domain, input the text command, the interface information, and the information sent by the application program into that model, and analyze the information through the model to obtain the user intention corresponding to the voice command.
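A minimal sketch of this two-stage model group, with placeholder rule-based callables standing in for the trained first model and the domain-specific models:

```python
# Sketch of the two-stage model group: a first model predicts the domain of
# the command, then the matching domain model outputs the user intention.
# The rule-based callables below are placeholders, not trained models.
def first_model(text_cmd: str, reminder: str, interface: dict) -> str:
    return "navigation" if "route" in text_cmd or "route" in reminder else "general"

def navigation_model(text_cmd: str, reminder: str, interface: dict) -> str:
    return "switch_to_faster_route" if "faster route" in reminder else "ask_user"

def general_model(text_cmd: str, reminder: str, interface: dict) -> str:
    return "ask_user"

DOMAIN_MODELS = {"navigation": navigation_model, "general": general_model}

def infer_intent(text_cmd: str, reminder: str, interface: dict) -> str:
    domain = first_model(text_cmd, reminder, interface)          # stage 1: domain
    return DOMAIN_MODELS[domain](text_cmd, reminder, interface)  # stage 2: intent

print(infer_intent("switch route", "found a faster route", {}))
# -> "switch_to_faster_route"
```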
  • In another possible implementation, if the application program corresponding to the voice command is running in the background, the terminal device skips step 305 after executing step 304, inputs the text command, the information sent by the application program, and the application identifier obtained in step 304 into the fusion model, and obtains the user intention output by the fusion model.
  • It should be noted that, when the terminal device inputs the information sent by the application program into the fusion model: if that information was obtained by performing step 302 before step 303, the terminal device can select only the information sent within the preset time to input into the fusion model; if the information was obtained by performing step 303 before step 302, the terminal device can input all of the information sent by the application program within the preset time into the fusion model.
  • the terminal device may also use other methods to select the information input into the fusion model, which is not limited in this embodiment of the present application.
  • Step 307: Perform an operation matching the user intention.
  • After the terminal device determines the user intention corresponding to the voice command, it can call the preset intention execution interface to control the terminal device to perform the operation matching the user intention, respond to the voice command issued by the user, and complete the voice interaction between the terminal device and the user, thereby reducing the processes and steps required for voice interaction between the terminal device and the user.
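One way to picture the preset intention execution interface is a registry that dispatches each determined user intention to a matching operation; the decorator-based registry below is purely illustrative and not the patent's actual interface.

```python
# Sketch: a registry standing in for the preset "intention execution
# interface", dispatching each determined user intention to an operation.
from typing import Callable

INTENT_HANDLERS: dict[str, Callable[[], None]] = {}

def intent_handler(name: str):
    def register(fn: Callable[[], None]) -> Callable[[], None]:
        INTENT_HANDLERS[name] = fn
        return fn
    return register

@intent_handler("switch_to_faster_route")
def switch_route() -> None:
    print("instructing the map application to switch the navigation route")

def execute_intent(intent: str) -> None:
    handler = INTENT_HANDLERS.get(intent)
    if handler is None:
        print(f"no operation matches intent {intent!r}; ask the user instead")
    else:
        handler()

execute_intent("switch_to_faster_route")
```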
  • It should be noted that step 301 is an optional step, and the terminal device need only perform it once; that is, after the fusion model is obtained through training, there is no need to train the fusion model again each time the user intention corresponding to a voice command is subsequently determined.
  • Step 305 is also an optional step: if the terminal device determines in step 304 that the application program corresponding to the voice command is running in the foreground, the terminal device performs step 305; when the corresponding application program is running in the background, the terminal device may skip step 305 and execute step 306.
  • the above voice analysis method can also be applied in a scenario of multiple devices.
  • For example, it can be applied to the scenario of controlling different smart home devices through the terminal device, to the scenario of controlling vehicle-mounted devices through the terminal device, and to other scenarios including multiple devices, which is not limited in the embodiment of the present application.
  • FIG. 6 shows a schematic flow chart of voice analysis performed by multiple devices.
  • In FIG. 6, a first terminal device and a second terminal device are taken as examples to illustrate how any terminal device that accesses the network performs voice analysis.
  • The method can include the following steps:
  • Step 601: When accessing the network, the first terminal device broadcasts the first application program list to other devices in the network, and requests the second application program list from the second terminal device.
  • the first application program list may be a list of various application programs currently running on the first terminal device.
  • the second application program list may be a list of application programs currently running on the second terminal device.
  • the network accessed by the first terminal device may be a local area network, a wide area network, or the Internet.
  • For example, in some scenarios the first terminal device may access a local area network; in a distributed application scenario, the first terminal device may access a wide area network. This is not limited in this embodiment of the present application.
  • After the first terminal device detects that it has accessed the network, it can generate list request information, obtain the first application program list of the first terminal device, and broadcast the first application program list and the list request information to other terminal devices in the network, so that the other terminal devices in the network can receive the first application program list and feed back their corresponding application program lists to the first terminal device according to the list request information.
  • During operation, application programs may be opened or closed at any time, so the first application program list of the first terminal device changes constantly.
  • Therefore, when the first terminal device detects that an application program is newly opened, or that an application program is closed, it can update the first application program list and broadcast the updated first application program list to other devices in the network.
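  • A minimal Kotlin sketch of this list maintenance follows, assuming a simple Broadcaster abstraction over the network transport (an illustrative assumption, not an interface defined by this application):

```kotlin
// Re-broadcast the first application program list whenever an
// application is opened or closed on the first terminal device.
interface Broadcaster {
    fun broadcast(appList: List<String>)
}

class AppListPublisher(private val broadcaster: Broadcaster) {
    private val running = linkedSetOf<String>()

    fun onAppOpened(packageName: String) {
        if (running.add(packageName)) broadcaster.broadcast(running.toList())
    }

    fun onAppClosed(packageName: String) {
        if (running.remove(packageName)) broadcaster.broadcast(running.toList())
    }
}
```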
  • Step 602: The second terminal device receives the first application program list of the first terminal device, and feeds back the second application program list of the second terminal device to the first terminal device.
  • Step 603: When the first terminal device detects the voice command sent by the user, it converts the voice command to obtain a text command, and obtains the reminder information and interface information of the first terminal device.
  • Step 604: The first terminal device sends an information request instruction to the second terminal device.
  • the information request instruction is used to request reminder information and interface information of the second terminal device.
  • The embodiment of the present application is described by taking the case in which step 603 is performed before step 604 as an example.
  • the first terminal device may also perform step 603 and step 604 at the same time, that is, the first terminal device may also send an information request command to the second terminal device while converting the voice command.
  • the first terminal device may also perform step 604 first, and then perform step 603, and this embodiment of the present application does not limit the order in which the first terminal device performs step 603 and step 604.
  • It should be noted that, when the first terminal device performs step 604 before step 603, the first terminal device does not obtain the reminder information and interface information of the second terminal device in response to the detected voice command; instead, it periodically obtains the reminder information and interface information of the second terminal device in advance, using a process similar to step 302 to step 303 above. In this way, when the voice command sent by the user is detected, the information obtained in advance can be used in subsequent steps to determine the user intent corresponding to the voice command.
  • Step 605: The second terminal device obtains the reminder information and interface information of the second terminal device according to the information request instruction sent by the first terminal device, and feeds back the reminder information and interface information of the second terminal device to the first terminal device.
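  • The request/feedback exchange in steps 604 and 605 could be modeled as below; the message types are illustrative assumptions about how the two devices might structure the exchanged information.

```kotlin
// Second terminal device: gather local reminder and interface information
// on request and feed it back to the first terminal device.
data class InfoRequest(val wantReminders: Boolean = true, val wantInterfaceInfo: Boolean = true)
data class InfoResponse(val reminders: List<String>, val interfaceInfo: String?)

fun handleInfoRequest(
    request: InfoRequest,
    localReminders: () -> List<String>,
    localInterfaceInfo: () -> String?
): InfoResponse = InfoResponse(
    reminders = if (request.wantReminders) localReminders() else emptyList(),
    interfaceInfo = if (request.wantInterfaceInfo) localInterfaceInfo() else null
)
```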
  • The process of obtaining information in step 603 and step 605 is similar to the process of obtaining the information sent by the application program and the process of obtaining the interface information in step 302 and step 305, and will not be repeated here.
  • Step 606: The first terminal device determines, according to the acquired reminder information and in combination with the text command, the application program corresponding to the voice command and the user intent corresponding to the voice command.
  • It should be noted that if, in step 602 and step 605, the first terminal device acquired the interface information of the application program, the first terminal device may also combine the acquired interface information to determine the user intent. However, if the first terminal device did not obtain the interface information of the application program in step 602 and step 605, the first terminal device can combine the determined application identification of the application program to determine the user intent; details are not repeated here.
  • Step 607: If the application program corresponding to the voice command is an application program run by the second terminal device, the first terminal device sends the user intent to the second terminal device.
  • After the first terminal device determines that the user's voice command is aimed at an application program of the second terminal device, the first terminal device can send the user intent obtained through recognition and analysis to the second terminal device, so that the second terminal device can call the intent execution interface according to the user intent and perform the operation corresponding to the user intent, thereby realizing multi-device cooperative work.
  • Step 608: The second terminal device performs the operation corresponding to the user intent according to the received user intent.
  • Step 609: If the application program corresponding to the voice command is an application program run by the first terminal device, the first terminal device performs the operation corresponding to the user intent according to the user intent.
  • In step 603, step 605, step 606, step 608, and step 609, the interface information and reminder information are obtained, the user's intent is determined according to the interface information and reminder information in combination with the text instruction, and the corresponding operation is then performed according to the user's intent; this process is similar to the process from step 302 to step 306, and will not be repeated here.
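  • The local-versus-remote dispatch in steps 607 to 609 reduces to a simple branch, sketched below with assumed helper callbacks:

```kotlin
// If the target application runs on the second terminal device, send the
// user intent there (steps 607-608); otherwise execute it locally (step 609).
fun dispatchIntent(
    targetPackage: String,
    localApps: Set<String>,
    peerApps: Set<String>,
    executeLocally: (String) -> Unit,
    sendToPeer: (String) -> Unit
) {
    when (targetPackage) {
        in localApps -> executeLocally(targetPackage)
        in peerApps -> sendToPeer(targetPackage)
        else -> Unit // target application not found on either device
    }
}
```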
  • It should be noted that the embodiment of the present application takes the voice assistant of the terminal device as an example; some applications installed on the terminal device can also perform voice interaction with the user, so an installed application program can likewise use the above voice analysis method to determine the user intent according to the voice command, thereby realizing control of the terminal device or of various application programs.
  • For example, taking a map application as an example, the user can issue voice instructions to the map application through voice interaction, and the map application can, according to the voice instruction, switch the navigation route, change the destination, control the on-board equipment, or control the vehicle.
  • In summary, by acquiring the information sent by the running application program when the voice command is obtained, and using the information sent by the application program as a factor for determining the user's intent, the voice analysis method can improve the accuracy of determining the user's intent corresponding to the voice instruction, so that the efficiency of voice interaction between the terminal device and the user can be improved.
  • When receiving the voice command from the user, the terminal device can convert the voice command to obtain the text command, obtain the information sent by the application program, analyze the text command and the information sent by the application program through the pre-trained fusion model, and output the user intent corresponding to the voice command. In this way, the accuracy of determining the user's intent corresponding to the voice command can be improved, thereby improving the efficiency of voice interaction between the terminal device and the user.
  • Moreover, the terminal device can also obtain the interface information of the application program, so that the fusion model combines the interface information of the application program with the text command and the information sent by the application program. In this way, the user's intent corresponding to the voice command can be determined more accurately, improving the accuracy of intent determination.
  • any terminal device that has collected voice commands in a multi-device scenario can control other devices in the multi-device scenario according to the voice commands, which can improve the flexibility of controlling terminal devices with voice commands.
  • applications with voice interaction functions can also determine the user intent corresponding to the voice command based on the voice command, combined with reminder information and interface information, which can improve the versatility and flexibility of the application to analyze voice commands and obtain user intent.
  • FIG. 7 is a structural block diagram of a speech analysis device provided by the embodiment of the present application. For the convenience of description, only the parts related to the embodiment of the present application are shown.
  • the device includes:
  • the first acquiring module 701 is configured to acquire a voice command and information sent by a running application, the voice command is used to instruct the terminal device to perform an operation, and the information sent by the application includes reminder information for reminding the user;
  • the first determining module 702 is configured to determine the user intention corresponding to the voice command according to the voice command and the reminder information.
  • the device also includes:
  • The second obtaining module 703 is configured to obtain a first application program list and a second application program list, where the first application program list is a list of applications installed on the terminal device, and the second application program list is a list of applications currently running on the terminal device;
  • the second determining module 704 is configured to determine the identifier of the application program corresponding to the voice instruction and the running state of the application program according to the first application program list and the second application program list;
  • The first determination module 702 is specifically configured to: if the running state of the application program is running in the background, determine the user intent corresponding to the voice command according to the reminder information, the voice command, and the application program identifier; and if the running state of the application program is running in the foreground, obtain, according to the current interface of the application program, the interface information corresponding to the current interface, and determine the user intent corresponding to the voice command according to the voice command, the reminder information, and the interface information.
  • the first determination module 702 is also specifically configured to extract the current interface to obtain interface content included in the current interface; analyze the interface content to obtain interface information corresponding to the application program.
  • The first obtaining module 701 is specifically configured to obtain the voice command at a first moment, and, according to the first moment, obtain the information issued by each running application program within a preset time before the first moment.
  • the first acquiring module 701 is specifically configured to acquire information sent by the running application program in real time.
  • The first obtaining module 701 is also specifically configured to obtain the audio data broadcast by the terminal device through a preset interface, and to convert the audio data using automatic speech recognition (ASR) technology to obtain the information in text form sent by the application program.
  • the first obtaining module 701 is also specifically configured to extract the text data sent by the application program through a preset interface to obtain information in text form sent by the application program.
  • the device also includes:
  • The conversion module 705 is configured to convert the voice command by using ASR technology to obtain a text command in text form;
  • the first determining module 702 is further specifically configured to determine the user intention corresponding to the voice instruction according to the text instruction and the reminder information.
  • The conversion module 705 is specifically configured to denoise the voice command by using voice enhancement technology to obtain a denoised voice command, and to convert the denoised voice command by using ASR technology to obtain the text command in text form.
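  • The conversion pipeline (speech enhancement, then ASR) can be expressed as a simple composition; the engine types below are placeholders, since the embodiments do not prescribe particular enhancement or recognition implementations.

```kotlin
// Denoise the captured audio with speech enhancement, then run ASR on the
// denoised audio to obtain the text instruction.
typealias SpeechEnhancer = (ByteArray) -> ByteArray
typealias AsrEngine = (ByteArray) -> String

fun toTextInstruction(pcm: ByteArray, enhance: SpeechEnhancer, asr: AsrEngine): String {
    val denoised = enhance(pcm)
    return asr(denoised)
}
```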
  • the device also includes:
  • The establishment module 706 is configured to establish multiple association relationships between different types of sample data according to multiple types of sample data, where the multiple types of sample data include: sample reminder information, sample interface content, sample voice instructions, and sample user intents, and the multiple association relationships include: the association relationship between the sample user intent and the sample reminder information, the association relationship between the sample user intent and the sample voice instruction, and the association relationship between the sample user intent and the sample interface content;
  • the training module 707 is configured to perform training according to various association relationships to obtain a fusion model, where the fusion model is a single model or a model group composed of multiple models.
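  • One way to picture the training data implied by these association relationships is a record type pairing each sample user intent with the sample reminder information, sample interface content, and sample voice instruction it is associated with; the trainer interface below is an assumed abstraction, as the embodiments do not prescribe a training algorithm.

```kotlin
// Each record captures the association relationships between a sample user
// intent and the other three kinds of sample data.
data class TrainingSample(
    val reminder: String,
    val interfaceContent: String,
    val voiceInstruction: String,
    val userIntent: String // label
)

interface FusionModel {
    fun predictIntent(reminder: String, interfaceContent: String, text: String): String
}

interface FusionTrainer {
    // Trains on the associations and returns the fusion model
    // (a single model or a model group behind this interface).
    fun train(samples: List<TrainingSample>): FusionModel
}
```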
  • the first determination module 702 is further specifically configured to determine the user intention corresponding to the voice command through the fusion model, in combination with the voice command and the reminder information.
  • the device also includes:
  • the execution module 708 is configured to invoke an intent execution interface according to the user intent, and execute an operation matching the user intent.
  • the apparatus is applied in a multi-device scenario, where the multi-device scenario includes a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device;
  • The first obtaining module 701 is further specifically configured so that the first terminal device obtains the voice command and the information sent by the application program running on the first terminal device, and sends, according to the voice command, an information request instruction to the second terminal device, where the information request instruction is used to instruct the second terminal device to obtain, and feed back to the first terminal device, the information issued by the application program running on the second terminal device; the first terminal device then receives the information fed back by the second terminal device.
  • By obtaining the information sent by the running application program when the voice command is obtained, and using the information sent by the application program as a factor for determining the user's intent, the speech analysis device can improve the accuracy of determining the user's intent corresponding to the voice instruction, so that the efficiency of voice interaction between the terminal device and the user can be improved.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • The electronic device may include a processor 1210, an external memory interface 1220, an internal memory 1221, a universal serial bus (USB) interface 1230, a charging management module 1240, a power management module 1241, a battery 1242, an antenna 1, an antenna 2, a mobile communication module 1250, a wireless communication module 1260, an audio module 1270, a speaker 1270A, a receiver 1270B, a microphone 1270C, an earphone interface 1270D, a sensor module 1280, a button 1290, a motor 1291, an indicator 1292, a camera 1293, a display screen 1294, a subscriber identification module (SIM) card interface 1295, and the like.
  • the sensor module 1280 can include pressure sensor 1280A, gyroscope sensor 1280B, air pressure sensor 1280C, magnetic sensor 1280D, acceleration sensor 1280E, distance sensor 1280F, proximity light sensor 1280G, fingerprint sensor 1280H, temperature sensor 1280J, touch sensor 1280K, ambient light Sensor 1280L, bone conduction sensor 1280M, etc.
  • the structure shown in the embodiment of the present invention does not constitute a specific limitation on the electronic device.
  • the electronic device may include more or fewer components than shown in the illustrations, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • The processor 1210 may include one or more processing units. For example, the processor 1210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic equipment.
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 1210 for storing instructions and data.
  • The memory in the processor 1210 is a cache memory. This memory may hold instructions or data that the processor 1210 has just used or used cyclically. If the processor 1210 needs to use the instructions or data again, they can be called directly from this memory, which avoids repeated access and reduces the waiting time of the processor 1210, thereby improving system efficiency.
  • processor 1210 may include one or more interfaces.
  • the interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous transmitter (universal asynchronous receiver/transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, subscriber identity module (subscriber identity module, SIM) interface, and /or universal serial bus (universal serial bus, USB) interface, etc.
  • I2C integrated circuit
  • I2S integrated circuit built-in audio
  • PCM pulse code modulation
  • PCM pulse code modulation
  • UART universal asynchronous transmitter
  • MIPI mobile industry processor interface
  • GPIO general-purpose input and output
  • subscriber identity module subscriber identity module
  • SIM subscriber identity module
  • USB universal serial bus
  • The I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • processor 1210 may include multiple sets of I2C buses.
  • the processor 1210 can be respectively coupled to the touch sensor 1280K, the charger, the flashlight, the camera 1293 and the like through different I2C bus interfaces.
  • the processor 1210 may be coupled to the touch sensor 1280K through the I2C interface, so that the processor 1210 and the touch sensor 1280K communicate through the I2C bus interface to realize the touch function of the electronic device.
  • the I2S interface can be used for audio communication.
  • processor 1210 may include multiple sets of I2S buses.
  • the processor 1210 may be coupled to the audio module 1270 through an I2S bus to implement communication between the processor 1210 and the audio module 1270 .
  • the audio module 1270 can transmit audio signals to the wireless communication module 1260 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
  • the PCM interface can also be used for audio communication, sampling, quantizing and encoding the analog signal.
  • the audio module 1270 and the wireless communication module 1260 can be coupled through a PCM bus interface.
  • the audio module 1270 can also transmit audio signals to the wireless communication module 1260 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is generally used to connect the processor 1210 and the wireless communication module 1260 .
  • the processor 1210 communicates with the Bluetooth module in the wireless communication module 1260 through the UART interface to realize the Bluetooth function.
  • the audio module 1270 can transmit audio signals to the wireless communication module 1260 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 1210 with the display screen 1294, the camera 1293 and other peripheral devices.
  • MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 1210 communicates with the camera 1293 through the CSI interface to realize the shooting function of the electronic device.
  • the processor 1210 communicates with the display screen 1294 through the DSI interface to realize the display function of the electronic device.
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 1210 with the camera 1293 , the display screen 1294 , the wireless communication module 1260 , the audio module 1270 , the sensor module 1280 and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 1230 is an interface conforming to the USB standard specification, specifically, it may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 1230 can be used to connect a charger to charge the electronic device, and can also be used to transmit data between the electronic device and peripheral devices. It can also be used to connect headphones and play audio through them. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules shown in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device.
  • In other embodiments, the electronic device may also adopt an interface connection manner different from that in the above embodiment, or a combination of multiple interface connection manners.
  • the wireless communication function of the electronic device can be realized by the antenna 1, the antenna 2, the mobile communication module 1250, the wireless communication module 1260, the modem processor and the baseband processor.
  • The wireless communication module 1260 can provide wireless communication solutions applied to the electronic device, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 1260 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 1260 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 1210 .
  • the wireless communication module 1260 can also receive the signal to be sent from the processor 1210 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device is coupled to the mobile communication module 1250, and the antenna 2 is coupled to the wireless communication module 1260, so that the electronic device can communicate with the network and other devices through wireless communication technology.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
  • The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
  • the electronic device realizes the display function through the GPU, the display screen 1294, and the application processor.
  • the GPU is a microprocessor for image processing, connected to the display screen 1294 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 1210 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 1294 is used to display images, videos and the like.
  • Display 1294 includes a display panel.
  • The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like.
  • the electronic device may include 1 or N display screens 1294, where N is a positive integer greater than 1.
  • the electronic device can realize the shooting function through ISP, camera 1293 , video codec, GPU, display screen 1294 and application processor.
  • the ISP is used to process the data fed back by the camera 1293 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin color.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be located in the camera 1293.
  • Camera 1293 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the electronic device may include 1 or N cameras 1293, where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when an electronic device selects a frequency point, a digital signal processor is used to perform Fourier transform on the frequency point energy, etc.
  • Video codecs are used to compress or decompress digital video.
  • An electronic device may support one or more video codecs.
  • In this way, the electronic device can play or record video in multiple encoding formats, for example: moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of electronic devices can be realized through NPU, such as: image recognition, face recognition, speech recognition, text understanding, etc.
  • the external memory interface 1220 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • the external memory card communicates with the processor 1210 through the external memory interface 1220 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the internal memory 1221 may be used to store computer-executable program codes including instructions.
  • the processor 1210 executes various functional applications and data processing of the electronic device by executing instructions stored in the internal memory 1221 .
  • the internal memory 1221 may include an area for storing programs and an area for storing data.
  • the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the electronic device.
  • the internal memory 1221 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the electronic device can implement audio functions through the audio module 1270, the speaker 1270A, the receiver 1270B, the microphone 1270C, the earphone interface 1270D, and the application processor. Such as music playback, recording, etc.
  • the audio module 1270 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 1270 may also be used to encode and decode audio signals.
  • the audio module 1270 may be set in the processor 1210 , or some functional modules of the audio module 1270 may be set in the processor 1210 .
  • The loudspeaker 1270A, also called a "horn", is used to convert audio electrical signals into sound signals.
  • the electronic device can listen to music through speaker 1270A, or listen to hands-free calls.
  • The receiver 1270B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 1270B can be placed close to the human ear to receive the voice.
  • The microphone 1270C, also called a "mic" or "mouthpiece", is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can put the mouth close to the microphone 1270C to speak, inputting the sound signal into the microphone 1270C.
  • the electronic device may be provided with at least one microphone 1270C. In other embodiments, the electronic device can be provided with two microphones 1270C, which can also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device can also be provided with three, four or more microphones 1270C to realize the collection of sound signals, noise reduction, identification of sound sources, and realization of directional recording functions, etc.
  • the earphone interface 1270D is used for connecting wired earphones.
  • the earphone interface 1270D may be a USB interface 1230, or a 3.5mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 1280A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 1280A may be located on display screen 1294 .
  • pressure sensors 1280A such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors.
  • a capacitive pressure sensor may be comprised of at least two parallel plates with conductive material.
  • the electronic device detects the intensity of the touch operation according to the pressure sensor 1280A.
  • the electronic device may also calculate the touched position according to the detection signal of the pressure sensor 1280A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation with a touch operation intensity less than the first pressure threshold acts on the short message application icon, an instruction to view short messages is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the icon of the short message application, the instruction of creating a new short message is executed.
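  • The short-message example above amounts to a threshold comparison; a sketch follows, with an assumed (device-specific) threshold value and hypothetical handler names.

```kotlin
const val FIRST_PRESSURE_THRESHOLD = 0.5f // assumed example value

fun onMessageIconPressed(pressure: Float, viewMessages: () -> Unit, composeMessage: () -> Unit) {
    if (pressure < FIRST_PRESSURE_THRESHOLD) {
        viewMessages()   // lighter touch: view short messages
    } else {
        composeMessage() // firmer touch: create a new short message
    }
}
```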
  • the gyroscope sensor 1280B can be used to determine the motion posture of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (ie, x, y, and z axes) can be determined by the gyro sensor 1280B.
  • the gyro sensor 1280B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 1280B detects the shake angle of the electronic device, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device through reverse movement to achieve anti-shake.
  • the gyroscope sensor 1280B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 1280C is used to measure air pressure.
  • the electronic device calculates the altitude through the air pressure value measured by the air pressure sensor 1280C to assist positioning and navigation.
  • the magnetic sensor 1280D includes a Hall sensor.
  • the electronic device may detect opening and closing of the flip holster using the magnetic sensor 1280D.
  • the electronic device can detect the opening and closing of the flip according to the magnetic sensor 1280D. Then according to the detected opening and closing state of the holster or the opening and closing state of the flip cover, features such as automatic unlocking of the flip cover are set.
  • the acceleration sensor 1280E can detect the acceleration of the electronic device in various directions (generally three axes). When the electronic device is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • Distance sensor 1280F used to measure distance.
  • Electronic devices can measure distance via infrared or laser light. In some embodiments, when shooting a scene, the electronic device can use the distance sensor 1280F for distance measurement to achieve fast focusing.
  • Proximity light sensor 1280G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • Electronic devices emit infrared light outwards through light-emitting diodes.
  • Electronic devices use photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object in the vicinity of the electronic device. When insufficient reflected light is detected, the electronic device may determine that there is no object in the vicinity of the electronic device.
  • the electronic device can use the proximity light sensor 1280G to detect that the user holds the electronic device close to the ear to make a call, so as to automatically turn off the screen to save power.
  • the proximity light sensor 1280G can also be used in leather case mode, automatic unlock and lock screen in pocket mode.
  • the ambient light sensor 1280L is used for sensing ambient light brightness.
  • the electronic device can adaptively adjust the brightness of the display screen 1294 according to the perceived ambient light brightness.
  • the ambient light sensor 1280L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 1280L can also cooperate with the proximity light sensor 1280G to detect whether the electronic device is in the pocket to prevent accidental touch.
  • the fingerprint sensor 1280H is used to collect fingerprints. Electronic devices can use the collected fingerprint features to unlock fingerprints, access application locks, take pictures with fingerprints, answer incoming calls with fingerprints, etc.
  • the temperature sensor 1280J is used to detect temperature.
  • the electronic device uses the temperature detected by the temperature sensor 1280J to implement a temperature treatment strategy. For example, when the temperature reported by the temperature sensor 1280J exceeds a threshold, the electronic device may reduce the performance of a processor located near the temperature sensor 1280J, so as to reduce power consumption and implement thermal protection.
  • In other embodiments, when the temperature is lower than another threshold, the electronic device heats the battery 1242 to avoid an abnormal shutdown caused by low temperature. In some other embodiments, when the temperature is lower than still another threshold, the electronic device boosts the output voltage of the battery 1242 to avoid an abnormal shutdown caused by low temperature.
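  • The temperature-handling strategy above could be sketched as follows; the thresholds are illustrative assumptions, and the callbacks stand in for the device's actual power-management actions.

```kotlin
fun applyThermalPolicy(
    tempCelsius: Float,
    throttleProcessor: () -> Unit,
    heatBattery: () -> Unit,
    boostBatteryVoltage: () -> Unit
) {
    when {
        tempCelsius > 45f -> throttleProcessor()    // too hot: reduce performance near the sensor
        tempCelsius < -10f -> boostBatteryVoltage() // extreme cold: boost battery output voltage
        tempCelsius < 0f -> heatBattery()           // cold: heat the battery 1242
    }
}
```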
  • The touch sensor 1280K is also known as a "touch panel".
  • the touch sensor 1280K can be arranged on the display screen 1294, and the touch sensor 1280K and the display screen 1294 form a touch screen, also called “touch screen”.
  • the touch sensor 1280K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations can be provided through the display screen 1294 .
  • the touch sensor 1280K may also be disposed on the surface of the electronic device, which is different from the position of the display screen 1294 .
  • the bone conduction sensor 1280M can acquire vibration signals.
  • the bone conduction sensor 1280M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 1280M can also contact the human pulse and receive the blood pressure beating signal.
  • the bone conduction sensor 1280M can also be disposed in the earphone, combined into a bone conduction earphone.
  • the audio module 1270 can analyze the voice signal based on the vibration signal of the vibrating bone mass of the vocal part acquired by the bone conduction sensor 1280M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 1280M, so as to realize the heart rate detection function.
  • the keys 1290 include a power key, a volume key, and the like. Key 1290 may be a mechanical key. It can also be a touch button.
  • the electronic device can receive key input and generate key signal input related to user settings and function control of the electronic device.
  • the motor 1291 can generate a vibrating prompt.
  • the motor 1291 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • the motor 1291 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 1294 .
  • Different application scenarios (for example, time reminder, receiving information, alarm clock, and games) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 1292 can be an indicator light, and can be used to indicate charging status, power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 1295 is used for connecting a SIM card.
  • the SIM card can be connected and separated from the electronic device by being inserted into the SIM card interface 1295 or pulled out from the SIM card interface 1295 .
  • the electronic device can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 1295 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 1295 at the same time. The types of the multiple cards may be the same or different.
  • the SIM card interface 1295 is also compatible with different types of SIM cards.
  • the SIM card interface 1295 is also compatible with external memory cards.
  • the electronic device interacts with the network through the SIM card to realize functions such as calling and data communication.
  • the electronic device adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device and cannot be separated from the electronic device.
  • the software system of the electronic device may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the Android system with layered architecture is taken as an example to illustrate the software structure of the electronic device.
  • FIG. 13 is a software structural block diagram of an electronic device according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system is divided into four layers, which are respectively the application program layer, the application program framework layer, the Android runtime (Android runtime) and the system library, and the kernel layer from top to bottom.
  • the application layer can consist of a series of application packages.
  • the application package may include application programs such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of electronic devices. For example, the management of call status (including connected, hung up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
  • For example, the notification manager may prompt text information in the status bar, issue a prompt sound, vibrate the electronic device, flash the indicator light, and so on.
  • the Android Runtime includes core library and virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
  • The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • A system library can include multiple functional modules, for example: a surface manager, media libraries (Media Libraries), a 3D graphics processing library (e.g., OpenGL ES), and a 2D graphics engine (e.g., SGL).
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing, etc.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • When the touch sensor 1280K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into original input events (including touch coordinates, time stamps of touch operations, and other information). Raw input events are stored at the kernel level.
  • The application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. For example, assume the touch operation is a tap, and the control corresponding to the tap is the control of the camera application icon.
  • the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer.
  • Camera 1293 captures still images or video.
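  • At the application level, the tail of this flow corresponds to launching a capture with a standard Android intent; a minimal Kotlin sketch (from an Activity context) is shown below for illustration.

```kotlin
import android.app.Activity
import android.content.Intent
import android.provider.MediaStore

// Start the system camera for a still-image capture, mirroring the
// "tap camera icon -> camera captures image" flow described above.
fun launchCameraCapture(activity: Activity) {
    val intent = Intent(MediaStore.ACTION_IMAGE_CAPTURE)
    if (intent.resolveActivity(activity.packageManager) != null) {
        activity.startActivity(intent)
    }
}
```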
  • An embodiment of the present application also provides an electronic device, including: a processor, configured to run a computer program stored in a memory, so as to implement one or more steps in any one of the above methods.
  • The embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium stores instructions, and when the instructions are run on a computer or a processor, the computer or the processor is caused to execute one or more steps in any one of the above methods.
  • the embodiment of the present application also provides a computer program product including instructions.
  • the computer program product runs on a computer or a processor, it causes the computer or the processor to perform one or more steps in any one of the above methods.
  • An embodiment of the present application also provides a chip system, the chip system includes a memory and a processor, and the processor executes a computer program stored in the memory to implement one or more steps in any one of the above methods.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in or transmitted via a computer-readable storage medium.
  • The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, DVD), or a semiconductor medium (for example, a solid state disk (solid state disk, SSD)), etc.
  • The processes can be completed by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium, and when the program is executed, the processes of the foregoing method embodiments may be included.
  • the aforementioned storage medium includes: ROM or random access memory RAM, magnetic disk or optical disk, and other various media that can store program codes.
  • the disclosed devices and methods may be implemented in other ways.
  • the system embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the procedures in the methods of the above embodiments of the present application can be completed by instructing related hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium.
  • the computer program When executed by a processor, the steps in the above-mentioned various method embodiments can be realized.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable medium may at least include: any entity or device capable of carrying computer program codes to a terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random-access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media.
  • Examples of software distribution media include a USB flash drive (U disk), a removable hard disk, a magnetic disk, and an optical disk.
  • It should be noted that, according to legislation and patent practice in some jurisdictions, computer-readable media may not include electrical carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

This application pertains to the technical field of terminals, and provides a voice analysis method, an electronic device, a readable storage medium, and a chip system. The method comprises: obtaining a voice instruction and information sent by a running application program, where the voice instruction is used to request a terminal device to perform an operation, and the information sent by the application program comprises reminder information used to remind a user; and determining, according to the voice instruction and the reminder information, a user intention corresponding to the voice instruction. By obtaining the information sent by the application program and using it as a factor in determining the user intention, the accuracy of determining the user intention corresponding to the voice instruction can be improved, as can the efficiency of voice interaction between the terminal device and the user.
PCT/CN2022/131980 2021-11-30 2022-11-15 Voice analysis method, electronic device, readable storage medium and chip system WO2023098467A1 (fr)
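To make the claimed flow concrete: the abstract's key point is that the voice instruction is interpreted together with reminder information recently pushed by a running application, rather than from the current interface content alone. The following minimal Python sketch illustrates that idea under stated assumptions; Reminder, parse_user_intent, and all field names are hypothetical, not the application's actual interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reminder:
    """Reminder information sent by a running application program."""
    app: str     # e.g. the navigation application
    text: str    # e.g. "Speed camera ahead, slow down"
    action: str  # the operation the reminder relates to, e.g. "mute_alert"


def parse_user_intent(voice_instruction: str,
                      reminder: Optional[Reminder]) -> str:
    """Determine the user intention corresponding to a voice instruction,
    using the application's reminder information as an extra factor."""
    words = voice_instruction.lower().split()
    # A pronoun-only command such as "turn that off" is ambiguous when
    # read against the current interface alone; a recently pushed
    # reminder supplies the missing referent.
    if reminder is not None and any(w in ("it", "that", "this") for w in words):
        return f"{reminder.app}:{reminder.action}"
    # Otherwise interpret the instruction on its own.
    return "generic:" + "_".join(words)


if __name__ == "__main__":
    r = Reminder(app="navigation",
                 text="Speed camera ahead, slow down",
                 action="mute_alert")
    print(parse_user_intent("Turn that off", r))          # navigation:mute_alert
    print(parse_user_intent("Open the music app", None))  # generic:open_the_music_app
```

Using the pushed reminder as one more parsing factor is what lets the terminal device resolve such references without asking the user again, which is the accuracy and efficiency gain the abstract describes.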

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111453243.7 2021-11-30
CN202111453243.7A CN116206602A (zh) Voice analysis method, electronic device, readable storage medium and chip system

Publications (1)

Publication Number Publication Date
WO2023098467A1 (fr)

Family

ID=86510056

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131980 WO2023098467A1 (fr) 2021-11-30 2022-11-15 Voice analysis method, electronic device, readable storage medium and chip system

Country Status (2)

Country Link
CN (1) CN116206602A (fr)
WO (1) WO2023098467A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108806674A * 2017-05-05 2018-11-13 Beijing Sogou Technology Development Co., Ltd. Positioning and navigation method and apparatus, and electronic device
CN108897517A * 2018-06-27 2018-11-27 Lenovo (Beijing) Co., Ltd. Information processing method and electronic device
CN109741740A * 2018-12-26 2019-05-10 Suzhou AISpeech Information Technology Co., Ltd. Voice interaction method and apparatus based on external trigger
CN110866179A * 2019-10-08 2020-03-06 Shanghai Pateo Yuezhen Network Technology Service Co., Ltd. Voice assistant-based recommendation method, terminal, and computer storage medium
CN111949240A * 2019-05-16 2020-11-17 Alibaba Group Holding Limited Interaction method, storage medium, service program, and device


Also Published As

Publication number Publication date
CN116206602A (zh) 2023-06-02

Similar Documents

Publication Publication Date Title
WO2021052263A1 Voice assistant display method and device
RU2766255C1 Voice control method and electronic device
WO2020211701A1 Model training method, emotion recognition method, and related apparatus and device
CN110910872B Voice interaction method and apparatus
US11922935B2 Voice interaction method and apparatus, terminal, and storage medium
CN110138959B Method for displaying prompt of human-computer interaction instruction, and electronic device
CN115866121B Application interface interaction method, electronic device, and computer-readable storage medium
EP4064284A1 Voice detection method, prediction model training method, apparatus, device, and medium
CN111819533B Method for triggering an electronic device to execute a function, and electronic device
US11868463B2 Method for managing application permission and electronic device
WO2020029306A1 Image capture method and electronic device
WO2021052139A1 Gesture input method and electronic device
WO2021218429A1 Application window management method, terminal device, and computer-readable storage medium
CN116233300A Method for controlling communication service state, terminal device, and readable storage medium
WO2023207667A1 Display method, vehicle, and electronic device
WO2023071940A1 Cross-device navigation task synchronization method and apparatus, device, and storage medium
CN114444000A Page layout file generation method and apparatus, electronic device, and readable storage medium
CN113742460A Method and apparatus for generating virtual character
WO2022007757A1 Cross-device voiceprint registration method, electronic device, and storage medium
WO2023098467A1 Voice analysis method, electronic device, readable storage medium, and chip system
CN114173381A Data transmission method and electronic device
CN113867851A Method for recording and method for acquiring electronic device operation guidance information, and terminal device
WO2023116669A1 Video generation system and method, and related apparatus
CN114115772B Off-screen display method and apparatus
WO2023109636A1 Application card display method and apparatus, terminal device, and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22900272

Country of ref document: EP

Kind code of ref document: A1