CN117707404A - Scene processing method, electronic device and storage medium - Google Patents

Scene processing method, electronic device and storage medium

Info

Publication number
CN117707404A
CN117707404A
Authority
CN
China
Prior art keywords
application
event
scene
touch
power consumption
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310634541.9A
Other languages
Chinese (zh)
Inventor
张家豪
金永军
牛翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202310634541.9A
Publication of CN117707404A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The present application provides a scene processing method, an electronic device, and a storage medium, and relates to the field of computer technologies. The method includes: when a touch start event is detected, a recording start event is detected within a preset time interval, and the first application and the second application are detected to be consistent, determining that the current scene is a voice input scene; and, in the voice input scene, lowering a power consumption parameter of the electronic device, or raising the power consumption parameter only to a first value. When recognizing the voice input scene, the scene processing method does not rely on layout information or Activity information, but recognizes the scene according to the logic by which a voice input scene arises, so the voice input scene can be recognized accurately. Reducing the power consumption of the electronic device in the voice input scene lowers the wear on each component of the device without affecting the user experience, and improves the battery life and service life of the electronic device.

Description

Scene processing method, electronic device and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a scene processing method, an electronic device, and a storage medium.
Background
With the rapid development of electronic technology, electronic devices such as mobile phones and tablet computers have become increasingly popular, and with the development of network technology, the application functions of electronic devices have become increasingly rich.
For example, instant messaging (Instant Messaging, IM) software is installed on current electronic devices, and the voice chat function supported by the IM software is popular with users.
During voice chat, the drawing/composition of animations while voice is being input/recorded wastes power. This problem is generally addressed by first recognizing the voice input scene and then reducing power consumption in that scene. In the related art, however, the voice input scene cannot be recognized accurately, so power consumption in the voice input scene cannot be reduced accurately.
Disclosure of Invention
The present application provides a scene processing method, an electronic device, and a storage medium. When recognizing a voice input scene, the method does not rely on layout information or Activity information, but recognizes the scene according to the logic by which a voice input scene arises, so the voice input scene can be recognized accurately. Reducing the power consumption of the electronic device in the voice input scene lowers the wear on each component of the device without affecting the user experience, and improves the battery life and service life of the electronic device.
In a first aspect, the present application provides a scene processing method, including: when a touch start event is detected, a recording start event is detected within a preset time interval, and the first application and the second application are detected to be consistent, determining that the current scene is a voice input scene; and, in the voice input scene, lowering a power consumption parameter of the electronic device, or raising the power consumption parameter only to a first value, where the first value is smaller than a second value, and the second value is the value to which the electronic device raises the power consumption parameter when the touch start event is detected.
The first application is the application corresponding to the touch start event. In other words, the first application is the application currently receiving the user's touch operation, or the application that generates the touch start event.
The second application is the application corresponding to the recording start event. In other words, the second application is the application that calls the recording start interface, or the application that generates the recording start event.
The current scene is the usage scene in which the electronic device is currently located.
Optionally, the power consumption parameter includes any one or any combination of a CPU frequency, a screen refresh rate, a screen brightness, a double data rate (Double Data Rate, DDR) frequency, a touch panel (Touch Panel, TP) report rate, an application drawing frame rate, and a system composition frame rate.
It will be appreciated that when the power consumption parameter includes only one of the above, raising the power consumption parameter to the first value means the first value is a single value.
When the power consumption parameter includes any combination of the above, i.e. at least two of the above, raising the power consumption parameter to the first value means the first value includes at least two values, one for each parameter, as illustrated in the sketch below.
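For illustration only, such a combination could be carried in a simple holder like the one below; the class name, the field names and the units are assumptions made for this sketch and are not taken from the patent.

```java
// Hypothetical container for the power consumption parameters listed above;
// all names and units are illustrative assumptions, not part of the patent.
public class PowerConsumptionParams {
    public int cpuFreqKhz;        // CPU frequency
    public int refreshRateHz;     // screen refresh rate
    public int screenBrightness;  // screen brightness level
    public int ddrFreqMhz;        // DDR frequency
    public int tpReportRateHz;    // touch panel (TP) report rate
    public int appDrawFps;        // application drawing frame rate
    public int sysComposeFps;     // system composition frame rate
}
```

With only one field adjusted, the first value is a single number; with several fields adjusted together, the first value is the corresponding set of numbers.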
According to the scene processing method provided in the first aspect, when a touch start event is detected, a recording start event is detected within a preset time interval, and the first application and the second application are detected to be consistent, the current scene is determined to be a voice input scene; in the voice input scene, the power consumption parameter of the electronic device is lowered, or raised only to a first value, where the first value is smaller than a second value, and the second value is the value to which the electronic device raises the power consumption parameter when the touch start event is detected.
The logic by which a voice input scene arises is as follows: the recording start event occurs after the touch start event; the time interval between the occurrence of the touch start event and the occurrence of the recording start event is within the preset time interval; and the application that generates the touch start event is consistent with the application that generates the recording start event. When recognizing the voice input scene, the scene processing method does not rely on layout information or Activity information but follows this logic, so the voice input scene can be recognized accurately. A minimal sketch of this decision logic is given below.
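In the Java sketch, the class name, the shape of the callbacks and the 500 ms value used for the preset time interval are all assumptions; the patent does not fix a concrete interval or a concrete API.

```java
// Illustrative sketch of the voice-input-scene decision; names, callbacks and
// the interval value are assumptions, not the patent's actual implementation.
public class VoiceInputSceneDetector {
    private static final long PRESET_INTERVAL_MS = 500; // assumed preset time interval

    private long firstTimeMs = -1; // time at which the touch start event was detected
    private int touchUid = -1;     // UID of the first application

    // Called when a touch start (Touch Down) event is reported.
    public void onTouchStart(int uid, long timeMs) {
        touchUid = uid;
        firstTimeMs = timeMs;
    }

    // Called when a recording start event is reported; returns true when the
    // current scene should be treated as a voice input scene.
    public boolean onRecordingStart(int uid, long secondTimeMs) {
        if (firstTimeMs < 0) {
            return false; // no preceding touch start event
        }
        boolean withinInterval = (secondTimeMs - firstTimeMs) < PRESET_INTERVAL_MS;
        boolean sameApplication = (uid == touchUid); // first and second application consistent
        return withinInterval && sameApplication;
    }
}
```

When onRecordingStart returns true, the power consumption parameter would be lowered, or raised only to the first value, instead of receiving the usual touch boost.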
Compared with the related art, in which the power consumption parameter is raised whenever a touch start event is detected, the scene processing method lowers the power consumption parameter of the electronic device in the voice input scene, or raises it only to the first value. This effectively reduces the power consumption of the electronic device in the voice input scene, lowers the wear on each component of the device without affecting the user experience, and improves the battery life and service life of the electronic device.
In a possible implementation manner, before determining that the current scene is a voice input scene, the scene processing method provided in the present application further includes: when the touch start event is detected, recording a first time of the touch start event; when the recording start event is detected, recording a second time of the recording start event; and when the difference between the second time and the first time is smaller than the preset time interval, determining that the recording start event is detected within the preset time interval.
In this implementation, the times of the touch start event and the recording start event are recorded, and the difference between the two times is calculated and compared with the preset time interval, thereby determining whether the recording start event is detected within the preset time interval. This establishes that the touch start event occurs first, the recording start event occurs later, and the interval between the two is within the preset time interval, which matches the conditions of the voice input scene and provides a basis for subsequently recognizing the voice input scene accurately.
In a possible implementation manner, before determining that the current scene is a voice input scene, the scene processing method provided in the present application further includes: acquiring first identification information of the first application; acquiring second identification information of the second application; and when the first identification information is detected to be identical to the second identification information, determining that the first application and the second application are consistent.
In this implementation, the first application and the second application are determined to be consistent by comparing the first identification information of the first application with the second identification information of the second application. This establishes that the application generating the touch start event and the application generating the recording start event are the same, which matches the conditions of the voice input scene and provides a basis for subsequently recognizing the voice input scene accurately.
Optionally, the first identification information may include a first UID and/or a first application package name, and the second identification information may include a second UID and/or a second application package name.
In a possible implementation manner, when the first identification information includes a first UID and the second identification information includes a second UID, determining that the first application and the second application are consistent includes: when the first UID is detected to be identical to the second UID, the first application and the second application are determined to be consistent.
In a possible implementation manner, when the first identification information includes a first application package name and the second identification information includes a second application package name, determining that the first application and the second application are consistent includes: and when the first application package name is detected to be the same as the second application package name, determining that the first application and the second application are consistent.
In a possible implementation manner, when the first identification information includes a first UID and a first application package name, and the second identification information includes a second UID and a second application package name, determining that the first application and the second application are consistent includes: and when the first UID is detected to be the same as the second UID, and the first application package name is detected to be the same as the second application package name, determining that the first application and the second application are consistent.
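As a rough illustration of the three implementation manners above, the helper below compares both the UID and the application package name (the third manner); checking only one of the two fields would correspond to the first or second manner. All class and method names are assumptions for this sketch.

```java
// Illustrative comparison of the first and second identification information;
// this variant requires both the UID and the package name to match. All names
// are assumptions, not the patent's actual code.
public final class AppIdentification {
    public final int uid;            // first/second UID
    public final String packageName; // first/second application package name

    public AppIdentification(int uid, String packageName) {
        this.uid = uid;
        this.packageName = packageName;
    }

    public static boolean consistent(AppIdentification first, AppIdentification second) {
        boolean sameUid = first.uid == second.uid;
        boolean samePackage = first.packageName != null
                && first.packageName.equals(second.packageName);
        return sameUid && samePackage;
    }
}
```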
In a possible implementation manner, the scene processing method provided in the present application further includes: after the power consumption parameter of the electronic device is lowered in the voice input scene, when a touch end event and/or a recording end event is detected, no longer lowering the power consumption parameter of the electronic device.
In this implementation, when the touch end event and/or the recording end event is detected, the power consumption parameter of the electronic device is no longer lowered; in other words, the device returns to the way the power consumption parameter is adjusted in non-voice-input scenes, for example raising the power consumption parameter when a touch event is detected. This improves the smoothness of the electronic device in non-voice-input scenes and brings a better experience to the user.
In a possible implementation manner, the scene processing method provided in the present application further includes: after the power consumption parameter is raised to the first value in the voice input scene, when a touch end event and/or a recording end event is detected, no longer limiting the power consumption parameter to the first value.
In this implementation, when the touch end event and/or the recording end event is detected, the power consumption parameter is no longer limited to the first value; in other words, the device returns to the way the power consumption parameter is adjusted in non-voice-input scenes, for example raising the power consumption parameter when a touch event is detected. This improves the smoothness of the electronic device in non-voice-input scenes and brings a better experience to the user. A sketch of this exit handling is given below.
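The controller below only restates the entry and exit behaviour described in the last few paragraphs; the class and its hooks are assumptions, not the patent's actual code.

```java
// Illustrative controller that stops the voice-input-scene treatment when a
// touch end and/or recording end event is detected; all names are assumed.
public class VoiceInputSceneController {
    private boolean inVoiceInputScene = false;

    public void onVoiceInputSceneDetected() {
        inVoiceInputScene = true;
        // lower the power consumption parameter, or raise it only to the
        // first value (which is smaller than the usual touch-boost value)
    }

    public void onTouchEnd()     { leaveVoiceInputScene(); }
    public void onRecordingEnd() { leaveVoiceInputScene(); }

    private void leaveVoiceInputScene() {
        if (!inVoiceInputScene) {
            return;
        }
        inVoiceInputScene = false;
        // return to the normal (non-voice-input) adjustment, e.g. raising the
        // power consumption parameter again when a touch event is detected
    }
}
```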
In a possible implementation manner, the electronic device provided in the present application includes a touch screen driver, and the scene processing method provided in the present application further includes: detecting the touch start event and/or the touch end event through the touch screen driver or an input scheduling thread module.
In this implementation, the touch start event and/or the touch end event is detected through the touch screen driver or the input scheduling thread module, which enriches the ways of detecting the touch start event and/or the touch end event and provides multiple implementations for subsequently recognizing the voice input scene accurately.
In a possible implementation manner, the electronic device provided in the present application includes a microphone driver, and the scene processing method provided in the present application further includes: detecting the recording start event and/or the recording end event through the microphone driver.
In this implementation, the recording start event and/or the recording end event is detected through the microphone driver, which enriches the ways of detecting the recording start event and/or the recording end event and provides multiple implementations for subsequently recognizing the voice input scene accurately. One way the detection logic might be fed from these sources is sketched below.
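The listener interfaces in this sketch are invented for illustration only; they are not real Android or kernel APIs, and merely show the data flow from the touch screen driver / input scheduling thread module and the microphone driver to the scene detection logic.

```java
// Hypothetical listener interfaces showing how events from the touch screen
// driver / input scheduling thread module and from the microphone driver could
// reach the scene detection logic; none of these is a real system API.
interface TouchEventListener {
    void onTouchStart(int uid, long timeMs); // from the TP driver or input scheduling thread
    void onTouchEnd(int uid, long timeMs);
}

interface RecordingEventListener {
    void onRecordingStart(int uid, long timeMs); // from the microphone driver
    void onRecordingEnd(int uid, long timeMs);
}
```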
In a second aspect, the present application provides an electronic device, including: one or more processors; one or more memories; and a module in which a plurality of applications are installed; where the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the method of the first aspect and any possible implementation thereof.
In a third aspect, the present application provides a chip comprising a processor. The processor is configured to read and execute a computer program stored in the memory to perform the method of the first aspect and any possible implementation thereof.
Optionally, the chip further comprises a memory, and the memory is connected with the processor through a circuit or a wire.
Optionally, the chip further comprises a communication interface.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein a computer program which, when executed by a processor, causes the processor to perform the method of the first aspect and any possible implementation thereof.
In a fifth aspect, the present application provides a computer program product comprising: computer program code which, when run on an electronic device, causes the electronic device to perform the method of the first aspect and any possible implementation thereof.
The technical effects obtained by the second, third, fourth and fifth aspects are similar to the technical effects obtained by the corresponding technical means in the first aspect, and are not described in detail herein.
Drawings
FIG. 1 is a schematic diagram of a voice chat operation according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram of another voice chat operation according to an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of yet another voice chat operation according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of yet another voice chat operation according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application;
FIG. 6 is a block diagram of a software architecture of an electronic device according to an exemplary embodiment of the present application;
FIG. 7 is a schematic flowchart of a scene processing method according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of another scene processing method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of yet another scene processing method according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of yet another scene processing method according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean the following three cases: only A exists, both A and B exist, or only B exists. In addition, in the description of the embodiments of the present application, "a plurality of" means two or more.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present embodiment, unless otherwise specified, the meaning of "plurality" is two or more.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
It should be noted that, the scene processing method provided in the embodiment of the present application may be applicable to any electronic device having a function of processing voice.
In some embodiments of the present application, the electronic device may be a mobile phone, a tablet computer, a wearable device, a television, a vehicle-mounted device, an augmented reality (augmented reality, AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (personal digital assistant, PDA), or the like, or may be other devices or apparatuses capable of performing scene recognition, and the embodiments of the present application are not limited in any way with respect to the specific type of electronic device.
For a better understanding of the scene processing method provided in the embodiments of the present application, some terms related in the embodiments of the present application are first explained below to facilitate understanding by those skilled in the art.
1. Instant messaging (Instant Messaging, IM)
Instant messaging is the most popular communication mode on the Internet; it enables online chat and communication through instant messaging technology.
2. Instant messaging (Instant Messaging, IM) application
An application that enables online chat and communication through instant messaging technology.
In the embodiments of the present application, the IM application may include various commonly used instant messaging applications, and is not limited thereto.
3. System-on-Chip (SoC)
Also referred to as a system on a chip, an SoC is an integrated-circuit product built for a dedicated purpose that contains a complete system together with the entire contents of its embedded software. At the same time, SoC denotes the technology for realizing the whole process from determining the system functions, through software/hardware partitioning, to completing the design.
4. Touch Event (Touch Event)
A touch event provides reliable support for a touch-based user interface by responding to the manipulation of the screen by the user's finger. The user's operations can be monitored through touch events, so that the system of the electronic device can conveniently respond to them.
A touch event may be triggered when the user's finger is placed on the screen, slides on the screen, or is removed from the screen.
5. Touch screen (Touch Panel, TP)
Also called a touch panel, a touch screen is a sensing liquid crystal display device that can receive input signals such as contact points. When a graphic button on the screen is touched, the haptic feedback system on the screen drives the various connected devices according to a preprogrammed program; it can replace a mechanical button panel and, by means of the liquid crystal display image, produce vivid video and audio effects.
6. Touch Down event
A touch down event is the initiation of a touch action.
In the embodiment of the application, touch of the screen by the user's finger generates a Touch Down event.
7. Touch Up event
A touch lift event is the end of a touch action.
In the embodiment of the application, the user lifts the finger on the Touch screen to generate a Touch Up event.
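For reference, the snippet below shows how an application can observe Touch Down and Touch Up events on a view (for example a "press and talk" key) using the standard Android View.OnTouchListener and MotionEvent APIs; the wrapper class and the two recording helpers are placeholders assumed for this sketch.

```java
import android.view.MotionEvent;
import android.view.View;

// Assumed helper inside an IM application; the button view and the two
// recording helpers are placeholders for illustration.
final class PressToTalkBinder {
    void bind(View pressToTalkKey) {
        pressToTalkKey.setOnTouchListener((v, event) -> {
            switch (event.getActionMasked()) {
                case MotionEvent.ACTION_DOWN:   // Touch Down event: the touch action begins
                    startVoiceRecording();      // placeholder: begin voice input/recording
                    return true;
                case MotionEvent.ACTION_UP:     // Touch Up event: the touch action ends
                case MotionEvent.ACTION_CANCEL:
                    stopVoiceRecording();       // placeholder: end recording
                    return true;
                default:
                    return false;
            }
        });
    }

    private void startVoiceRecording() { /* placeholder */ }
    private void stopVoiceRecording()  { /* placeholder */ }
}
```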
8. Activity (Activity)
Activity is one of the four basic components of Android. It is the visual interface that the user operates, and it provides the user with a window in which to carry out operations. In Android application development, Activity is the most frequently used component; almost every interface that can be seen depends on an Activity.
9. Layout
A layout defines the interface structure in an application (for example, the interface structure of an Activity).
All elements in a layout are built using a hierarchy of View objects and ViewGroup (view container) objects. A View usually draws content that the user can see and interact with; a small programmatic illustration is given below.
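As a concrete example of this hierarchy, the snippet below builds a small piece of interface in code rather than in an XML layout file, using the real Android LinearLayout (a ViewGroup) and Button (a View) classes; the class name, method name and button text are arbitrary.

```java
import android.content.Context;
import android.widget.Button;
import android.widget.LinearLayout;

// A ViewGroup (LinearLayout) containing a View (Button): the container holds
// other views, while the Button draws content the user can see and touch.
final class VoiceInputBoxFactory {
    static LinearLayout build(Context context) {
        LinearLayout container = new LinearLayout(context);
        Button pressToTalk = new Button(context);
        pressToTalk.setText("press and talk");
        container.addView(pressToTalk);
        return container;
    }
}
```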
The foregoing is a simplified description of the terminology involved in the embodiments of the present application, and is not described in detail below.
With the rapid development of electronic technology, electronic devices such as mobile phones, tablet computers and wearable devices have become increasingly popular, and with the development of network technology, the application functions of electronic devices have become increasingly rich.
For example, instant messaging (Instant Messaging, IM) applications are installed on current electronic devices, and the voice chat function is supported in most IM applications. The voice chat function is popular with users because it is simple to operate and transfers information quickly.
The voice chat function is generally operated as follows: the user presses and holds the "press and talk" key to input/record voice; after the voice input/recording is completed, the user releases the key to end the recording, and the IM application sends the input/recorded voice to the receiving party.
The operation method of the voice chat function is described below with reference to the accompanying drawings.
Referring to FIG. 1 and FIG. 2, FIG. 1 is a schematic diagram of a voice chat operation according to an exemplary embodiment of the present application, and FIG. 2 is a schematic diagram of another voice chat operation according to an exemplary embodiment of the present application.
In the embodiments of the present application, the electronic device is taken as a mobile phone and an IM application is taken as an example for description. As shown in (a) of FIG. 1, the IM application displays a chat main interface in which a plurality of chat objects are displayed. For example, when the user clicks the chat object 101 in the chat main interface, the mobile phone display jumps from the chat main interface to the personal chat interface shown in (b) of FIG. 1. The chat record between the user and the chat object 101 is displayed in the personal chat interface. At this time, a voice input icon 102 and a text input box 103 are displayed in the area below the personal chat interface.
For example, the user clicks the voice input icon 102 shown in (b) of FIG. 1, and the mobile phone display changes from the personal chat interface shown in (b) of FIG. 1 to the voice input interface shown in (c) of FIG. 1. Specifically, in the area below the voice input interface, the voice input icon 102 shown in (b) of FIG. 1 is replaced by a keyboard input icon 104, and the text input box 103 is replaced by a voice input box 105, which contains a "press and talk" key 1051. Pressing and holding the "press and talk" key 1051 enables voice input/recording.
In one possible implementation, as shown in FIG. 2, the user presses and holds the "press and talk" key 1051 in the voice input interface shown in (a) of FIG. 2. The mobile phone display changes from the voice input interface shown in (a) of FIG. 2 to the voice chat interface shown in (b) of FIG. 2. At this time, a voice input area 106 is displayed in the area below the voice chat interface shown in (b) of FIG. 2.
In the voice input area 106, a drawn/composed voice animation 1061, a "cancel send" button 1062, a "voice-to-text" button 1063, a voice recording area 1064, and the prompt "release to send" are displayed.
As long as the user keeps pressing any position in the voice recording area 1064, the voice input/recording function remains triggered and the voice animation 1061 is displayed dynamically. When the user releases the finger touching the voice recording area 1064, the voice recording ends and the IM application sends the input/recorded voice to the chat object 101. As shown in the voice input interface in (c) of FIG. 2, the input/recorded 5-second voice is successfully sent to the chat object 101. At this time, the area below the voice input interface shown in (c) of FIG. 2 again displays the keyboard input icon 104 and the voice input box 105, and the voice input box 105 contains the "press and talk" key 1051. The user can again press and hold the "press and talk" key 1051 to perform voice input/recording.
In another possible implementation, when the user is pressing and holding the voice recording area 1064 and slides the finger toward the "cancel send" button 1062, the voice recording ends and the IM application does not send the input/recorded voice to the chat object 101.
For ease of understanding, referring to FIG. 3, FIG. 3 is a schematic diagram of yet another voice chat operation according to an exemplary embodiment of the present application.
As shown in FIG. 3, when the user is pressing and holding the voice recording area 1064 and slides the finger toward the "cancel send" button 1062, the drawn/composed voice animation 107 and the prompt "release to cancel" are displayed above the "cancel send" button 1062. When the user slides the finger onto the "cancel send" button 1062 and releases the finger touching the "cancel send" button 1062, the voice recording ends, the voice animation 107 disappears, and the IM application does not send the input/recorded voice to the chat object 101.
Note that the voice animation 1061 shown in (b) of FIG. 2 is different from the voice animation 107 shown in FIG. 3. In general, the animation area corresponding to the voice animation 1061 shown in (b) of FIG. 2 is larger than the animation area corresponding to the voice animation 107 shown in FIG. 3.
In yet another possible implementation, when the user is pressing and holding the voice recording area 1064 and slides the finger toward the "voice-to-text" button 1063, the voice recording ends and the IM application converts the input/recorded voice into text.
For ease of understanding, referring to FIG. 4, FIG. 4 is a schematic diagram of yet another voice chat operation according to an exemplary embodiment of the present application.
As shown in (a) of FIG. 4, when the user is pressing and holding the voice recording area 1064 and slides the finger toward the "voice-to-text" button 1063, a drawn/composed text animation 108 is displayed in the area where the voice animation 1061 was originally displayed. When the user slides the finger onto the "voice-to-text" button 1063 and releases the finger touching the "voice-to-text" button 1063, the input/recorded voice is converted into text and displayed in the text animation 108.
As shown in (b) of FIG. 4, the text animation 108 displays the text converted from the input/recorded voice. Meanwhile, an "OK" button 109, a "cancel" button, and a "send original voice" button are displayed in the area below the text animation 108.
It should be appreciated that if the user clicks the "OK" button 109, the IM application sends the converted text to the chat object 101. If the user clicks the "cancel" button, the IM application cancels the sending, i.e. the converted text is not sent to the chat object 101. If the user clicks the "send original voice" button, the IM application sends the input/recorded voice to the chat object 101.
As can be seen from the above application scenarios, during voice chat, animations are drawn/composed while voice is being input/recorded. These may include the voice animation 1061 shown in (b) of FIG. 2, the voice animation 107 shown in FIG. 3, and the text animation 108 shown in (b) of FIG. 4, and they are the source of most of the power consumption during the whole voice chat. That is, during voice chat, the power consumption that the recording itself causes in the system of the electronic device is very small; most of the power consumption comes from the drawing/composition of animations during voice input/recording.
This is because most screens of electronic devices (such as mobile phones) currently in use support refresh rate adjustment across multiple levels (for example, current levels include 30 Hz, 60 Hz, 90 Hz, 120 Hz and 144 Hz); that is, the screen of the electronic device (such as the mobile phone) can switch between different refresh rates according to different scenes. To improve the smoothness of electronic devices (such as mobile phones) and thereby the user experience, the Touch Down event of the TP is usually bound to a frequency boost of the central processing unit (Central Processing Unit, CPU) and an increase of the screen refresh rate. For example, when the "press and talk" key is pressed and held for voice input/recording, the CPU frequency is generally raised from 1.5 GHz to 2.1 GHz, and/or the screen refresh rate is raised from 60 Hz to 120 Hz. A hypothetical sketch of this binding is given below.
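In the sketch, the two controller interfaces and their methods are assumptions invented for illustration (they are not real Android APIs); the numeric values are taken from the example above.

```java
// Hypothetical illustration of binding the Touch Down event to a CPU frequency
// boost and a screen refresh rate increase regardless of the scene; the
// controller interfaces are assumptions, not real system APIs.
final class TouchDownBoostPolicy {
    interface CpuFreqController     { void setFreqKhz(int khz); }
    interface RefreshRateController { void setRefreshRateHz(int hz); }

    private final CpuFreqController cpu;
    private final RefreshRateController display;

    TouchDownBoostPolicy(CpuFreqController cpu, RefreshRateController display) {
        this.cpu = cpu;
        this.display = display;
    }

    // Invoked for every Touch Down event, even in a voice input scene.
    void onTouchDown() {
        cpu.setFreqKhz(2_100_000);     // e.g. raise from 1.5 GHz to 2.1 GHz
        display.setRefreshRateHz(120); // e.g. raise from 60 Hz to 120 Hz
    }
}
```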
It can be appreciated that in some application scenarios (e.g., voice input scenarios), when a Touch Down event is detected on a screen of an electronic device (e.g., a mobile phone), operations such as CPU frequency boosting, screen refresh rate boosting, etc. are performed, which may exacerbate power consumption of SoC and screen of the electronic device (e.g., the mobile phone).
However, in some application scenarios (such as a voice input scene), the animations drawn/composed during voice input/recording are simple, the demand on resources such as the CPU and the SoC is small, and the user does not pay attention to the frame rate or fluency of the animation displayed on the screen. Even if the screen refresh rate is lowered to 30 Hz or kept at 60 Hz in such scenarios, the drawn/composed animation is not affected. As a result, the CPU frequency boost and/or the screen refresh rate increase wastes power in such scenarios. It is therefore highly necessary to reduce power consumption in these application scenarios (such as the voice input scene).
In the related art, the voice input scene is generally recognized first, and then the power consumption in the voice input scene is reduced. How accurately the voice input scene is recognized is therefore very important. In the related art, however, whether the current scene is a voice input scene is generally judged through layout information or Activity information.
For the implementation that judges whether the current scene is a voice input scene through layout information: on the one hand, the voice input interfaces of different IM applications may differ, and when they differ, unified layout information cannot judge in time whether the current scene is a voice input scene, so the recognition of the voice input scene is inaccurate. On the other hand, for a new IM application, a model test must be performed first and the corresponding layout information configured afterwards; the whole process involves a heavy workload and a large amount of information, is inconvenient to maintain and prone to errors, and ultimately also makes the recognition of the voice input scene inaccurate.
For the implementation that judges whether the current scene is a voice input scene through Activity information, many different pieces of Activity information need to be recorded; once the Activity information becomes voluminous, omissions and errors easily occur, leading to wrong recognition results. For example, a voice input scene is not recognized, or a scene that is not a voice input scene is recognized as a voice input scene.
It can be seen that, whether through layout information or through Activity information, the voice input scene cannot be recognized accurately, and therefore the power consumption in the voice input scene cannot be reduced accurately.
In view of this, an embodiment of the present application provides a scene processing method: when a touch start event is detected, a recording start event is detected within a preset time interval, and the first application and the second application are detected to be consistent, the current scene is determined to be a voice input scene; in the voice input scene, the power consumption parameter of the electronic device is lowered, or raised only to a first value, where the first value is smaller than a second value, and the second value is the value to which the electronic device raises the power consumption parameter when the touch start event is detected.
The logic by which a voice input scene arises is as follows: the recording start event occurs after the touch start event; the time interval between the occurrence of the touch start event and the occurrence of the recording start event is within the preset time interval; and the application that generates the touch start event is consistent with the application that generates the recording start event. When recognizing the voice input scene, the scene processing method does not rely on layout information or Activity information but follows this logic, so the voice input scene can be recognized accurately.
Compared with the related art, in which the power consumption parameter is raised whenever a touch start event is detected, the scene processing method lowers the power consumption parameter of the electronic device in the voice input scene, or raises it only to the first value. This effectively reduces the power consumption of the electronic device in the voice input scene, lowers the wear on each component of the device without affecting the user experience, and improves the battery life and service life of the electronic device.
In the related art, it is difficult to adapt to the voice input scenes of IM applications one by one through layout information or Activity information, and many IM applications do not develop a dedicated Activity for the voice input scene, so the voice input scenes of all IM applications cannot be covered, and therefore cannot be accurately recognized, through layout information or Activity information. The scene processing method provided in the embodiments of the present application can cover all IM applications, so the voice input scenes of all IM applications can be recognized accurately, and the applicability is wider.
The following is a brief description of the hardware structure of the electronic device according to the embodiments of the present application with reference to the accompanying drawings.
In some embodiments of the present application, the electronic device may be a mobile phone, a tablet computer, a wearable device, a television, a vehicle-mounted device, an augmented reality (augmented reality, AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (personal digital assistant, PDA), or the like, or may be other devices or apparatuses capable of performing scene recognition, and the embodiments of the present application are not limited in any way with respect to the specific type of electronic device.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating a hardware structure of an electronic device according to an exemplary embodiment of the present application.
As shown in fig. 5, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a user identification module (subscriber identification module, SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than those shown in FIG. 5, or electronic device 100 may include a combination of some of the components shown in FIG. 5, or electronic device 100 may include sub-components of some of the components shown in FIG. 5. The components shown in fig. 5 may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In the embodiments of the present application, when a touch start event is detected, a recording start event is detected within a preset time interval, and the first application and the second application are detected to be consistent, the processor 110 may determine that the current scene is a voice input scene, and, in the voice input scene, lower the power consumption parameter of the electronic device or raise it only to the first value. For example, the processor 110 may execute the software code of the scene processing method provided in the embodiments of the present application, so as to recognize the voice input scene and reduce the power consumption of the electronic device in the voice input scene.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor (mobile industry processor interface, MIPI) interface, a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
It should be understood that the connection relationship between the modules illustrated in this embodiment is only illustrative, and does not limit the structure of the electronic device 100. In other embodiments, the electronic device 100 may also employ different interfaces in the above embodiments, or a combination of interfaces.
The charge management module 140 is configured to receive a charge input from a charger. The charging management module 140 may also supply power to the electronic device 100 through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The connection relationship between the modules shown in fig. 5 is merely illustrative, and does not limit the connection relationship between the modules of the electronic device 100. Alternatively, the modules of the electronic device 100 may also use a combination of the various connection manners in the foregoing embodiments.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G and the like applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. Wireless communication techniques may include global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS). It is understood that in embodiments of the present application, a hardware module in a positioning or navigation system may be referred to as a positioning sensor.
The electronic device 100 may implement the display function through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 may be used to display images or video and may also display a series of graphical user interfaces (graphical user interface, GUIs), all of which are home screens for the electronic device 100. Generally, the size of the display 194 of the electronic device 100 is fixed and only limited controls can be displayed in the display 194 of the electronic device 100. A control is a GUI element that is a software component contained within an application program that controls all data processed by the application program and interactive operations on that data, and a user can interact with the control by direct manipulation (direct manipulation) to read or edit information about the application program. In general, controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, navigation bars, widgets (widgets), and the like.
In embodiments of the present application, the display 194 may be used to display various interfaces involved in the voice chat process.
The display 194 includes a display panel. The display panel may be a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light-emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a mini-LED, a micro-LED, a micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The display screen 194 in the embodiments of the present application may be a touch screen. The touch sensor 180K may be integrated into the display 194. The touch sensor 180K is also referred to as a "touch panel". That is, the display screen 194 may include a display panel and a touch panel, and the touch sensor 180K together with the display screen 194 forms the touch screen, also referred to as a "touchscreen". The touch sensor 180K is configured to detect a touch operation acting on or near it. After the touch sensor 180K detects a touch operation, the kernel-layer driver (for example, the TP driver) passes it to the upper layer to determine the touch event type. Visual output related to the touch operation may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a position different from that of the display 194.
In the embodiments of the present application, the touch sensor 180K detects the user's touch operations. For example, when the user touches the display 194, the touch sensor 180K detects the touch operation, and the kernel-layer driver (for example, the TP driver) passes it to the upper layer to determine the touch event type, such as a Touch Down event. For another example, when the user no longer touches the display 194, the touch sensor 180K detects that the user has lifted the finger, and the kernel-layer driver (for example, the TP driver) passes this to the upper layer to determine the touch event type, such as a Touch Up event. The processor 110 responds to the user's touch operation and provides related visual output through the display screen 194; for example, the display screen 194 displays the personal chat interface, the voice input interface, the voice chat interface, and the various drawn/composed animations.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an operating system, an APP (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on.
In addition, the internal memory 121 may include a high-speed random access memory; the internal memory 121 may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (Universal Flash Storage, UFS), etc.
The pressure sensor 180A is used to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are various types of pressure sensors 180A, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. A capacitive pressure sensor may comprise at least two parallel plates with conductive material. The capacitance between the electrodes changes when a force is applied to the pressure sensor 180A. The electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the touch operation intensity according to the pressure sensor 180A. The electronic device 100 may also calculate the location of the touch based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch position but have different touch durations may correspond to different operation instructions.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically the x-axis, y-axis, and z-axis). The magnitude and direction of gravity may be detected when the electronic device 100 is stationary. The acceleration sensor 180E may also be used to identify the posture of the electronic device 100, and serves as an input parameter for applications such as landscape/portrait switching and the pedometer.
The ambient light sensor 180L is used to sense ambient light level. The electronic device 100 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. Ambient light sensor 180L may also cooperate with proximity light sensor 180G to detect whether electronic device 100 is in a pocket to prevent false touches.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may utilize the collected fingerprint feature to perform functions such as unlocking, accessing an application lock, taking a photograph, and receiving an incoming call.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function controls of the electronic device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback.
The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc. The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195 to enable contact and separation with the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, micro SIM cards, and the like.
In addition, an operating system runs on top of the above components, such as the Android system, the iOS operating system, the Symbian operating system, the BlackBerry operating system, the Linux operating system, the Windows operating system, etc. This is merely illustrative and is not limiting. Different applications, such as any application that supports a voice chat function, may be installed and run on these operating systems.
The scene processing method provided in the embodiment of the present application may be implemented in the electronic device 100 having the above-described hardware structure.
The structure of the electronic device 100 according to the embodiments of the present application is briefly described above, and the software structure according to the embodiments of the present application is briefly described below. Referring to fig. 6, fig. 6 is a software architecture block diagram of an electronic device according to an exemplary embodiment of the present application. The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system of the electronic device 100 is taken as an example; it is divided into four layers, namely, from top to bottom, an application layer, an application framework layer, Android Runtime (Android runtime) and system libraries, and a kernel layer.
The application layer may include a series of application packages. As shown in fig. 6, the application packages may include a camera, a calendar, a map, wireless local area networks (wireless local area networks, WLAN), music, short messages, various IM applications, a scene recognition service, and the like.
The IM application is an application for realizing online chat and communication through an instant messaging technology.
A scene recognition service is an application resident at the application layer and is generally not visible to the user.
The scene recognition service is used to recognize a voice input scene and determine a method of reducing power consumption. The method for reducing the power consumption may be to reduce the power consumption parameter of the electronic device, or to increase the power consumption parameter to a first value, where the first value is smaller than a second value, and the second value is the power consumption parameter to which the electronic device is increased when detecting the touch initiation event.
Alternatively, in one possible implementation, the scene recognition service may also be an integrated unit, module, chip, etc. for recognizing a voice input scene and determining a method of reducing power consumption.
The application framework layer provides an application programming interface (Application Programming Interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As one example of the present application, the application framework layer may include an Input dispatch thread (Input Dispatcher) module, a recording service (Audio Record) module, a window manager (WindowManager), and a power consumption parameter adjustment module.
In the embodiment of the present application, the Input Dispatcher module may process all events related to TP.
A trigger function is added to the Input Dispatcher module in advance. When a Touch Down event is detected, the Input Dispatcher module is triggered to transmit the Touch Down event to the IM application, and the IM application receives the Touch Down event. Meanwhile, the Input Dispatcher module also transmits the Touch Down event to the scene recognition service, and the scene recognition service receives the Touch Down event and records information of the Touch Down event such as its identification information, event type, touch start time, and touch position.
When a Touch Up event is detected, an Input Dispatcher module is triggered to transmit the Touch Up event to an IM application, and the IM application receives the Touch Up event. Meanwhile, the Input Dispatcher module also transmits the Touch Up event to the scene recognition service, and the scene recognition service receives the Touch Up event and records the identification information, event type, touch end time, finger leaving position and other information of the Touch Up event.
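Purely by way of illustration, the kind of per-event record kept by the scene recognition service may be sketched as follows; all class and field names are hypothetical and do not correspond to actual interfaces of the present application. This is merely illustrative and is not limiting.

    // Hypothetical sketch of the information recorded for each Touch Down / Touch Up event.
    public class TouchEventRecord {
        public enum Type { TOUCH_DOWN, TOUCH_UP }

        public final long eventId;     // identification information distinguishing different events
        public final Type type;        // event type
        public final long timeMillis;  // touch start time (Down) or touch end time (Up)
        public final float x;          // touch position / finger leaving position
        public final float y;

        public TouchEventRecord(long eventId, Type type, long timeMillis, float x, float y) {
            this.eventId = eventId;
            this.type = type;
            this.timeMillis = timeMillis;
            this.x = x;
            this.y = y;
        }
    }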
In the embodiment of the application, the Audio Record module can process all events related to Audio.
The Audio Record module may include an Audio record.start interface and an Audio record.stop interface.
The IM application calls the Audio Record.start interface in the system, after which the Audio Record module starts voice recording and generates a recording start event.
A trigger function is added in advance in the Audio Record module. After the IM application calls the Audio Record.start interface, the Audio Record module is triggered to notify the scene recognition service, and the scene recognition service records information of the recording start event, such as the recording start time, the second UID of the second application calling the Audio Record.start interface, the second application package name, and the like.
After the IM application calls the Audio Record.stop interface, the Audio Record module is triggered to notify the scene recognition service, and the scene recognition service records information of the Audio Record stop event, such as the recording end time, the recording duration, the second UID of the second application calling the Audio Record.stop interface, and the second application package name.
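For reference only, at the application side the public Android API that roughly corresponds to the Audio Record.start/stop interfaces described above is android.media.AudioRecord. A minimal sketch of how an IM application might start and stop recording is shown below; the framework-side notification of the scene recognition service is internal to the system and is not shown. This is merely illustrative and is not limiting.

    import android.media.AudioFormat;
    import android.media.AudioRecord;
    import android.media.MediaRecorder;

    // Requires the RECORD_AUDIO permission at runtime.
    public class VoiceRecorder {
        private static final int SAMPLE_RATE = 16000;
        private AudioRecord audioRecord;

        // Calling startRecording() is the point at which the framework-side
        // Audio Record module would generate the recording start event.
        public void start() {
            int bufferSize = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
            audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize);
            audioRecord.startRecording();
        }

        // Calling stop() corresponds to the recording end (Audio Record stop) event.
        public void stop() {
            if (audioRecord != null) {
                audioRecord.stop();
                audioRecord.release();
                audioRecord = null;
            }
        }
    }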
The window manager (WindowManager) is configured to send the UID to which the window of the focused application belongs and the application package name to the scene recognition service.
The power consumption parameter adjusting module is used for adjusting the power consumption parameters of the electronic equipment. For example, the power consumption parameter of the electronic device is reduced or the power consumption parameter is increased to a first value.
It should be noted that, the scene recognition service only determines the method for reducing power consumption, and the method for reducing power consumption is implemented by adjusting the power consumption parameter module.
Alternatively, in one possible implementation, the power consumption parameter adjustment module may include a system service (SurfaceFlinger) module.
SurfaceFlinger is a native Android process responsible for compositing layers; the stacked layers form the interface that the user sees. In the embodiments of the present application, the SurfaceFlinger module may be used to adjust the screen refresh rate.
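The refresh rate adjustment in the embodiments is performed on the system (SurfaceFlinger) side. As a loose application-level analogy only, and not the mechanism of the present application, Android also exposes a public window attribute with which an application can hint a preferred refresh rate; the sketch below illustrates the general idea of lowering the refresh rate in a specific scene.

    import android.app.Activity;
    import android.view.WindowManager;

    public class RefreshRateHint {
        // Illustrative only: an app-level hint that the window prefers a lower
        // refresh rate (e.g. 60f instead of 120f). The adjustment described in
        // this application is made by the system, not by the IM application itself.
        public static void preferLowRefreshRate(Activity activity, float hz) {
            WindowManager.LayoutParams lp = activity.getWindow().getAttributes();
            lp.preferredRefreshRate = hz;
            activity.getWindow().setAttributes(lp);
        }
    }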
Optionally, the application framework layer may also include a content provider, a telephony manager, a resource manager, a notification manager, and the like.
The content provider is used to store and retrieve data, which may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc., and make such data accessible to the application.
The view system may include visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to construct a display interface for an application, which may be comprised of one or more views, such as a view that includes displaying a text notification icon, a view that includes displaying text, and a view that includes displaying a picture.
The telephony manager is used to provide communication functions of the electronic device 100, such as management of call status (including connecting, hanging up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. For example, a notification manager is used to inform that the download is complete, a message alert, etc. The notification manager may also be a notification that appears in the system top status bar in the form of a chart or a scroll bar text, such as a notification of a background running application. The notification manager may also be a notification that appears on the screen in the form of a dialog window, such as a text message being prompted in a status bar, a notification sound being emitted, the electronic device vibrating, a flashing indicator light, etc.
Android Runtime includes a core library and a virtual machine. Android Runtime is responsible for scheduling and management of the Android system. The core library consists of two parts: one part is the functions that the java language needs to call, and the other part is the core library of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules, such as: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like. The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer may include a touch panel driver (TouchPanel Driver) for collecting touch initiation events generated after a user (e.g., a user's finger or a touch object such as a stylus) touches a touch panel of the electronic device. And then, the touch panel driver uploads the acquired touch initiation event to the Input Dispatcher.
The kernel layer may also include display drivers that may be used to display different display windows, such as IM application windows, etc.
The kernel layer may also include camera drivers, audio drivers, sensor drivers, etc.
The software structure related to the embodiment of the present application is briefly introduced above, and in the embodiments of the present application, an electronic device having the structure shown in fig. 5 and fig. 6 is taken as an example, and the scene processing method provided in the embodiment of the present application is specifically described with reference to the accompanying drawings and application scenes.
Referring to fig. 7, fig. 7 is a flow chart of a scene processing method according to an embodiment of the present application. The method comprises the following steps:
S101, when a touch start event is detected, a recording start event is detected within a preset time interval, and the first application and the second application are detected to be consistent, determining that the current scene is a voice input scene.
The first application is the application corresponding to the touch start event. The first application may be understood as the application on which the user is currently performing a long press or click/touch operation, or as the application to which the region/position where the user currently performs the long press or click/touch operation belongs. Alternatively, it may also be understood that the first application is the application that generates the touch start event.
The second application is an application corresponding to the recording start event. It is understood that the second application is an application that invokes the recording start interface, or the second application is an application that generates the recording start event.
It should be appreciated that typically the first application and the second application are embodied as IM applications supporting voice chat functionality.
In one example, when a touch start event is detected and a recording start event is not detected, the current scene is determined to be a non-voice input scene.
In another example, when a touch start event is detected and a recording start event is not detected within a preset time interval, the current scene is determined to be a non-voice input scene.
In yet another example, when the touch start event is detected and the recording start event is detected within the preset time interval, but the first application and the second application are detected to be inconsistent, the current scene is determined to be a non-voice input scene.
In another example, when the recording start event is detected first and then the touch start event is detected, the current scene is determined to be a non-voice input scene even though the first application and the second application are consistent.
In yet another example, when a touch start event is detected, a recording start event is detected within a preset time interval, and a first application corresponding to the touch start event and a second application corresponding to the recording start event are detected to be consistent, the current scene is determined to be a voice input scene.
In the implementation manner, when a touch starting event is detected, a recording starting event is detected within a preset time interval, and the first application and the second application are detected to be consistent, determining that the current scene is a voice input scene; in a voice input scene, reducing the power consumption parameter of the electronic equipment, or improving the power consumption parameter to a first value, wherein the first value is smaller than a second value, and the second value is the power consumption parameter improved by the electronic equipment when the touch starting event is detected.
The logic for generating the voice input scene is that the recording start event occurs after the touch start event occurs, the time interval between the occurrence of the touch start event and the occurrence of the recording start event is within the preset time interval, and the application for generating the touch start event is consistent with the application for generating the recording start event. According to the scene processing method, when the voice input scene is identified, the voice input scene is identified according to the logic for generating the voice input scene without depending on layout information and Activity information, so that the voice input scene can be accurately identified.
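A minimal sketch of this three-condition determination is given below, assuming the timestamps and UIDs of the two events have already been collected as described above; all names and the concrete interval value are hypothetical. This is merely illustrative and is not limiting.

    public class VoiceInputSceneDetector {
        private static final long PRESET_INTERVAL_MS = 200; // example value used in the embodiments

        // Returns true only when the recording start event follows the touch start
        // event within the preset interval and both events belong to the same application.
        public static boolean isVoiceInputScene(long touchDownTimeMs, int touchDownUid,
                                                long recordStartTimeMs, int recordStartUid) {
            boolean orderOk = recordStartTimeMs >= touchDownTimeMs;              // touch first, recording later
            boolean intervalOk = (recordStartTimeMs - touchDownTimeMs) < PRESET_INTERVAL_MS;
            boolean sameApp = touchDownUid == recordStartUid;                    // first application == second application
            return orderOk && intervalOk && sameApp;
        }
    }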
In the related art, it is difficult to adapt the voice input scenes to the IM applications one by one through the layout information or the Activity information, and many IM applications will not develop corresponding activities for the voice input scenes, so that the voice input scenes of all IM applications cannot be covered through the layout information or the Activity information, and the voice input scenes of all IM applications cannot be accurately identified through the layout information or the Activity information. The scene processing method provided by the embodiment of the application can be covered on all IM applications, so that the voice input scenes of all IM applications can be accurately identified, and the applicability is wider.
S102, under a voice input scene, reducing the power consumption parameter of the electronic equipment, or improving the power consumption parameter to a first value.
The first value is less than a second value, the second value being a power consumption parameter to which the electronic device is increased upon detection of a touch initiation event.
The power consumption parameters may include any one or any combination of the CPU frequency, screen refresh rate, screen brightness, double data rate (Double Data Rate, DDR) frequency, touch panel (Touch Panel, TP) reporting rate, application drawing frame rate, and system composition frame rate.
The DDR frequency refers to the frequency of double data rate transmission. The TP reporting rate refers to the rate at which the TP reports the touch position to the CPU. The application drawing frame rate refers to the frequency at which the IM application draws bitmap images frame by frame so that they appear continuously on the display screen.
The content displayed on the screen of the electronic device is produced by the SurfaceFlinger of the Android system, which composites the information to be displayed by all processes in the current system into one frame and submits it to the screen for display. The system composition frame rate refers to the number of frames submitted to the screen by the SurfaceFlinger within one second.
For example, in the related art, the electronic device may raise the current power consumption parameter of the electronic device to the second value when the touch initiation event is detected. Optionally, in one possible implementation manner, the scene processing method provided in the embodiment of the present application reduces a power consumption parameter of the electronic device when determining that the current scene is a voice input scene.
When the power consumption parameter includes any one of the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP reporting rate, application drawing frame rate, and system composition frame rate, that power consumption parameter is reduced.
When the power consumption parameters include any combination of the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP reporting rate, application drawing frame rate, and system composition frame rate, each power consumption parameter in that combination is reduced.
In this implementation, compared with the related art, in which the power consumption parameters are raised when the touch start event is detected, the scene processing method reduces the power consumption parameters of the electronic device in the voice input scene, effectively reducing the power consumption of the electronic device in the voice input scene, reducing the wear on the components of the electronic device without affecting the user experience, and improving the battery endurance and service life of the electronic device.
Optionally, in another possible implementation manner, in the scene processing method provided in the embodiment of the present application, when it is determined that the current scene is a voice input scene, the power consumption parameter is increased to a first value, where the first value is smaller than the second value.
When the power consumption parameter includes any one of the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP reporting rate, application drawing frame rate, and system composition frame rate, the first value includes one value, and that power consumption parameter is raised to the first value.
When the power consumption parameters include any combination of the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP reporting rate, application drawing frame rate, and system composition frame rate, the first value includes a value corresponding to each power consumption parameter in that combination, and each power consumption parameter in the combination is raised to its corresponding value.
In this implementation, compared with the related art, in which the power consumption parameter is raised when the touch start event is detected, the scene processing method provided by the present application does raise the power consumption parameters of the electronic device in the voice input scene, but the increase is small; that is, the first value to which a power consumption parameter is raised is smaller than the second value to which the corresponding power consumption parameter is raised in the related art. Therefore, compared with the related art, the scene processing method provided by the present application effectively reduces the power consumption of the electronic device in the voice input scene, reduces the wear on the components of the electronic device without affecting the user experience, and improves the battery endurance and service life of the electronic device.
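For illustration only, the first-value approach described above can be sketched as a per-parameter cap table; the parameter names are hypothetical and the figures follow the examples given later in this description. This is merely illustrative and is not limiting.

    import java.util.HashMap;
    import java.util.Map;

    public class FirstValueTable {
        // Hypothetical first values used in the voice input scene. Each first value
        // is smaller than the second value the device would normally boost the
        // corresponding parameter to when a touch start event is detected.
        public static final Map<String, Double> FIRST_VALUES = new HashMap<>();
        static {
            FIRST_VALUES.put("cpuFreqGHz", 1.7);            // second value e.g. 2.1 GHz
            FIRST_VALUES.put("screenRefreshHz", 90.0);      // second value e.g. 120/144 Hz
            FIRST_VALUES.put("screenBrightnessNit", 400.0); // second value e.g. 500 nit
        }

        // Raise a parameter only up to its first value, never to the second value.
        public static double capToFirstValue(String parameter, double requested) {
            Double firstValue = FIRST_VALUES.get(parameter);
            return firstValue == null ? requested : Math.min(requested, firstValue);
        }
    }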
In one possible implementation, in the related art, the electronic device adjusts a power consumption parameter of the electronic device within a first numerical range when the electronic device detects a touch initiation event. It should be noted that, in the related art, the power consumption parameter after adjustment is larger than the power consumption parameter before adjustment.
According to the scene processing method, under the voice input scene, the range of the power consumption parameter of the electronic equipment is reduced to the second numerical range, and the maximum value of the second numerical range is smaller than the maximum value of the first numerical range. It should be noted that, in the scene processing method provided in the present application, the adjusted power consumption parameter may be greater than the power consumption parameter before adjustment, may also be equal to the power consumption parameter before adjustment, and may also be less than the power consumption parameter before adjustment, but no matter how the adjustment is performed, the adjusted power consumption parameter is less than the power consumption parameter after adjustment in the related art.
Taking the power consumption parameter as the CPU frequency as an example for explanation. For example, the first range of values may be 1.8GHz to 2.1GHz, the second range of values may be 1.5GHz to 1.7GHz, 1.5GHz to 1.8GHz, 1.5GHz to 1.9GHz, 1.8GHz to 2.0GHz, 1.5GHz to 2.0GHz, and so on. This is merely illustrative and is not limiting.
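A small sketch of limiting the adjusted CPU frequency to the second numerical range follows; the figures are taken from the example above and are illustrative only. This is merely illustrative and is not limiting.

    public class FrequencyRanges {
        // Illustrative figures from the example above.
        static final double FIRST_MIN = 1.8, FIRST_MAX = 2.1;   // GHz, related-art boost range
        static final double SECOND_MIN = 1.5, SECOND_MAX = 1.7; // GHz, one possible second range

        // Clamp an adjusted CPU frequency into the second range; whatever the
        // adjustment, the result stays below the related-art maximum.
        static double adjustInVoiceInputScene(double requestedGHz) {
            return Math.max(SECOND_MIN, Math.min(SECOND_MAX, requestedGHz));
        }

        public static void main(String[] args) {
            System.out.println(adjustInVoiceInputScene(2.1)); // 1.7 instead of boosting to 2.1
        }
    }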
In this implementation, compared with the related art, in which the power consumption parameters are raised when the touch start event is detected, the scene processing method provided by the present application effectively reduces the power consumption of the electronic device in the voice input scene, reduces the wear on the components of the electronic device without affecting the user experience, and improves the battery endurance and service life of the electronic device.
The scene processing method provided by the application is summarized above, and the scene processing method provided by the application is described in detail below in a complete flow. Referring to fig. 8, fig. 8 is a flow chart of another scene processing method according to an embodiment of the present application. The method comprises the following steps:
S201, detecting a touch start event.
In the embodiment of the present application, the Touch initiation event is also called Touch Down event, and the Touch Down event is taken as an example for illustration.
The screen/touch screen of an electronic device (such as a mobile phone) consists of a touch sensor and a display screen, where the touch sensor is used to detect a touch operation acting on or near it.
Illustratively, when a user touches/clicks on a screen of an electronic device (e.g., a cell phone), the Touch sensor detects the user's Touch/click operation and a Touch Down event is generated accordingly. It should be noted that, each time a user touches/clicks a screen of an electronic device (such as a mobile phone), the Touch sensor detects the Touch/click operation of the user, and generates a corresponding Touch Down event, and different identification information is assigned to each generated Touch Down event to distinguish different Touch Down events.
The Touch Down event may carry information such as the event type, the touch start time, and the touch position. Event types may include a click type, a long press type, and so on.
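For illustration, at the application level the same kind of information (event type, start time, position) can be read from the standard MotionEvent delivered to a view; this is only an analogy for the Touch Down/Touch Up information described here, not the system-internal event path of the present application. This is merely illustrative and is not limiting.

    import android.view.MotionEvent;
    import android.view.View;

    public class TouchInfoLogger implements View.OnTouchListener {
        @Override
        public boolean onTouch(View v, MotionEvent event) {
            switch (event.getActionMasked()) {
                case MotionEvent.ACTION_DOWN:
                    long downTime = event.getDownTime();   // touch start time
                    float x = event.getX();                // touch position
                    float y = event.getY();
                    // a long press type can be distinguished later by how long the
                    // pointer stays down before ACTION_UP
                    break;
                case MotionEvent.ACTION_UP:
                    long endTime = event.getEventTime();   // touch end time
                    break;
            }
            return false; // do not consume; let the IM application handle the event
        }
    }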
In the embodiments of the present application, an IM application is taken as an example for description. For example, the user opens the IM application and enters a personal chat interface, such as the personal chat interface shown in fig. 1 (b). The user clicks the voice input icon 102 in the personal chat interface, a corresponding Touch Down event is generated based on the clicking operation, and the mobile phone display interface changes from the personal chat interface shown in fig. 1 (b) to the voice input interface shown in fig. 1 (c). In the area below the voice input interface, the voice input icon 102 shown in fig. 1 (b) is changed to a keyboard input icon 104, the text input box 103 is changed to a voice input box 105, and the voice input box 105 further includes a "hold to talk" button 1051. The user long presses the "hold to talk" button 1051, which in turn generates a corresponding Touch Down event.
Each generated Touch Down event is passed to the TP, which receives the Touch Down event.
It should be noted that, in order to improve the fluency of the electronic device and bring better experience to the user, the power consumption parameter of the electronic device is usually improved after the Touch Down event is detected, but in the scene processing method provided by the application, when the current scene is determined to be the voice input scene, the power consumption of the electronic device is reduced. Therefore, even compared with the related art, the power consumption of the electronic equipment in the voice input scene is effectively reduced, the loss of each device of the electronic equipment is reduced while the user experience is not influenced, and the endurance and the service life of the electronic equipment are improved.
S202, transmitting a touch starting event.
After receiving the Touch Down event, the TP transmits the Touch Down event to the Input Dispatcher module. Because the trigger function has been added to the Input Dispatcher module in advance, when the Touch Down event is detected, the Input Dispatcher module is triggered to transmit the Touch Down event to the IM application, and the IM application receives the Touch Down event.
Meanwhile, the Input Dispatcher module also transmits the Touch Down event to the scene recognition service, and the scene recognition service receives the Touch Down event and records information of the Touch Down event such as its identification information, event type, touch start time, and touch position.
It should be noted that, to facilitate the subsequent calculation of the time difference, the touch start time of the Touch Down event is referred to as the first time in this embodiment.
Where the scene recognition service is an application resident at the application level and is generally not visible to the user.
Optionally, in one possible implementation, the scene recognition service may also obtain the Touch Down event from other modules and record information of the Touch Down event such as its identification information, event type, touch start time, and touch position. For example, the electronic device may include a touch panel driver, and the Touch Down event is transferred from the TP driver to the application framework layer and then to the application layer; in other words, the TP driver, the application framework layer, and the application layer can all detect the Touch Down event, and the scene recognition service may obtain the Touch Down event from the TP driver, the application framework layer, or the application layer.
S203, the IM application calls a recording service module.
The recording service (Audio Record) module is a module in the application framework layer of the electronic device (such as a mobile phone) system, and the Audio Record module may include an Audio Record.start interface and an Audio Record.stop interface.
Illustratively, the IM application calls the Audio Record.start interface in the system, after which the Audio Record module starts recording voice and generates a recording start event.
Because the trigger function has been added to the Audio Record module in advance, after the IM application calls the Audio Record.start interface, the Audio Record module is triggered to notify the scene recognition service, and the scene recognition service records information of the recording start event, such as the recording start time, the second UID of the second application calling the Audio Record.start interface, the second application package name, and the like.
Note that, in order to facilitate the subsequent calculation of the time difference, in this embodiment, the recording start time of the recording start event is recorded as the second time.
Alternatively, in one possible implementation, the scene recognition service may detect the Audio Record event by other means. For example, the electronic device may include a microphone driver, and the scene recognition service may also detect the Audio Record event through the microphone driver. Specifically, when the output voltage of the microphone driver is detected to be within the voltage range corresponding to recording voice, it can be determined that the electronic device (such as a mobile phone) is currently recording voice with the microphone, and therefore that an Audio Record event currently exists. The recording start time of the Audio Record event is then acquired and recorded as the second time.
For another example, the scene recognition service may also detect the Audio Record event through the state of the Audio In lock. Because voice recording is in progress while the Audio In lock is held, when the Audio In lock is detected to be in the held (non-idle) state, it is determined that an Audio Record event currently exists. The recording start time of the Audio Record event is then acquired and recorded as the second time.
S204, judging whether the calling party of the recording service module is a focus application.
The focus application is the foreground application. In the embodiments of the present application, the focus application refers to the application on which the user is currently performing a long press or click/touch operation, that is, the application to which the region/position of the user's current long press or click/touch operation belongs.
The scene recognition service judges whether the caller of the Audio Record module is the focus application, that is, whether the application calling the Audio Record.start interface is the focus application. In other words, the scene recognition service determines whether the first application that currently generates the Touch Down event is consistent with the second application that calls the Audio Record.start interface.
In one example, the user long presses the "hold to talk" key in the IM application for voice input/recording, and the application corresponding to the generated Touch Down event is the IM application. Because the IM application needs to call the Audio Record.start interface in the Audio Record module to record audio, the caller of the Audio Record module is the IM application. In this application scenario, the first application that generates the Touch Down event is consistent with the second application that calls the Audio Record.start interface; that is, the caller of the Audio Record module is the application on which the user is currently performing the long press operation. This matches the voice input scene, but only proves that the current scene may be a voice input scene, not that it necessarily is one.
In another example, the user sets up split screen on the electronic device (such as a mobile phone) and uses the IM application and the camera at the same time. The user clicks the "hold to talk" key in the IM application, so the application corresponding to the generated Touch Down event is the IM application, but detection finds that the application calling the Audio Record module is the camera. In this application scenario, the first application that generates the Touch Down event is inconsistent with the second application that calls the Audio Record module; that is, the caller of the Audio Record module is not the application on which the user is currently performing the click operation, which proves that the current scene is necessarily not a voice input scene.
If it is not judged whether the first application that generates the Touch Down event is consistent with the second application that calls the Audio Record module, misrecognition may occur; that is, a scene that is not a voice input scene may be mistakenly recognized as a voice input scene. For example, a Touch Down event is generated on one split-screen application while the other split-screen application calls the Audio Record module, so the scene of the latter split-screen application is mistakenly recognized as a voice input scene. When such a scene is mistakenly recognized as a voice input scene, the power consumption of the at least two current split-screen applications of the electronic device (such as a mobile phone) is reduced, which seriously affects the user experience. Therefore, to accurately identify the voice input scene, it is important to determine whether the caller of the Audio Record module is the focus application, that is, whether the first application that currently generates the Touch Down event is consistent with the second application that calls the Audio Record.start interface.
Several specific ways of judging whether the first application that generates the Touch Down event is consistent with the second application that calls the Audio Record.start interface are described below.
Optionally, in one possible implementation, a first unique identifier (UID) of the first application is obtained, and a second UID of the second application that calls the Audio Record.start interface is obtained. Whether the first UID is consistent with the second UID is judged. If the first UID is consistent with the second UID, it is judged that the first application that currently generates the Touch Down event is consistent with the second application that calls the Audio Record.start interface; that is, the current Touch Down event is a Touch Down event generated by the IM application, and the caller of the Audio Record module is the focus application. If the first UID and the second UID are inconsistent, it is judged that the first application that currently generates the Touch Down event is inconsistent with the second application that calls the Audio Record.start interface; that is, the current Touch Down event is not a Touch Down event generated by the IM application, and the caller of the Audio Record module is not the focus application.
Optionally, in another possible implementation, a first application package name of the first application is obtained, and a second application package name of the second application that calls the Audio Record.start interface is obtained. Whether the first application package name is consistent with the second application package name is judged. If the first application package name is consistent with the second application package name, it is judged that the first application that currently generates the Touch Down event is consistent with the second application that calls the Audio Record.start interface; that is, the current Touch Down event is a Touch Down event generated by the IM application, and the caller of the Audio Record module is the focus application. If the first application package name is inconsistent with the second application package name, it is judged that the first application that currently generates the Touch Down event is inconsistent with the second application that calls the Audio Record.start interface; that is, the current Touch Down event is not a Touch Down event generated by the IM application, and the caller of the Audio Record module is not the focus application.
Optionally, in yet another possible implementation, a first UID and a first application package name of the first application are obtained, and a second UID and a second application package name of the second application that calls the Audio Record.start interface are obtained. Whether the first UID is consistent with the second UID and whether the first application package name is consistent with the second application package name are judged. If the first UID is consistent with the second UID and the first application package name is consistent with the second application package name, it is judged that the first application that currently generates the Touch Down event is consistent with the second application that calls the Audio Record.start interface; that is, the current Touch Down event is a Touch Down event generated by the IM application, and the caller of the Audio Record module is the focus application. Otherwise, it is judged that the first application that currently generates the Touch Down event is inconsistent with the second application that calls the Audio Record.start interface; that is, the current Touch Down event is not a Touch Down event generated by the IM application, and the caller of the Audio Record module is not the focus application.
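A compact sketch of the consistency check that combines both comparisons in the last implementation follows; how the first and second values are obtained from the Input Dispatcher, the window manager, and the Audio Record module is system-internal and is not shown. This is merely illustrative and is not limiting.

    import java.util.Objects;

    public class FocusCheck {
        // Returns true when the application that generated the Touch Down event and
        // the application that called the Audio Record.start interface are the same,
        // i.e. the caller of the Audio Record module is the focus application.
        public static boolean isSameApplication(int firstUid, String firstPackage,
                                                int secondUid, String secondPackage) {
            return firstUid == secondUid && Objects.equals(firstPackage, secondPackage);
        }
    }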
It should be noted that, when the result of executing step S204 is that the caller of the recording service module is not the focus application, step S205 is executed; when the result of executing step S204 is that the caller of the recording service module is the focus application, step S206 is executed. That is, step S205 and step S206 are alternative branches, and step S206 is not executed after step S205.
S205, when the caller of the recording service module is judged not to be the focus application, the improvement of the power consumption parameter of the electronic equipment is not limited.
Illustratively, when it is determined that the caller of the Audio Record module is not the focus application, it proves that the current scene is necessarily not a voice input scene.
In one possible implementation, when it is determined that the caller of the recording service module is not the focus application, not limiting the increase of the power consumption parameter of the electronic device means that the current power consumption parameter of the electronic device is maintained and no adjustment is performed.
In another possible implementation, when it is determined that the caller of the recording service module is not the focus application, not limiting the increase in the power consumption parameter of the electronic device refers to increasing the power consumption parameter of the current electronic device. For example, in order to improve the smoothness of electronic equipment (such as a mobile phone), improve the user experience, when a Touch Down event is detected, any one or any combination of multiple power consumption parameters such as CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP reporting rate, application drawing frame rate, system composition frame rate, and the like can be improved.
S206, when judging that the calling party of the recording service module is the focus application, judging the sequence of the touch start event and the recording start event.
When the scene recognition service judges that the caller of the Audio Record module is the focus application, the scene recognition service continues to judge the order of the Touch Down event and the recording start event, or, in other words, the order of the call to the Audio Record.start interface and the Touch Down event.
Optionally, in one possible implementation, the Audio Record.start interface is called first, and the Touch Down event occurs later; that is, the call to the Audio Record.start interface comes first and the Touch Down event comes second. Colloquially, the voice recording starts first, and the long press operation is performed afterwards.
In this case, it is proved that the current scene is necessarily not a voice input scene. Some applications used by the user (such as navigation applications) automatically start recording, and the user's long press operation comes afterwards; the recording is not started by the user's operation. In this application scenario, in order to improve the fluency of the electronic device (such as a mobile phone) and improve the user experience, an operation of raising the power consumption can be performed when the Touch Down event is detected. Specifically, any one or any combination of the power consumption parameters such as the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP reporting rate, application drawing frame rate, and system composition frame rate can be raised.
Optionally, in another possible implementation, the Touch Down event occurs first, and then the Audio Record.start interface is called; that is, the Touch Down event comes first and the call to the Audio Record.start interface comes second. Colloquially, the long press operation is performed first, and then the voice recording starts.
In this case, the order of the recording start event and the Touch Down event matches the voice input scene, but this only proves that the current scene may be a voice input scene, not that it necessarily is one, and further judgment is needed.
It should be noted that, step S206 may be executed first, and then step S204 may be executed, that is, the sequence of the touch start event and the recording start event is determined first, and when the sequence of the touch start event and the recording start event is that the touch start event occurs first and then the recording start event occurs, it is determined whether the caller of the recording service module is the focus application. When it is determined that the caller of the recording service module is the focus application, step S207 is executed.
If the sequence of the touch starting event and the recording starting event is that the recording starting event occurs first and then the touch starting event occurs, the improvement of the power consumption parameters of the electronic equipment is not limited. If the calling party of the recording service module is not the focus application, the improvement of the power consumption parameter of the electronic equipment is not limited.
S207, judging whether the time interval between the touch start event and the recording start event is smaller than a preset time interval.
When the order of the Touch Down event and the recording start event is that the Touch Down event comes first and the recording start event comes later, the scene recognition service judges whether the time interval between the Touch Down event and the recording start event is smaller than the preset time interval.
The preset time interval can be set and adjusted by a user according to actual conditions. For example, in the embodiment of the present application, the preset time interval may be 200 milliseconds (ms), 300 ms, or the like, which is not limited.
Illustratively, upon detecting the Touch Down event, the scene recognition service records the touch start time of the Touch Down event, i.e., the first time. Upon detecting the recording start event, the scene recognition service records the recording start time of the recording start event, i.e., the second time. The difference between the second time and the first time is calculated and compared with the preset time interval.
Optionally, in one possible implementation, the difference is greater than or equal to a preset time interval. Colloquially, it is understood that after the Touch Down event is generated, the recording start event occurs only after a long time (a time period greater than or equal to a preset time interval), for example, after the Touch Down event is generated, the recording start event occurs only after half a minute, in which case, it is proved that the current scene is necessarily not a voice input scene.
In this application scenario, in order to improve the fluency of the electronic device (such as a mobile phone) and improve the user experience, an operation of raising the power consumption can be performed when the Touch Down event is detected. Specifically, any one or any combination of the power consumption parameters such as the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP reporting rate, application drawing frame rate, and system composition frame rate can be raised.
Alternatively, in another possible implementation, the difference is less than the preset time interval. Colloquially, after the Touch Down event is generated, the recording start event occurs soon afterwards (within a duration less than the preset time interval). In this case, the situation matches the voice input scene, and the current scene is determined to be a voice input scene.
That is, when the current scene satisfies three conditions, it is determined that the current scene is a voice input scene. The three conditions are: the first application that generates the Touch Down event is consistent with the second application that calls the Audio Record.start interface; the order of the Touch Down event and the recording start event is that the Touch Down event comes first and the recording start event comes later; and the difference between the second time of the recording start event and the first time of the Touch Down event is smaller than the preset time interval. The order in which the three conditions are judged is not limited in the embodiments of the present application.
S208, in a voice input scene, reducing the power consumption parameter of the electronic equipment, or improving the power consumption parameter to a first value.
Illustratively, in this voice input scenario, the scenario recognition service determines a method of reducing power consumption. The scene recognition service sends the method for reducing the power consumption to the power consumption parameter adjustment module, and the power consumption parameter adjustment module specifically executes the method.
The method for reducing the power consumption may be to reduce the power consumption parameter of the electronic device, or to increase the power consumption parameter to a first value, where the first value is smaller than a second value, and the second value is the power consumption parameter to which the electronic device is increased when detecting the touch initiation event.
It should be noted that the power consumption reduction in the embodiments of the present application is relative to the related art.
The reduction of the CPU frequency is described as an example. For example, in the related art, the CPU frequency varies within one interval in a conventional scenario (a scenario in which no Touch Down event is detected), and the CPU frequency is greatly raised after a Touch Down event is detected. Because the related art greatly raises the CPU frequency above its original value, in the embodiments of the present application, reducing the CPU frequency may mean maintaining the original CPU frequency, lowering the frequency below the original CPU frequency, or only slightly raising the frequency above the original CPU frequency.
For example, in the related art, the CPU frequency varies between 0.9GHz and 1.5GHz in a conventional scenario, and the CPU frequency is raised to 2.1GHz after a Touch Down event is detected.
In this embodiment of the present application, after recognizing the voice input scene, the original frequency of the CPU may be maintained, and the original frequency of the CPU is not lifted, for example, the frequency of the CPU is maintained at 1.5GHz. Alternatively, the CPU frequency is increased by a first predetermined amount (e.g., 10%, 20%, 30%, etc.) based on the original CPU frequency, such as from 1.5GHz to 1.7GHz. Alternatively, the frequency is reduced based on the original frequency of the CPU, such as reducing the CPU frequency from 1.5GHz to 1.3GHz. This is merely illustrative and is not limiting.
Similarly, the reduced screen refresh rate in the embodiments of the present application is also relative to the related art. For example, in the related art, the screen refresh rate supports refresh rate adjustment of multiple gears (e.g., current gears have 30Hz, 60Hz, 90Hz, 120Hz, 144Hz, etc.), the screen refresh rate is maintained at a low gear (e.g., 60 Hz) in a normal scene (a scene in which a Touch Down event is not detected), and the screen refresh rate is greatly increased after the Touch Down event is detected. In the related art, the screen refresh rate is greatly improved, so in the embodiment of the application, the screen refresh rate can be reduced by keeping the original screen refresh rate, reducing the screen refresh rate based on the original screen refresh rate, or slightly improving based on the original screen refresh rate.
For example, in the related art, the screen refresh rate supports multi-gear refresh rate adjustment, and the screen refresh rate may be 60Hz in a conventional scenario. After detecting a Touch Down event, the screen refresh rate is raised to 120Hz or 144Hz.
In the embodiments of the present application, after the voice input scene is recognized, the original screen refresh rate may be maintained without being raised, for example, the screen refresh rate is kept at 60Hz. Alternatively, the refresh rate is raised by one gear based on the original screen refresh rate, such as raising the screen refresh rate from 60Hz to 90Hz. Alternatively, the screen refresh rate is reduced based on the original screen refresh rate, such as reducing the screen refresh rate from 60Hz to 30Hz. This is merely illustrative and is not limiting.
Similarly, the reduction of screen brightness in the embodiments of the present application is also relative to the related art. For example, in the related art, the screen brightness varies within one interval in a conventional scene (a scene in which no Touch Down event is detected), and the screen brightness is greatly raised after a Touch Down event is detected. Because the related art greatly raises the original screen brightness, in the embodiments of the present application, reducing the screen brightness may mean maintaining the original screen brightness, lowering the brightness below the original screen brightness, or only slightly raising the brightness above the original screen brightness.
For example, in the related art, the screen brightness varies between 100 nits (nit) and 300nit in a conventional scene, and after a Touch Down event is detected, the screen brightness is raised to 500nit.
In this embodiment of the present application, after a speech input scene is identified, the original screen brightness may be maintained without being enhanced, for example, the original screen brightness is maintained to be 300nit. Alternatively, the original screen brightness is increased by a second predetermined magnitude (e.g., 20% to 40%) based on the original screen brightness, such as from 300nit to 400nit. Alternatively, the screen brightness is reduced based on the original screen brightness, such as from 300nit to 200nit. This is merely illustrative and is not limiting.
S209, when the ending event is detected, the improvement of the power consumption parameter of the electronic device is not limited.
The end event may include a Touch Up event and/or an Audio Record stop event.
A Touch Up event represents a touch lift event, such as the Touch Up event generated when the user lifts a finger off the screen of an electronic device (e.g., a mobile phone). The Touch Up event may carry information such as the event type, the touch end time, and the finger leaving position.
The Audio Record stop event is also called a recording end event, and it may carry information such as the recording end time, the recording duration, the second UID of the second application calling the Audio Record.stop interface, the second application package name, and the like.
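A minimal sketch, with hypothetical names, of how the scene recognition service might clear the voice-input-scene restriction when either end event arrives is given below. This is merely illustrative and is not limiting.

    public class SceneEndHandler {
        // Hypothetical flag maintained by the scene recognition service while the
        // current scene is a voice input scene.
        private boolean voiceInputSceneActive = true;

        // Called on a Touch Up event or an Audio Record stop event; afterwards the
        // increase of the power consumption parameters is no longer limited, and the
        // power consumption parameter adjustment module may boost them as in the related art.
        public void onEndEvent() {
            voiceInputSceneActive = false;
        }

        public boolean isRestricted() {
            return voiceInputSceneActive;
        }
    }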
Illustratively, when a user's finger leaves the screen of an electronic device (e.g., a cell phone), the Touch sensor detects a finger lift operation of the user and a Touch Up event is correspondingly generated. It should be noted that, each time the finger of the user leaves the screen of the electronic device (such as a mobile phone), the Touch sensor detects the finger lifting operation of the user, and generates a corresponding Touch Up event, and different identification information is assigned to each generated Touch Up event to distinguish different Touch Up events.
In this embodiment of the present application, an IM application is taken as an example for description. For example, the user long-presses the "hold talk" button in the voice input box to begin speaking, and then performs a finger lift operation, which generates a corresponding Touch Up event. The generated Touch Up event is transmitted to the TP, and the TP receives the Touch Up event.
After receiving the Touch Up event, the TP transmits the Touch Up event to the Input Dispatcher module. Because a trigger function has been added to the Input Dispatcher module in advance, when the Touch Up event is detected, the Input Dispatcher module is triggered to transmit the Touch Up event to the IM application, and the IM application receives the Touch Up event.
The Input Dispatcher module also transmits the Touch Up event to the scene recognition service, and the scene recognition service receives the Touch Up event and records the identification information, event type, touch end time, finger leaving position and other information of the Touch Up event.
Meanwhile, after receiving the Touch Up event, the scene recognition service determines that the improvement of the power consumption parameter of the electronic device is not limited. Not limiting the improvement of the power consumption parameter may mean no longer reducing the power consumption parameter of the electronic device, or no longer limiting its increase to the first value. The scene recognition service sends this decision to the power consumption parameter adjustment module, and the power consumption parameter adjustment module executes it accordingly.
For example, in addition to the Audio Record.start interface, the Audio Record module may further include an Audio Record.stop interface. The IM application invokes the Audio Record.stop interface in the system, after which the Audio Record module ends the recording of the voice, i.e. the Audio Record module generates an Audio Record stop event, which marks the end of the recording.
Because a trigger function has been added to the Audio Record module in advance, after the IM application calls the Audio Record.stop interface, the Audio Record module is triggered to notify the scene recognition service, and the scene recognition service records the Audio Record stop event, namely information such as the recording end time, the recording duration, the second UID of the second application calling the Audio Record.stop interface, the second application package name, and the like.
Meanwhile, the Audio Record module sends the recorded voice to the IM application, and the IM application sends the recorded voice to the chat object after receiving the recorded voice.
Similarly, upon receiving the Audio Record stop event, the scene recognition service determines that the improvement of the power consumption parameter of the electronic device is not limited. Not limiting the improvement of the power consumption parameter may mean no longer reducing the power consumption parameter of the electronic device, or no longer limiting its increase to the first value. The scene recognition service sends this decision to the power consumption parameter adjustment module, and the power consumption parameter adjustment module executes it accordingly.
It is noted that, in general, the Touch Up event occurs first and the Audio Record stop event occurs later. For example, the user presses the "hold talk" button to input/record voice and releases the button after the voice input/recording is completed; the Touch Up event is generated first, and the recording then ends, generating an Audio Record stop event.
Of course, there are exceptions. Some IM applications limit the duration of a single voice recording; when the limit is exceeded, the Audio Record stop event occurs first and the Touch Up event occurs later. For example, if the maximum duration of a single voice recording is 60 seconds, then when the recording reaches 60 seconds while the user is still long-pressing the "hold talk" key, an Audio Record stop event is generated first, and a Touch Up event is generated afterwards when the user releases the "hold talk" key.
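For ease of understanding, the handling of the two end events described above can be summarised by the following plain Java sketch. The SceneEndHandler and PowerParamAdjuster names are assumptions used only for illustration; the actual scene recognition service and power consumption parameter adjustment module are not limited to this form.

```java
// Illustrative sketch: when an end event arrives, the service stops limiting the
// improvement of the power consumption parameters.
public final class SceneEndHandler {

    interface PowerParamAdjuster {
        void clearVoiceSceneLimits(); // stop reducing / stop capping at the first value
    }

    private final PowerParamAdjuster adjuster;
    private boolean voiceSceneActive;

    SceneEndHandler(PowerParamAdjuster adjuster) {
        this.adjuster = adjuster;
    }

    void onVoiceSceneRecognized() {
        voiceSceneActive = true;
    }

    /** Called on a Touch Up event and/or an Audio Record stop event. */
    void onEndEvent() {
        if (voiceSceneActive) {
            voiceSceneActive = false;
            adjuster.clearVoiceSceneLimits();
        }
    }
}
```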
Optionally, in one possible implementation manner, the scene recognition service may further obtain a Touch Up event from other modules, and record identification information, event type, touch end time, finger leaving position, and other information of the Touch Up event. For example, the electronic device may include a Touch screen driver, and the Touch Up event is transferred from the TP driver to the application framework layer and then to the application layer, in other words, the TP driver, the application framework layer, and the application layer may detect the Touch Up event, and then the scene recognition service may obtain the Touch Up event from the TP driver, the application framework layer, or the application layer.
According to the scene processing method provided by the embodiment of the application, when a touch starting event is detected, a recording starting event is detected within a preset time interval, and the first application and the second application are detected to be consistent, the current scene is determined to be a voice input scene; in a voice input scene, reducing the power consumption parameter of the electronic equipment, or improving the power consumption parameter to a first value, wherein the first value is smaller than a second value, and the second value is the power consumption parameter improved by the electronic equipment when the touch starting event is detected.
The logic for generating the voice input scene is that the recording start event occurs after the touch start event occurs, the time interval between the occurrence of the touch start event and the occurrence of the recording start event is within the preset time interval, and the application for generating the touch start event is consistent with the application for generating the recording start event. According to the scene processing method, when the voice input scene is identified, the voice input scene is identified according to the logic for generating the voice input scene without depending on layout information and Activity information, so that the voice input scene can be accurately identified.
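As an illustration of this generation logic (a sketch only, not a definitive implementation), the check could be written as follows in plain Java, assuming a 300 ms preset interval as in the example given below and using the application package name as the identification information.

```java
// Illustrative sketch: a voice input scene is reported only when a recording start
// event follows a touch start event within the preset interval and both events come
// from the same application.
public final class VoiceSceneDetector {

    private static final long PRESET_INTERVAL_MS = 300; // example value from the text

    private long touchDownTimeMs = -1;
    private String touchDownPackage; // first application (the focus application)

    void onTouchDown(long timeMs, String focusPackage) {
        touchDownTimeMs = timeMs;
        touchDownPackage = focusPackage;
    }

    /** Returns true if the current scene is a voice input scene. */
    boolean onRecordingStart(long timeMs, String callerPackage) {
        return touchDownTimeMs >= 0
                && timeMs >= touchDownTimeMs                       // recording starts after the touch
                && (timeMs - touchDownTimeMs) < PRESET_INTERVAL_MS // within the preset interval
                && callerPackage != null
                && callerPackage.equals(touchDownPackage);         // first and second application match
    }
}
```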
Compared with the related art, in which the power consumption parameter is raised whenever the touch start event is detected, the scene processing method reduces the power consumption parameter of the electronic device in the voice input scene, or raises it only to the first value, which effectively reduces the power consumption of the electronic device in the voice input scene, reduces the wear of each device of the electronic device without affecting the user experience, and improves the endurance and service life of the electronic device.
In the related art, it is difficult to adapt voice input scenes to IM applications one by one through layout information or Activity information, and many IM applications do not develop corresponding Activities for voice input scenes, so the voice input scenes of all IM applications cannot be covered, or accurately identified, through layout information or Activity information. The scene processing method provided by the embodiment of the application covers all IM applications, so the voice input scenes of all IM applications can be accurately identified and the applicability is wider.
In one possible implementation manner, the scene processing method provided by the embodiment of the application starts timing when the Touch Down event is detected, and simultaneously judges whether the application generating the Touch Down event is a focus application. If the recording start event is detected in the timing process, the scene recognition service executes the steps of judging whether the calling party of the Audio Record module is a focus application, judging the sequence of the recording start event and the Touch Down event, judging whether the time interval between the Touch Down event and the recording start event is smaller than the preset time interval or not, and the like. If the timing reaches the preset duration (e.g. 300 ms), the subsequent judging process is abandoned, namely, the steps of judging whether the calling party of the Audio Record module is the focus application, judging the sequence of the recording start event and the Touch Down event, judging whether the time interval between the Touch Down event and the recording start event is smaller than the preset time interval and the like are not executed.
In another possible implementation manner, when the recording start event is detected, the scene recognition service performs steps of determining whether the caller of the Audio Record module is a focus application, determining the sequence of the recording start event and the Touch Down event, and determining whether the time interval between the Touch Down event and the recording start event is smaller than a preset time interval.
In the implementation mode, because fewer recording start events are usually generated, the subsequent judging process is executed after the recording start event is detected, and the Touch Down event is not required to be continuously judged, so that the power consumption waste is avoided, and the power consumption is effectively reduced.
The scene processing method provided by the application has been described above in detail through a complete flow; it is described below by taking the IM application as the main body.
Referring to fig. 9, fig. 9 is a schematic diagram of another scene processing method according to an embodiment of the present application. The method comprises the following steps:
S301, switching to the voice input interface.
Illustratively, the user opens the IM application and enters a personal chat interface, such as the personal chat interface shown in FIG. 1 (b). The user clicks the voice input icon 102 in the personal chat interface, and the mobile phone display switches from the personal chat interface shown in FIG. 1 (b) to the voice input interface shown in FIG. 1 (c). In the area below the voice input interface, the voice input icon 102 shown in FIG. 1 (b) is changed to a keyboard input icon 104, the text input box 103 is changed to a voice input box 105, and the voice input box 105 further includes a "hold talk" key 1051.
S302, receiving long-time pressing recording key operation.
Illustratively, the record key is the "hold talk" key 1051. The user long-presses the "hold talk" key 1051, and the IM application receives the long-press record key operation from the user.
S303, calling a recording start interface.
Illustratively, the IM application invokes the Audio Record.start interface in the system, after which the Audio Record module begins recording speech and the Audio Record module generates a recording start event.
S304, receiving the record key release operation.
Illustratively, when the user lifts the finger from the key, the IM application receives the user's record key release operation.
S305, calling a recording end interface.
Illustratively, the IM application invokes the Audio Record.stop interface in the system, after which the Audio Record module ends the recording of the voice to obtain the recorded voice.
S306, sending the recorded voice.
Illustratively, the Audio Record module sends the recorded voice to the IM application, which sends the recorded voice to the chat object after receiving the recorded voice.
In this implementation, the IM application receives the various operations of the user, realizes the voice chat function, gives the user timely feedback, and provides a good experience for the user.
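For the IM-application side of steps S302 to S305, a minimal Android Java sketch is given below. It assumes the RECORD_AUDIO permission has already been granted, and it omits the audio read loop and the sending of the recorded voice to the chat object; the class name and parameter values are assumptions made for illustration only.

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.view.MotionEvent;
import android.view.View;

// Illustrative "hold to talk" handler: recording starts on the press (Touch Down)
// and stops on the release (Touch Up), mirroring steps S302-S305.
public final class HoldToTalkHandler implements View.OnTouchListener {

    private static final int SAMPLE_RATE = 16000;
    private AudioRecord recorder;

    @Override
    public boolean onTouch(View v, MotionEvent event) {
        switch (event.getActionMasked()) {
            case MotionEvent.ACTION_DOWN:   // user long-presses the "hold talk" key
                int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
                recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, SAMPLE_RATE,
                        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf);
                recorder.startRecording();  // corresponds to calling the recording start interface
                return true;
            case MotionEvent.ACTION_UP:     // user releases the key
            case MotionEvent.ACTION_CANCEL:
                if (recorder != null) {
                    recorder.stop();        // corresponds to calling the recording end interface
                    recorder.release();
                    recorder = null;
                }
                return true;
            default:
                return false;
        }
    }
}
```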
The scene processing method provided by the application is described above by taking the IM application as a main body, and is described below by combining a software structure.
For ease of understanding, referring to fig. 10, fig. 10 is a flow chart of another scene processing method according to an embodiment of the present application. The method comprises the following steps:
S401, the IM application displays a voice input interface.
S402, clicking a voice input interface by a user to generate a Touch Down event.
Illustratively, the user opens the IM application and enters a personal chat interface, such as the personal chat interface shown in FIG. 1 (b). The user clicks the voice input icon 102 in the personal chat interface, and the mobile phone display switches from the personal chat interface shown in FIG. 1 (b) to the voice input interface shown in FIG. 1 (c). In the area below the voice input interface, the voice input icon 102 shown in FIG. 1 (b) is changed to a keyboard input icon 104, the text input box 103 is changed to a voice input box 105, and the voice input box 105 further includes a "hold talk" key 1051.
The user presses the "hold talk" button 1051 long, producing a Touch Down event.
S403, the IM application receives the Touch Down event and displays a voice chat interface.
Illustratively, a trigger function is added in the Input Dispatcher module in advance, and when a Touch Down event is detected, the Input Dispatcher module is triggered to transmit the Touch Down event to the IM application, and the IM application receives the Touch Down event.
Meanwhile, the IM application displays a voice chat interface, such as the one shown in fig. 2 (b).
It should be noted that, in order to improve the fluency of the electronic device and bring a better experience to the user, after the Input Dispatcher module detects the Touch Down event, it sends the information of the Touch Down event to the power consumption parameter adjustment module, and the power consumption parameter adjustment module may raise the power consumption parameter of the electronic device. Subsequently, once the voice input scene is identified (see steps S408 and S409 below), the power consumption parameter is reduced again or limited to the first value. Therefore, even compared with the related art, the power consumption of the electronic device in the voice input scene is effectively reduced, the wear of each device of the electronic device is reduced without affecting the user experience, and the endurance and service life of the electronic device are improved.
S404, the scene recognition service records the first time of the Touch Down event.
Illustratively, the scene recognition service receives the Touch Down event and records information such as the identification information of the Touch Down event, the event type, the touch start time, and the touch position. The touch start time of the Touch Down event is referred to as the first time in this embodiment.
S405, the IM application calls an Audio Record module and starts recording.
The Audio Record module may include an Audio Record.start interface.
Illustratively, the IM application invokes the Audio Record.start interface in the system, after which the Audio Record module begins recording speech and the Audio Record module generates a recording start event.
S406, recording a second time of the recording start event and second identification information of a second application by the scene recognition service.
Illustratively, the scene recognition service records information such as the recording start time of the recording start event, the second UID of the second application calling the Audio Record.start interface, and the second application package name. In this embodiment, the recording start time of the recording start event is referred to as the second time.
S407, the scene recognition service acquires first identification information of the first application from the window manager.
Illustratively, a window manager (WindowManager) is configured to send a first UID to which a window of a first application belongs and a first application package name to a scene recognition service. The scene recognition service receives a first UID and a first application packet name sent by the window manager.
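Illustratively, the comparison of the first identification information and the second identification information used in step S408 below could look like the following plain Java sketch; the AppIdentity type is an assumption, and the real service may compare the UID, the package name, or both.

```java
// Illustrative sketch: deciding that the first application and the second application
// are consistent by comparing UIDs and package names.
public final class AppIdentity {
    final int uid;
    final String packageName;

    AppIdentity(int uid, String packageName) {
        this.uid = uid;
        this.packageName = packageName;
    }

    /** True when the first and second identification information match. */
    boolean matches(AppIdentity other) {
        return other != null
                && this.uid == other.uid
                && this.packageName.equals(other.packageName);
    }
}
```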
S408, the scene recognition service determines that the current scene is a voice input scene when detecting a touch start event, detecting a recording start event within a preset time interval and detecting that the first application and the second application are consistent.
S409, in a voice input scene, reducing the power consumption parameter of the electronic equipment, or improving the power consumption parameter to a first value.
Step S408 and step S409 may refer to the descriptions in step S101 and step S102, and are not described herein.
Optionally, in one possible implementation manner, in the voice input scene, the scene recognition service may also set upper limit values for the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP point reporting rate, network resources, application drawing frame rate, system composition frame rate, and the like, and send the upper limit values to the power consumption parameter adjustment module. The power consumption parameter adjustment module then uses the corresponding upper limit value as the limit when adjusting the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP point reporting rate, application drawing frame rate, system composition frame rate, and the like.
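The following plain Java sketch illustrates such upper limit values and how the power consumption parameter adjustment module might apply them; the field names and numbers are assumptions for illustration only and are not values prescribed by this application. When the limits are cancelled (see steps S410 and S411 below), the clamp is simply bypassed.

```java
// Illustrative sketch: upper limit values handed to the power consumption parameter
// adjustment module while the voice input scene is active.
public final class VoiceSceneCaps {
    int cpuFreqKhzMax    = 1_800_000; // CPU frequency cap (kHz)
    int refreshRateHzMax = 60;        // screen refresh rate cap (Hz)
    int brightnessNitMax = 300;       // screen brightness cap (nit)
    int ddrFreqMhzMax    = 1_866;     // DDR frequency cap (MHz)
    int tpReportRateMax  = 120;       // TP point reporting rate cap (Hz)
    int appDrawFpsMax    = 60;        // application drawing frame rate cap
    int composeFpsMax    = 60;        // system composition frame rate cap

    /** Clamp a requested parameter value to its cap while the caps are in force. */
    static int clamp(int requested, int cap, boolean capsActive) {
        return capsActive ? Math.min(requested, cap) : requested;
    }
}
```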
S410, detecting a recording end event and/or detecting a Touch Up event.
S411, the improvement of the power consumption parameter of the electronic device is not limited.
Step S410 and step S411 may refer to the description in step S209, and are not described herein.
Optionally, in one possible implementation manner, when a recording end event is detected and/or a Touch Up event is detected, the scene recognition service may cancel the upper limit values of the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP point reporting rate, network resources, application drawing frame rate, system composition frame rate, and the like, and send the cancellation to the power consumption parameter adjustment module. The power consumption parameter adjustment module is then no longer restricted by the corresponding upper limit value when adjusting the CPU frequency, screen refresh rate, screen brightness, DDR frequency, TP point reporting rate, application drawing frame rate, system composition frame rate, and the like.
According to the implementation method, when the voice input scene is identified, the voice input scene is identified according to the logic for generating the voice input scene instead of the layout information and the Activity information, so that the voice input scene can be accurately identified. The power consumption of the electronic equipment in the voice input scene is reduced, the loss of each device of the electronic equipment can be reduced while the user experience is not influenced, and the endurance and the service life of the electronic equipment are improved.
The scene processing method provided by the embodiment of the application can be applied to other scenes in which power consumption needs to be improved to increase fluency.
Examples of the scene processing method provided by the embodiment of the application are described in detail above. It will be appreciated that the electronic device, in order to achieve the above-described functions, includes corresponding hardware and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application in conjunction with the embodiments, but such implementation is not to be considered as outside the scope of this application.
The embodiment of the present application may divide the electronic device into functional modules according to the above method example. For example, each function may be divided into a separate functional module, such as a first determining unit, a second determining unit, an interpolation unit, and the like, or two or more functions may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division of the modules in the embodiment of the present application is schematic and is merely a logical function division; other division manners may be used in actual implementation.
It should be noted that all relevant contents of the steps involved in the above method embodiment may refer to the functional descriptions of the corresponding functional modules, which are not repeated here.
The electronic device provided in this embodiment is configured to execute the above-described scene processing method, so that the same effects as those of the above-described implementation method can be achieved.
In the case where an integrated unit is employed, the electronic device may further include a processing module, a storage module and a communication module. The processing module may be used to control and manage the actions of the electronic device. The storage module may be used to support the electronic device in storing program code, data, and the like. The communication module may be used to support communication between the electronic device and other devices.
The processing module may be a processor or a controller, and may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination that performs computing functions, for example, a combination including one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor. The storage module may be a memory. The communication module may be a radio frequency circuit, a Bluetooth chip, a WiFi chip, or another device that interacts with other electronic devices.
In one embodiment, when the processing module is a processor and the storage module is a memory, the electronic device according to this embodiment may be a device having the structure shown in fig. 5.
The embodiment of the application also provides a computer readable storage medium, in which a computer program is stored, which when executed by a processor, causes the processor to execute the scene processing method of any of the embodiments above.
The present application also provides a computer program product, which when run on a computer, causes the computer to perform the above-mentioned related steps to implement the scene processing method in the above-mentioned embodiments.
The embodiment of the application also provides a chip. Referring to fig. 11, fig. 11 is a schematic structural diagram of a chip according to an embodiment of the present application. The chip shown in fig. 11 may be a general-purpose processor or a special-purpose processor. The chip includes a processor 510. Wherein the processor 510 is configured to perform the scene processing method of any of the above embodiments.
Optionally, the chip further includes a transceiver 520, and the transceiver 520 operates under the control of the processor 510 and is configured to support the communication device in executing the foregoing technical solution.
Optionally, the chip shown in fig. 11 may further include: a storage medium 530.
It should be noted that the chip shown in fig. 11 may be implemented using the following circuits or devices: one or more field programmable gate arrays (field programmable gate array, FPGA), programmable logic devices (programmable logic device, PLD), controllers, state machines, gate logic, discrete hardware components, any other suitable circuit or combination of circuits capable of performing the various functions described throughout this application.
The electronic device, the computer readable storage medium, the computer program product or the chip provided in this embodiment are used to execute the corresponding method provided above, so that the beneficial effects thereof can be referred to the beneficial effects in the corresponding method provided above, and will not be described herein.
It will be appreciated by those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts shown as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions to cause a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A scene processing method, comprising:
when a touch start event is detected, a recording start event is detected within a preset time interval, and a first application and a second application are detected to be consistent, determining that a current scene is a voice input scene, wherein the first application is an application corresponding to the touch start event, and the second application is an application corresponding to the recording start event;
and in the voice input scene, reducing the power consumption parameter of the electronic equipment, or improving the power consumption parameter to a first value, wherein the first value is smaller than a second value, and the second value is the power consumption parameter improved by the electronic equipment when the touch starting event is detected.
2. The scene processing method according to claim 1, wherein before the determining that the current scene is a voice input scene, the method further comprises:
recording a first time of the touch start event when the touch start event is detected;
recording a second time of the recording start event when the recording start event is detected;
and when the difference value between the second time and the first time is smaller than the preset time interval, determining that the recording starting event is detected in the preset time interval.
3. The scene processing method according to claim 1 or 2, wherein before the determining that the current scene is a voice input scene, the method further comprises:
acquiring first identification information of the first application;
acquiring second identification information of the second application;
and when the first identification information is detected to be identical with the second identification information, determining that the first application and the second application are consistent.
4. A scene processing method according to claim 3, wherein the first identification information comprises a first UID and/or a first application package name, and the second identification information comprises a second UID and/or a second application package name.
5. The scene processing method according to any of claims 1 to 4, wherein the power consumption parameters include any one or any combination of CPU frequency, screen refresh rate, screen brightness, double data rate frequency, touch screen dot count rate, application drawing frame rate, and system composition frame rate.
6. The scene processing method according to any of claims 1 to 5, characterized in that the method further comprises:
after the power consumption parameters of the electronic equipment are reduced in the voice input scene, when a touch ending event and/or a recording ending event is detected, the power consumption parameters of the electronic equipment are not reduced;
or after the power consumption parameter is increased to the first value in the voice input scene, when a touch end event and/or a recording end event is detected, the power consumption parameter is not increased to the first value.
7. The scene processing method according to claim 6, wherein the electronic device includes a touch screen driver, the method further comprising:
and detecting the touch starting event and/or the touch ending event through the touch screen driving or input scheduling thread module.
8. The scene processing method according to claim 6, wherein the electronic device includes a microphone driver, the method further comprising:
detecting the recording start event and/or the recording end event by the microphone driver.
9. An electronic device, comprising: one or more processors; one or more memories; the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the method of any of claims 1-8.
10. A chip, comprising: a processor for calling and running a computer program from a memory, causing an electronic device on which the chip is mounted to perform the method of any one of claims 1 to 8.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, causes the processor to perform the method of any of claims 1 to 8.
CN202310634541.9A 2023-05-31 2023-05-31 Scene processing method, electronic device and storage medium Pending CN117707404A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310634541.9A CN117707404A (en) 2023-05-31 2023-05-31 Scene processing method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310634541.9A CN117707404A (en) 2023-05-31 2023-05-31 Scene processing method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN117707404A true CN117707404A (en) 2024-03-15

Family

ID=90153989

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310634541.9A Pending CN117707404A (en) 2023-05-31 2023-05-31 Scene processing method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN117707404A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101911146A (en) * 2008-01-14 2010-12-08 佳明有限公司 Dynamic user interface for automated speech recognition
CN110720085A (en) * 2017-05-16 2020-01-21 苹果公司 Voice communication method
US20200066272A1 (en) * 2018-08-27 2020-02-27 Kyocera Corporation Electronic device with speech recognition function, control method of electronic device with speech recognition function, and recording medium
CN114822525A (en) * 2021-01-29 2022-07-29 华为技术有限公司 Voice control method and electronic equipment
CN116028210A (en) * 2022-05-16 2023-04-28 荣耀终端有限公司 Resource scheduling method, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination