WO2022160958A1 - Page classification method, page classification apparatus, and terminal device - Google Patents

Page classification method, page classification apparatus, and terminal device

Info

Publication number
WO2022160958A1
Authority
WO
WIPO (PCT)
Prior art keywords: page, target, foreground, control, information
Prior art date
Application number
PCT/CN2021/136531
Other languages
English (en)
French (fr)
Inventor
田舒
徐仕勤
赵安
甘雯辉
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2022160958A1

Classifications

    • G06F 9/451: Execution arrangements for user interfaces
    • G06F 11/36: Preventing errors by testing or debugging software
    • G06F 18/00: Pattern recognition
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 9/44: Arrangements for executing specific programs
    • G06N 3/04: Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; Learning methods

Definitions

  • the present application relates to a classification technology in the field of artificial intelligence (Artificial Intelligence, AI), and specifically relates to a page classification method, a page classification device, and a terminal device.
  • Some apps even provide a function that blocks access to the app or the phone once an agreed usage time is exceeded, to help users get rid of phone addiction and enjoy a healthier digital life.
  • However, the current approach of classifying the apps being used is inaccurate and cannot precisely perceive user behavior.
  • The page classification method, page classification apparatus, and terminal device provided by the embodiments of the present application can accurately identify the usage scenario of an App and accurately classify the pages of that usage scenario, thereby perceiving user behavior habits more comprehensively and better providing users with intelligent suggestion services.
  • An embodiment of the present application provides a page classification method. The page classification method includes: detecting switching of a foreground page of a terminal device, where the switching of the foreground page is triggered by a user operation; obtaining attribute information of a target control of the switched foreground page, where the target control includes at least a visible control and the attribute information includes the type and coordinate position of the target control; and classifying the foreground page according to the type and coordinate position of the target control.
  • The page classification method of the embodiments of the present application does not classify a page according to the type of the app; instead, it classifies the page according to the layout information presented by the types and coordinate positions of the page's controls, and the page can be a web page or an interface of an app. The method can therefore accurately identify the usage scenario, accurately classify the pages of that usage scenario, perceive user behavior and habits more comprehensively, and better provide users with intelligent suggestion services.
  • Classifying the foreground page according to the type and coordinate position of the target control includes: generating a layout block diagram of the foreground page based on the type and coordinate position of the target control; and classifying the foreground page according to the layout block diagram.
  • That is, the foreground page can be transformed into a layout block diagram in which rectangular boxes represent the locations of the target controls on the foreground page.
  • The foreground page is then classified according to this layout block diagram, as sketched below.
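  • For illustration only, the following Kotlin sketch shows one way such a layout block diagram could be rasterized from control bounding boxes into a coarse grid; the ControlInfo type, the grid size, and the helper name are assumptions made for this example and are not defined in the present application.

```kotlin
// Minimal sketch: rasterize control bounding boxes into a coarse layout block diagram.
// ControlInfo and the 32x64 grid size are illustrative assumptions.
data class ControlInfo(val type: String, val left: Int, val top: Int, val right: Int, val bottom: Int)

fun layoutBlockDiagram(
    controls: List<ControlInfo>,
    screenW: Int, screenH: Int,
    gridW: Int = 32, gridH: Int = 64
): Array<IntArray> {
    // grid[y][x] == 1 where some target control covers that cell, 0 elsewhere
    val grid = Array(gridH) { IntArray(gridW) }
    for (c in controls) {
        val x0 = (c.left * gridW / screenW).coerceIn(0, gridW - 1)
        val x1 = ((c.right - 1) * gridW / screenW).coerceIn(0, gridW - 1)
        val y0 = (c.top * gridH / screenH).coerceIn(0, gridH - 1)
        val y1 = ((c.bottom - 1) * gridH / screenH).coerceIn(0, gridH - 1)
        for (y in y0..y1) for (x in x0..x1) grid[y][x] = 1
    }
    return grid
}
```

A coarse grid keeps the representation small while still preserving the rough arrangement of controls, which is what layout-based classification relies on.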
  • When the target controls of the foreground page include multiple types, classifying the foreground page according to the type and coordinate position of the target controls includes: dividing the target controls into multiple groups by type, where each group includes one or more types of target controls; generating a plurality of layout block diagrams based on the types and coordinate positions of the multiple groups of target controls; and classifying the foreground page according to the plurality of layout block diagrams.
  • In other words, when the target controls of the foreground page include multiple types, the target controls can first be divided into multiple groups by type, and a layout block diagram can then be generated for each group according to the coordinate positions. The type of the foreground page can be determined by comparing the layout block diagrams generated for each group of target controls with the layout block diagrams that pages of known types produce for the same control types; the grouping step is sketched below.
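  • Building on the previous sketch (and reusing its hypothetical ControlInfo type and layoutBlockDiagram() helper), the grouping-by-type step described above could look as follows; the control-type names in the comment are merely examples.

```kotlin
// Sketch: divide the target controls into groups by type and produce one layout
// block diagram per group. Assumes ControlInfo and layoutBlockDiagram() from the
// previous sketch.
fun groupedBlockDiagrams(
    controls: List<ControlInfo>,
    screenW: Int, screenH: Int
): Map<String, Array<IntArray>> =
    controls
        .groupBy { it.type }              // e.g. "Button", "TextView", "ImageView", "EditText"
        .mapValues { (_, group) ->        // one layout block diagram per control type
            layoutBlockDiagram(group, screenW, screenH)
        }
```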
  • The page classification method further includes: acquiring auxiliary information related to the switched foreground page, the auxiliary information including at least one of semantic information of the target control, usage information of a physical device of the terminal device, and usage information of software of the terminal device, where the physical device includes at least one of a microphone, a speaker, and a camera, and the software includes an input method.
  • In this case, classifying the foreground page according to the type and coordinate position of the target control includes: classifying the foreground page according to the type and coordinate position of the target control together with the auxiliary information.
  • That is, in addition to being classified according to the type and coordinate position of its target controls, the foreground page can also be classified with the help of some auxiliary information.
  • The auxiliary information may be semantic information of the target control. If the type and coordinate position of the target controls suggest that the foreground page may be either a communication page or a shopping page, and the semantic information is, for example, "Have you eaten?", the foreground page can be judged to be a communication page; when the semantic information is, for example, "What's the price?", the foreground page can be determined to be a shopping page.
  • The auxiliary information can also be the usage information of physical devices.
  • The auxiliary information can likewise be the usage information of software, and the software can be an input method.
  • If the input method is in use, the user is probably chatting, and the page is likely a communication page; a simple tie-breaking rule along these lines is sketched below.
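  • Purely as an illustration of how such auxiliary information could break a tie between candidate categories, the following Kotlin sketch applies hypothetical keyword and flag checks; the field names and keywords are assumptions for this example, not part of the present application.

```kotlin
// Illustrative tie-breaking rule between "communication" and "shopping" using
// auxiliary information; keywords and field names are hypothetical.
data class AuxInfo(
    val controlTexts: List<String>,   // semantic information of the target controls
    val inputMethodActive: Boolean,   // software usage: is the input method in use?
    val microphoneActive: Boolean     // physical-device usage
)

fun resolveAmbiguity(candidates: Set<String>, aux: AuxInfo): String? {
    if (candidates != setOf("communication", "shopping")) return null
    val text = aux.controlTexts.joinToString(" ")
    return when {
        listOf("price", "order", "cart").any { text.contains(it, ignoreCase = true) } -> "shopping"
        aux.inputMethodActive || aux.microphoneActive -> "communication"
        else -> null  // still ambiguous; fall back to the classifier's own score
    }
}
```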
  • When the target controls of the foreground page include multiple types, classifying the foreground page according to the type and coordinate position of the target controls includes: dividing the target controls into multiple groups by type, where each group includes one or more types of target controls; inputting the attribute information of the multiple groups of target controls into multiple input channels of a pre-trained classifier model respectively, where the attribute information of the multiple groups of target controls corresponds one-to-one with the multiple input channels; and classifying the foreground page using the pre-trained classifier model.
  • The target controls can thus be divided into multiple groups by type, and the attribute information of the multiple groups of target controls can then be input into multiple input channels of the classifier model, so that each channel processes the attribute information of one group of target controls. This helps reduce the complexity of the data processing performed by the classifier model and improves its classification accuracy.
  • Inputting the attribute information of the multiple groups of target controls into the multiple input channels of the pre-trained classifier model respectively includes: inputting the attribute information of each group of target controls into a channel of the pre-trained classifier model in data form; or drawing a layout block diagram according to the coordinate positions in the attribute information of each group of target controls, and inputting the layout block diagrams into the channels of the pre-trained classifier model.
  • That is, the attribute information of the grouped target controls can be fed into the channels of the pre-trained model as raw data, or a layout block diagram can be drawn for each group of target controls according to the coordinate positions and then fed into the channels of the pre-trained classifier model, as sketched below.
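  • The following sketch shows one possible way to pack one layout block diagram per control-type group into a channels-first input for a CNN classifier; the channel order, grid size, and function names are assumptions made for this example, and it builds on the groupedBlockDiagrams() sketch above.

```kotlin
// Sketch: stack per-type layout block diagrams into a multi-channel tensor
// (channels-first), one input channel per control-type group.
fun toMultiChannelInput(
    diagrams: Map<String, Array<IntArray>>,   // e.g. output of groupedBlockDiagrams()
    channelOrder: List<String> = listOf("Button", "TextView", "ImageView", "EditText"),
    gridW: Int = 32, gridH: Int = 64
): FloatArray {
    val input = FloatArray(channelOrder.size * gridH * gridW)   // zero-filled by default
    channelOrder.forEachIndexed { c, type ->
        val grid = diagrams[type] ?: return@forEachIndexed      // missing type -> empty channel
        for (y in 0 until gridH) for (x in 0 until gridW) {
            input[c * gridH * gridW + y * gridW + x] = grid[y][x].toFloat()
        }
    }
    return input
}
```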
  • The page classification method further includes: acquiring auxiliary information related to the switched foreground page, the auxiliary information including at least one of semantic information of the target control, usage information of a physical device of the terminal device, and usage information of software of the terminal device, where the physical device includes at least one of a microphone, a speaker, and a camera, and the software includes an input method.
  • In this case, inputting the attribute information of the multiple groups of target controls into the multiple input channels of the pre-trained classifier model includes: inputting the attribute information of the multiple groups of target controls and the auxiliary information into the multiple input channels of the pre-trained classifier model respectively.
  • auxiliary information can be input into the classifier model, thereby improving the accuracy of the output result of the classifier model.
  • When the auxiliary information includes the semantic information of the target controls, the attribute information and the semantic information of the multiple groups of target controls can be input into the multiple input channels of the pre-trained classifier model respectively.
  • When the auxiliary information includes at least one of the usage information of a physical device of the terminal device and the usage information of the software, the attribute information of the multiple groups of target controls can be input into the multiple input channels of the pre-trained classifier model, and at least one of the usage information of the physical device and the usage information of the software can be input into a specific channel of the classifier model. This specific channel can be different from the input channels that receive the attribute information and semantic information of the target controls; one possible encoding is sketched below.
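  • As a rough illustration of such a specific channel, the sketch below appends one extra channel filled with device and software usage flags after the per-control-type channels; this particular encoding is an assumption, not the model layout of the present application.

```kotlin
// Sketch: encode auxiliary usage information as one extra channel appended after
// the per-control-type channels; the constant-flag encoding is illustrative only.
fun appendAuxiliaryChannel(
    controlChannels: FloatArray,      // e.g. output of toMultiChannelInput()
    micInUse: Boolean,
    speakerInUse: Boolean,
    inputMethodInUse: Boolean,
    gridW: Int = 32, gridH: Int = 64
): FloatArray {
    // simple encoding: fraction of active usage flags, broadcast over the whole channel
    val flag = listOf(micInUse, speakerInUse, inputMethodInUse).count { it } / 3f
    val auxChannel = FloatArray(gridH * gridW) { flag }
    return controlChannels + auxChannel   // concatenate to form the final model input
}
```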
  • the type of the target control includes at least one of a button control, a text control, an image control, and an edit text control.
  • For example, the target controls may include only text controls, or both text controls and image controls.
  • the types of the foreground pages include communication, shopping, reading, video, game, music and other types.
  • “other categories” refers to categories other than communication, shopping, reading, video, games, and music.
  • Obtaining the attribute information of the target control of the switched foreground page includes: obtaining the layout information of the decorView of the switched foreground page, where the layout information is a multiway tree structure; and obtaining, from the layout information of the decorView, the attribute information of the leaf node controls of the multiway tree structure, where the leaf node controls include the visible and invisible controls of the foreground page and are located in the Nth layer from the bottom of the multiway tree structure, with N greater than or equal to 1.
  • The attribute information of a control, that is, its type and coordinate position, can thus be obtained from the multiway tree structure in the decorView, so that the page can be classified accurately, the user's behavior can be perceived more comprehensively, and intelligent suggestion services can be provided better. Since only the information of the leaf node controls visible to the user needs to be obtained, power consumption can be reduced in actual operation and the training efficiency of the classifier model can be improved.
  • Obtaining the attribute information of the target control of the switched foreground page further includes: filtering the leaf node controls to obtain the attribute information of the visible controls of the foreground page.
  • Because the leaf node controls of the multiway tree structure include both visible and invisible controls, and users generally do not operate invisible controls, the leaf node controls can be filtered so that only the attribute information of the visible controls is kept, which allows the user's operation behavior to be perceived more accurately; a traversal-and-filter sketch is given below.
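  • For illustration, the following Kotlin sketch traverses the view tree under a decorView (the multiway tree mentioned above) and collects the type and on-screen bounds of the visible leaf-node controls; it assumes direct access to the view hierarchy (for example, from within the app or from an instrumentation or accessibility context) and uses only standard Android view APIs.

```kotlin
// Sketch: depth-first traversal of the decorView's view tree, keeping only the
// visible leaf-node controls together with their type and on-screen bounds.
import android.graphics.Rect
import android.view.View
import android.view.ViewGroup

data class TargetControl(val type: String, val bounds: Rect)

fun collectVisibleLeafControls(
    root: View,
    out: MutableList<TargetControl> = mutableListOf()
): List<TargetControl> {
    if (root is ViewGroup && root.childCount > 0) {
        // non-leaf node of the multiway tree: recurse into its children
        for (i in 0 until root.childCount) collectVisibleLeafControls(root.getChildAt(i), out)
    } else {
        // leaf node: keep it only if it is actually visible on screen
        val bounds = Rect()
        if (root.visibility == View.VISIBLE && root.isShown && root.getGlobalVisibleRect(bounds)) {
            out.add(TargetControl(root.javaClass.simpleName, bounds))   // e.g. "Button", "TextView"
        }
    }
    return out
}
```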
  • An embodiment of the present application provides a page classification apparatus. The page classification apparatus includes: a detection module configured to detect switching of a foreground page of a terminal device, where the switching of the foreground page is triggered by a user operation; an acquisition module configured to acquire attribute information of a target control of the switched foreground page, where the target control includes at least a visible control and the attribute information includes the type and coordinate position of the target control; and a classification module configured to classify the foreground page according to the type and coordinate position of the target control.
  • the classification module is specifically configured to: generate a layout block diagram of the foreground page based on the type and coordinate position of the target control; and classify the foreground page according to the layout block diagram.
  • When the target controls of the foreground page include multiple types, the classification module is specifically configured to: divide the target controls into multiple groups by type, where each group includes one or more types of target controls; generate multiple layout block diagrams based on the types and coordinate positions of the multiple groups of target controls; and classify the foreground page according to the multiple layout block diagrams.
  • The acquisition module is further configured to acquire auxiliary information related to the switched foreground page, the auxiliary information including at least one of semantic information of the target control, usage information of a physical device of the terminal device, and usage information of software of the terminal device, where the physical device includes at least one of a microphone, a speaker, and a camera, and the software includes an input method. The classification module is configured to classify the foreground page according to the type and coordinate position of the target control together with the auxiliary information.
  • When the target controls of the foreground page include multiple types, the classification module is specifically configured to: divide the target controls into multiple groups by type, where each group includes one or more types of target controls; input the attribute information of the multiple groups of target controls into the multiple input channels of the pre-trained classifier model respectively, where the attribute information of the multiple groups of target controls corresponds one-to-one with the multiple input channels; and classify the foreground page using the pre-trained classifier model.
  • The classification module is further specifically configured to: input the attribute information of each group of target controls into a channel of the pre-trained classifier model in data form; or generate a layout block diagram according to the coordinate positions in the attribute information of each group of target controls, and input the type of each group of target controls together with the layout block diagram representing the coordinate positions into a channel of the pre-trained classifier model.
  • The acquisition module is further configured to acquire auxiliary information related to the switched foreground page, the auxiliary information including at least one of semantic information of the target control, usage information of a physical device of the terminal device, and usage information of software of the terminal device, where the physical device includes at least one of a microphone, a speaker, and a camera, and the software includes an input method. The classification module is also specifically configured to input the attribute information of the multiple groups of target controls and the auxiliary information into the multiple input channels of the pre-trained classifier model respectively.
  • the type of the target control includes at least one of a button control, a text control, an image control, and an edit text control.
  • the types of the foreground pages include communication, shopping, reading, video, game, music and other types.
  • The acquisition module is specifically configured to: obtain the layout information of the decorView of the switched foreground page, where the layout information is a multiway tree structure; and obtain, from the layout information of the decorView, the attribute information of the leaf node controls of the multiway tree structure, where the leaf node controls include the visible and invisible controls of the foreground page and are located in the Nth layer from the bottom of the multiway tree structure, with N greater than or equal to 1.
  • The acquisition module is further specifically configured to: filter the leaf node controls to obtain the attribute information of the visible controls on the foreground page.
  • An embodiment of the present application provides a terminal device. The terminal device includes a memory and a processor, where the memory is used to store a computer program and the processor is configured to, when the computer program is invoked, execute the method in the first aspect or any possible implementation of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium for storing a computer program.
  • When the computer program is executed by a processor of a terminal device, the terminal device implements the method in the first aspect or any possible implementation of the first aspect.
  • An embodiment of the present application provides a computer program product. The computer program product includes a computer program/instructions, and when the computer program/instructions are run on a terminal device, the terminal device is caused to implement the method in the first aspect or any possible implementation of the first aspect.
  • The page classification method and page classification apparatus of the embodiments of the present application do not classify according to App type; instead, they classify pages in real time according to the layout structure presented by the types and coordinate positions of the controls on the page, and the layout structure of the page can be input into a convolutional neural network (CNN) for training.
  • The trained classifier model can then be used to classify the user's operation behavior, which makes it possible to accurately identify the usage scenario of the app and accurately classify the pages of that usage scenario, so as to perceive the user's behavior and habits more comprehensively.
  • the solution in the embodiment of the present application only needs to obtain the leaf node control information visible to the user, which can reduce power consumption and improve model training efficiency in actual operation.
  • FIG. 1 is a schematic diagram of the hardware structure of a mobile phone;
  • FIG. 2 is a schematic structural diagram of the software system used by the mobile phone of FIG. 1;
  • FIG. 3-1 to FIG. 3-6 are examples of six types of pages;
  • FIG. 4 is a schematic structural diagram of a page of a terminal device;
  • FIG. 5 is a flowchart of a page classification method provided by an embodiment of the present application;
  • FIG. 6 is a specific flowchart of step S506 in FIG. 5;
  • FIG. 7 is another specific flowchart of step S506 in FIG. 5;
  • FIG. 8 is yet another specific flowchart of step S506 in FIG. 5;
  • FIG. 9 to FIG. 11 are process diagrams of obtaining an input image from a foreground page;
  • FIG. 12 is a process diagram of converting the layout block diagram of one type of control into a grid matrix;
  • FIG. 13 is a process diagram of classifying a foreground page input into the classifier model;
  • FIG. 14 is a schematic diagram of a system architecture to which an embodiment of the present application is applied;
  • FIG. 16 is a statistical chart of classified operation durations according to an embodiment of the present application;
  • FIG. 17 is a schematic diagram of a healthy phone-use reminder according to an embodiment of the present application;
  • FIG. 18 is a schematic structural diagram of a page classification apparatus provided by an embodiment of the present application;
  • FIG. 19 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • The terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implying the number of the indicated technical features. Thus, a feature defined as "first" or "second" may expressly or implicitly include one or more of that feature.
  • The terms "include", "comprise", "have" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • FIG. 1 is a schematic diagram of the hardware structure of a mobile phone.
  • the mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, Antenna 1, Antenna 2, RF module 150, communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, screen 194 , and a subscriber identification module (subscriber identification module, SIM) card interface 195 and the like.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the mobile phone 100 .
  • The mobile phone 100 may include more or fewer components than shown, may combine some components, may split some components, or may use a different arrangement of components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the terminal device in this embodiment of the present application may include a processor 110, a communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a camera 193, a screen 194, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a touch sensor 180K, etc., and may be used to detect the user's pressing and touching operations to perform corresponding actions, such as switching pages.
  • the processor 110 can run the page classification method provided by the embodiment of the present application to classify the page according to the layout information presented by the control type and coordinate position of the page, so as to accurately identify the usage scenario of the App, and accurately classify the pages of the usage scenario.
  • The processor 110 may include different devices. For example, when a CPU and an NPU (AI chip) are integrated, the CPU and the NPU may cooperate to execute the page classification method of the embodiments of the present application: detecting the switching of the foreground page and obtaining the attribute information of the target control of the switched foreground page may be executed by the CPU, while classifier model training and application may be executed by the NPU, so as to achieve higher processing efficiency.
  • the terminal device can control the screen 194 to switch the foreground page (ie the user-visible page) in response to the user operation, and display the classification result of the foreground page.
  • The screen 194 can also display the classification statistics produced by the page classification method of the embodiments of the present application, as shown in FIG. 16, as well as intelligent suggestion services provided to the user from a health perspective based on those statistics; for example, if the user has been using the phone to write articles or read news for a long time, a card can pop up reminding the user to take a break or use eye drops to protect their eyesight, as shown in FIG. 17.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated into one or more processors.
  • the controller may be the nerve center and command center of the mobile phone 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • The memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, it can fetch them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the screen 194, the camera 193, the communication module 160, and the like.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the wireless communication function of the mobile phone 100 may be implemented by the antenna 1, the antenna 2, the radio frequency module 150, the communication module 160, the modulation and demodulation processor 110, the baseband processor 110, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in handset 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network.
  • the radio frequency module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied on the mobile phone 100 .
  • the radio frequency module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like.
  • the radio frequency module 150 can receive electromagnetic waves from the antenna 1 , filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor 110 for demodulation.
  • the radio frequency module 150 can also amplify the signal modulated by the modulation and demodulation processor 110 , and then convert it into an electromagnetic wave for radiation through the antenna 1 .
  • the modem processor 110 may include a modulator and a demodulator. Wherein, the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor 110 for processing. The low frequency baseband signal is processed by the baseband processor 110 and then passed to the application processor. The application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the screen 194 .
  • the communication module 160 can provide applications on the mobile phone 100 including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system ( global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (infrared, IR) and other wireless communication solutions.
  • the communication module 160 may be one or more devices integrating at least one communication processing module.
  • the communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • The communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the mobile phone 100 is coupled with the radio frequency module 150, and the antenna 2 is coupled with the communication module 160, so that the mobile phone 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), broadband Code Division Multiple Access (WCDMA), Time Division Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), 5G, BT, GNSS, WLAN , NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (GLONASS), a Beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi-zenith satellite system (quasi -zenith satellite system, QZSS) and/or satellite based augmentation systems (SBAS).
  • the mobile phone 100 can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the screen 194, and the application processor.
  • the ISP is used to process the data fed back by the camera 193 .
  • When the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera 193, where the optical signal is converted into an electrical signal; the photosensitive element of the camera 193 then transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
  • ISP can also perform algorithmic optimization on image noise, brightness, and skin tones. ISP can also optimize parameters such as exposure and color temperature of the shooting scene.
  • Camera 193 is used to capture still images or video.
  • the object is projected through the lens to generate an optical image onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the mobile phone 100 may include one or N cameras 193 , where N is a positive integer greater than one.
  • the digital signal processor 110 is used for processing digital signals, in addition to processing digital image signals, it can also process other digital signals. For example, when the mobile phone 100 selects a frequency point, the digital signal processor 110 is used to perform Fourier transform on the frequency point energy and the like.
  • Video codecs are used to compress or decompress digital video.
  • the handset 100 may support one or more video codecs.
  • the mobile phone 100 can play or record videos in various encoding formats, such as: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the NPU can be used to train the classifier model.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • The speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • The electronic device 100 can play music or take a hands-free call through the speaker 170A.
  • The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • A call can be answered by placing the receiver 170B close to the ear.
  • The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • The user can speak with their mouth close to the microphone 170C to input a sound signal into it.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be provided on the display screen 194 .
  • the capacitive pressure sensor may be comprised of at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
  • the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
  • The angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to offset the shaking of the electronic device 100 through reverse motion to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenarios.
  • the air pressure sensor 180C is used to measure air pressure.
  • the electronic device 100 calculates the altitude through the air pressure value measured by the air pressure sensor 180C to assist in positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can detect the opening and closing of the flip holster using the magnetic sensor 180D.
  • the electronic device 100 can detect the opening and closing of the flip according to the magnetic sensor 180D. Further, according to the detected opening and closing state of the leather case or the opening and closing state of the flip cover, characteristics such as automatic unlocking of the flip cover are set.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device 100 is stationary. It can also be used to identify the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • the electronic device 100 can measure the distance through infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • The proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking pictures with fingerprints, answering incoming calls with fingerprints, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 performs thermal protection by reducing the performance of the processor located near the temperature sensor 180J in order to reduce power consumption. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 caused by the low temperature. In some other embodiments, when the temperature is lower than another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • The touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, so as to prevent accidental touch.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 180M can also contact the pulse of the human body and receive the blood pressure beating signal.
  • the bone conduction sensor 180M can also be disposed in the earphone, combined with the bone conduction earphone.
  • the audio module 170 can analyze the voice signal based on the vibration signal of the vocal vibration bone block obtained by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beat signal obtained by the bone conduction sensor 180M, and realize the function of heart rate detection.
  • The keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Touch operations in different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • FIG. 2 is a schematic structural diagram of a software system used in the mobile phone of FIG. 1 .
  • The Android system can usually be divided into four layers: from top to bottom, the application layer, the application framework layer, the system library and Android runtime, and the kernel layer. Each layer has clear roles and division of labor, and the layers communicate with each other through software interfaces.
  • the application layer includes a series of applications deployed on the mobile phone 100 .
  • the application layer includes, but is not limited to, a desktop launcher (Launcher), a setting module, a calendar module, a camera module, a call module, and a short message module.
  • the application framework layer can provide an application programming interface (API) and a programming framework for the applications in the application layer, and can also include some predefined functional modules/services.
  • the application framework layer includes, but is not limited to, a window manager (Window manager), an activity manager (Activity manager), a package manager (Package manager), a resource manager (Resource manager), and a power manager (Power manager).
  • the activity manager is used to manage the life cycle of the application and implement the navigation and rollback function of each application.
  • the activity manager may be responsible for the creation of an activity (Activity) process and maintenance of the life cycle of the created Activity process.
  • a window manager is used to manage window programs.
  • The graphical user interface of an application is usually composed of one or more activities, and an activity is composed of one or more views; the window manager can add the views included in a graphical user interface so that they are displayed on the screen 194, or remove views from the graphical user interface displayed on the screen 194.
  • the Android runtime and system library, kernel layer, etc. located below the application framework layer can be called the underlying system.
  • the underlying system includes the underlying display system for providing display services.
  • The underlying display system may include, but is not limited to, the surface manager located in the system library and the display driver at the kernel layer.
  • the kernel layer is the layer between hardware and software, and the kernel layer includes several hardware drivers. Exemplarily, the kernel layer may include a display driver, a camera driver, an audio driver, and a touch driver. Each driver can collect the information collected by the corresponding hardware, and report the corresponding monitoring data to the status monitoring service or other functional modules in the system library.
  • terminal devices such as mobile phones have become indispensable tools in people's lives.
  • Some apps provide a function of agreeing on a usage duration, after which the user is no longer allowed to enter the app or the mobile phone.
  • However, such a function fails to help users get rid of the trouble of indulging in mobile phones and enjoy a healthier digital life.
  • Short video apps, for example, no longer simply support functions such as publishing short videos, watching short videos, liking, and commenting; a chat window has also been added to facilitate users' chatting and making friends.
  • The Douyin App, for instance, supports both short videos and chat. In this case, if the short video app is classified as a video-type app for duration statistics, errors will result.
  • The classification method that recognizes image content based on page screenshots mainly uses a convolutional neural network (CNN) to classify images. Because the information contained in a picture is too rich, including graphics, images, and text that are redundant for page picture recognition, the accuracy of the classification result is affected, and power consumption and training cost are increased.
  • The embodiments of the present application provide a page classification method, a page classification apparatus, and a terminal device, which do not classify according to App type but instead classify pages in real time according to the layout information of the pages. This makes it possible to accurately identify the usage scenario of an App and accurately classify the pages of that usage scenario, so as to perceive the user's behavior and habits more comprehensively and better provide users with intelligent suggestion services.
  • Pages can be divided into 7 categories according to the page layout, namely communication, shopping, reading, video, games, music, and other. Of course, pages can also be divided into more or fewer categories according to actual needs.
  • The layout structure of the page can also be input into a CNN for model training, and the trained classifier model can be applied to classify the user's operation behavior.
  • the solution in the embodiment of the present application only needs to obtain the leaf node control information visible to the user, which can reduce power consumption and improve model training efficiency in actual operation.
  • the page classification method in the embodiments of the present application is applicable to any terminal device with pages, including but not limited to mobile phones, tablet computers (PADs), smart screens (TVs) and other devices used daily.
  • Figures 3-1 to 3-6 are examples of six types of pages. Specifically, Figure 3-1 is a communication page, Figure 3-2 is a shopping page, Figure 3-3 is a reading page, Figure 3-4 is a video page, Figure 3-5 is a game page, and Figure 3-6 is a music page.
  • the communication page shown in Figure 3-1 is usually divided into three parts. The top is the navigation bar, which indicates the chat partner; the middle is the main chat content, characterized by avatars on both the far left and the far right, with each message added to the left or right of an avatar, and a message can be text or a picture; the bottom is the toolbar, which provides a button for switching to voice input, an input bar, and emoji and extended-function buttons.
  • Another example is the shopping page shown in Figure 3-2, which usually consists of four parts.
  • the top is the navigation bar, which provides button operations such as search, return, and sharing;
  • the layer below the navigation bar is the product display area, which shows pictures of the various products;
  • the next layer is the text introduction about the product;
  • the bottom is the toolbar, which provides button operations such as customer service, favorites, adding to the shopping cart, and submitting a purchase. Based on this, a method is proposed for classifying and counting operation pages according to business scenarios; it is no longer limited by App type or image content and can more accurately perceive the user's operation habits.
  • FIG. 4 is a schematic structural diagram of a page of a terminal device.
  • in the Android system, opening an application actually opens a main Activity, and the user can switch back and forth between multiple Activities by touching different controls on the screen. For example, a small menu window can be opened from the menu key, or a button can be tapped to jump from one page to another.
  • during the Activity startup process, the PhoneWindow is actually initialized first, and then the inner class DecorView of the PhoneWindow loads the layout set in the Activity.
  • the ViewRoot in WindowManager is the management class that really handles view drawing and other events in DecorView. Window interacts with WindowManagerService through WindowManager, and finally presents a specific page view to the user.
  • the page views that the user sees are all displayed by processing the layout in the decorView, and similar page views have similar layout structures.
  • when the pictures and text on pages differ, the overall picture-based similarity is very low, but from the perspective of the page's layout structure the similarity is very high. Therefore, it is sufficient to extract the layout structure of the page; the foreground page can then be classified according to that layout structure.
  • FIG. 5 is a flowchart of a page classification method provided by an embodiment of the present application. As shown in Figure 5, the page classification method includes the following steps:
  • Step S502 detecting the switching of the foreground page of the terminal device, wherein the switching of the foreground page is triggered by a user operation.
  • Step S504 Obtain attribute information of the target control of the switched foreground page, wherein the target control includes at least visible controls, and the attribute information includes the type and coordinate position of the target control.
  • Types of target controls include at least one of button controls, text controls, image controls, and edit text controls. For example, the target controls may include only button controls, or button controls and text controls.
  • the types of target controls can also include more types, such as list controls.
  • specifically, the layout information of the decorView of the switched foreground page may be obtained first; the layout information is a multi-way tree structure. Then, the attribute information of the leaf node controls of the multi-way tree structure is obtained from the layout information of the decorView.
  • the leaf node controls include the visible controls and the invisible controls of the foreground page, and the leaf node controls are located in the Nth layer from the bottom of the multi-way tree structure, where N is greater than or equal to 1.
  • that is, the attribute information of the controls can be obtained with the help of the multi-way tree structure in the decorView, so as to obtain the control types and control layout of the foreground page, classify the page accurately, perceive the user's behavior and habits more comprehensively, and better provide users with intelligent suggestion services.
  • power consumption can be reduced in actual operation, and the training efficiency of the classifier model can be improved.
  • the leaf node controls can also be filtered to obtain the attribute information of the visible controls of the foreground page. Since the leaf node controls of the multi-way tree structure include both visible and invisible controls, and users generally do not operate invisible controls, only the attribute information of the visible controls needs to be retained, so that the user's operation behavior can be perceived more accurately.
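  • As a rough illustration of the extraction and filtering just described, the following Python sketch traverses a simplified multi-way tree and keeps only the visible leaf node controls. It assumes the decorView hierarchy has already been exported into plain dictionaries; the field names (type, bounds, visible, children) and the example data are illustrative assumptions, not the actual Android data structures.

```python
# Minimal sketch of obtaining and filtering leaf node controls. It assumes the
# decorView hierarchy has already been exported into plain dictionaries; the
# field names (type, bounds, visible, children) and the example data are
# illustrative, not the actual Android structures.

def collect_leaf_controls(node):
    """Depth-first traversal returning the bottommost leaf nodes of the tree."""
    children = node.get("children", [])
    if not children:                          # a leaf node control
        return [node]
    leaves = []
    for child in children:
        leaves.extend(collect_leaf_controls(child))
    return leaves

def visible_controls(root, screen_w, screen_h):
    """Keep only leaf controls that are marked visible and overlap the screen."""
    result = []
    for ctrl in collect_leaf_controls(root):
        left, top, right, bottom = ctrl["bounds"]
        on_screen = right > 0 and bottom > 0 and left < screen_w and top < screen_h
        if ctrl.get("visible", True) and on_screen:
            result.append(ctrl)
    return result

# Example tree for a chat-like page (illustrative data only)
page = {
    "type": "FrameLayout",
    "children": [
        {"type": "Button",   "bounds": (0, 2200, 200, 2300),  "visible": True, "children": []},
        {"type": "EditText", "bounds": (200, 2200, 900, 2300), "visible": True, "children": []},
    ],
}
print([c["type"] for c in visible_controls(page, 1080, 2340)])  # ['Button', 'EditText']
```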
  • Step S506 classify the foreground page according to the type and coordinate position of the target control.
  • the types of foreground pages include communication, shopping, reading, video, games, music, and others.
  • “other categories” refers to categories other than communication, shopping, reading, video, games, and music.
  • the page classification method may further include the following steps:
  • Step S505 obtaining auxiliary information related to the switched foreground page, where the auxiliary information includes at least one of the semantic information of the target controls, the usage information of the physical devices of the terminal device, and the usage information of the software of the terminal device; the physical devices include at least one of a microphone, a speaker, and a camera, and the software includes an input method.
  • Step S506' classify the foreground page according to the type and coordinate position of the target control and auxiliary information.
  • when the auxiliary information is the semantic information of the target controls: if the types and coordinate positions of the target controls suggest that the foreground page may be either a communication page or a shopping page, and the semantic information is, for example, "Have you eaten?", the foreground page can be judged to be a communication page.
  • when the auxiliary information is the usage of physical devices: for example, if physical devices such as the microphone and speaker are in use, a call is in progress and the page is a communication page.
  • when the auxiliary information is the usage of software: for example, the software can be an input method; when the input method is in use, the user is chatting and the page is a communication page.
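  • As a rough, non-authoritative sketch of how such auxiliary information might be combined with the layout-based result, the following Python snippet applies simple rules on top of the classifier's ranked candidates. The signal names (mic_in_use, speaker_in_use, ime_in_use, control_text) and the rules themselves are assumptions for illustration only.

```python
# Rough sketch: refine a layout-based prediction with auxiliary signals. The
# signal names (mic_in_use, speaker_in_use, ime_in_use, control_text) and the
# simple rules are illustrative assumptions, not the exact decision logic.

def refine_with_auxiliary(candidates, aux):
    """candidates: page categories ranked by the layout classifier, best first.
    aux: auxiliary observations for the current foreground page."""
    if aux.get("mic_in_use") and aux.get("speaker_in_use"):
        return "communication"                # a call is likely in progress
    if aux.get("ime_in_use") and "communication" in candidates:
        return "communication"                # typing suggests chatting
    if "price" in aux.get("control_text", "").lower() and "shopping" in candidates:
        return "shopping"
    return candidates[0]                      # fall back to the layout-based result

print(refine_with_auxiliary(["shopping", "communication"],
                            {"ime_in_use": True}))   # -> communication
```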
  • FIG. 6 is a specific flowchart of step S506 in FIG. 5 . As shown in Figure 6, step S506 may include the following specific steps:
  • Step S5062 a layout block diagram of the foreground page is generated based on the type and coordinate position of the target control.
  • Step S5064 classify the foreground pages according to the layout block diagram.
  • that is, the foreground page can be transformed into a layout block diagram in which rectangular boxes indicate the locations of the target controls of the foreground page. Since pages of the same type have similar layout structures, the foreground page can be classified based on this layout block diagram.
  • FIG. 7 is another specific flowchart of step S506 in FIG. 5 .
  • the target control of the foreground page includes multiple types, and step S506 may include the following specific steps:
  • Step S5062' the target controls are divided into multiple groups according to types, and each group includes one or more than two types of target controls.
  • Step S5064' based on the types and coordinate positions of multiple groups of target controls, respectively generate multiple layout block diagrams.
  • Step S5066' classify the foreground pages according to the multiple layout block diagrams.
  • that is, when the target controls of the foreground page include multiple types, the target controls can first be divided into groups by type, and a layout block diagram can then be generated for each group according to the coordinate positions of its controls. The multiple layout block diagrams generated from the groups of target controls can be compared with the multiple layout block diagrams generated, per control type, from pages of known types, so as to determine the type of the foreground page.
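  • A minimal sketch of the grouping step is given below, assuming the same dictionary-based control format as the earlier sketch; here each group is represented simply as a list of bounding rectangles, one "layout block diagram" per control type.

```python
from collections import defaultdict

# Small sketch of the grouping step: split the controls into groups by type,
# each group expressed here simply as a list of bounding rectangles (one
# "layout block diagram" per type). The data format is illustrative.

def group_controls_by_type(controls):
    groups = defaultdict(list)
    for ctrl in controls:
        groups[ctrl["type"]].append(ctrl["bounds"])   # one rectangle per control
    return groups

blocks = group_controls_by_type([
    {"type": "Button",   "bounds": (0, 2200, 200, 2300)},
    {"type": "EditText", "bounds": (200, 2200, 900, 2300)},
])
print(blocks["Button"])   # [(0, 2200, 200, 2300)]
```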
  • the page classification method of the embodiment of the present application does not classify the page according to the type of the app, but classifies the page according to the control type and layout information (that is, the coordinate position) of the page.
  • the page can be a web page or an App interface. The method can accurately identify the usage scenario and accurately classify the pages of that usage scenario, so as to perceive the user's behavior and habits more comprehensively and better provide users with intelligent suggestion services.
  • in addition, the layout structure of the page can also be input into a CNN for model training, and the trained classifier model can then be applied to classify the user's operation behavior.
  • FIG. 8 is another specific flowchart of step S506 in FIG. 5 .
  • the target controls of the foreground page include multiple types, and step S506 may include the following specific steps:
  • Step S5062'' divide the target controls into multiple groups according to type, each group including one or more types of target controls.
  • Step S5064'' input the attribute information of the multiple groups of target controls into the multiple input channels of the pre-trained classifier model, where the attribute information of the groups corresponds to the input channels one-to-one.
  • the attribute information of each group of target controls can be input into the channel of the pre-trained classifier model in the form of data.
  • alternatively, a layout block diagram is first drawn according to the coordinate positions in the attribute information of each group of target controls, and then the type of each group of target controls and the layout block diagram representing the coordinate positions are input into the channel of the pre-trained classifier model.
  • Step S5066'' use the pre-trained classifier model to classify the foreground page.
  • that is, the target controls can be divided into multiple groups according to type, and the attribute information of the multiple groups of target controls can then be input into multiple input channels of the classifier model, so that each channel processes the attribute information of one group of target controls. This helps reduce the complexity of the data processed by the classifier model and improve its classification accuracy.
  • in addition, before step S5066'', step S5065'' can also be performed first to obtain auxiliary information related to the switched foreground page. The auxiliary information includes at least one of the semantic information of the target controls, the usage information of the physical devices of the terminal device, and the usage information of the software of the terminal device, where the physical devices include at least one of a microphone, a speaker, and a camera, and the software includes an input method. Then the attribute information of the multiple groups of target controls and the auxiliary information are respectively input into the multiple input channels of the pre-trained classifier model.
  • auxiliary information can be input into the classifier model, thereby improving the accuracy of the output result of the classifier model.
  • when the auxiliary information includes the semantic information of the target controls, the attribute information and the semantic information of the multiple groups of target controls can be respectively input into the multiple input channels of the pre-trained classifier model.
  • when the auxiliary information includes at least one of the usage information of the physical devices and the usage information of the software of the terminal device, the attribute information of the multiple groups of target controls can be respectively input into the multiple input channels of the pre-trained classifier model, while the usage information of the physical devices and/or the software may be separately input into a specific channel of the classifier model; this specific channel may be different from the input channels used for the attribute information of the target controls.
  • Figures 9-11 are specific process diagrams of obtaining the input image from the foreground page.
  • as shown in Figure 9, the multi-way tree information corresponding to the foreground page is obtained; each tree is traversed level by level to find the bottommost leaf nodes, and the attribute information of the corresponding leaf node controls is obtained.
  • the attribute information includes the type of the control, the coordinate position and semantic content of the control. It should be noted that for different types of terminal devices, different model training is required due to the different screen sizes and app styles.
  • next, the attribute information of the collected controls is preprocessed, as shown in Figure 10, to select the controls visible in the foreground.
  • as shown in Figure 11, only four types of controls are obtained: Button, textView, imageView, and editTextView. The selected controls are classified according to control type. The entire screen is then divided, for example according to the corresponding resolution: if the screen resolution is 1920x1080, the entire screen can be divided into a 192x108 grid matrix. Then, for each type of control, its coordinate information is used to draw the corresponding screen-based grid matrix.
  • FIG. 12 is a process diagram of converting the layout block diagram of one type of control into a grid matrix. As shown in Figure 12, if a matrix position is covered by a control of this type, the value at that position is 1; otherwise it is 0. If four types of controls are initially set, a page containing the four types of controls yields four grid matrices after processing.
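  • The grid-matrix conversion described above can be sketched as follows, assuming a 1920x1080 screen divided into 10x10-pixel cells and the dictionary-based control format used in the earlier sketch; the exact cell size and control-type names are illustrative assumptions.

```python
import numpy as np

# Sketch of the grid-matrix conversion described above: one binary matrix per
# control type, with each 10x10-pixel block of a 1920x1080 screen mapped to one
# grid cell (giving 192x108 cells). The dictionary-based control format is an
# assumption carried over from the earlier sketch.

CONTROL_TYPES = ["Button", "TextView", "ImageView", "EditText"]
CELL = 10  # one grid cell covers a 10x10-pixel block

def controls_to_matrices(controls, screen_w=1080, screen_h=1920):
    rows, cols = screen_h // CELL, screen_w // CELL          # 192 x 108
    matrices = {t: np.zeros((rows, cols), dtype=np.uint8) for t in CONTROL_TYPES}
    for ctrl in controls:
        t = ctrl["type"]
        if t not in matrices:
            continue
        left, top, right, bottom = ctrl["bounds"]
        r0, r1 = max(top // CELL, 0), min((bottom - 1) // CELL + 1, rows)
        c0, c1 = max(left // CELL, 0), min((right - 1) // CELL + 1, cols)
        matrices[t][r0:r1, c0:c1] = 1                         # covered cells -> 1
    return matrices

mats = controls_to_matrices([{"type": "Button", "bounds": (0, 1800, 200, 1900)}])
print(int(mats["Button"].sum()))  # 200 grid cells covered by this button
```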
  • the traditional CNN-based image recognition and classification algorithm uses the color feature information of a picture to represent it. Based on the RGB information of the three color elements that make up the image, a picture is divided into three channels for input, that is, the R, G, and B dimensions represent an image, and the two-dimensional information of each matrix represents the position of the corresponding color within the image.
  • the classifier model of the present application is mainly used to classify page layouts, so control feature information can be used to represent a type of page: based on the control information that composes the page, a page is divided into multiple channels for input, that is, each type of control is used as a dimension to represent a type of page, and the two-dimensional information of each matrix represents the position of the corresponding control within the page. This reduces the complexity of the data to be processed and helps improve the model's processing speed and classification accuracy.
  • Fig. 13 is a process diagram of inputting a foreground page into a classifier model for classification.
  • next, the processed pages are input into the model for training. A CNN is selected for the model training.
  • the input is four grid matrices; the middle layers are convolution layers, pooling layers, and fully connected layers, and parameters such as the number of filters can be tuned during training.
  • the final output is the classification of the corresponding page, that is, one of the seven categories of communication, shopping, reading, video, games, music, and others.
  • after model training, a multi-way tree page layout classifier model is obtained for subsequent instance analysis.
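  • A minimal sketch of such a classifier is shown below using PyTorch: four control-type channels in, scores over the seven categories out. The number of layers, filter counts, and hidden sizes are assumptions chosen for illustration; the embodiment does not fix these hyperparameters here.

```python
import torch
import torch.nn as nn

# Minimal sketch of the multi-channel layout classifier described above:
# input is a 4 x 192 x 108 tensor (one channel per control type), output is a
# score over the seven page categories. All layer sizes and filter counts are
# illustrative assumptions.

CATEGORIES = ["communication", "shopping", "reading", "video",
              "games", "music", "other"]

class PageLayoutCNN(nn.Module):
    def __init__(self, in_channels=4, num_classes=len(CATEGORIES)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 192x108 -> 96x54
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 96x54 -> 48x27
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 48 * 27, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = PageLayoutCNN()
dummy = torch.zeros(1, 4, 192, 108)   # one page: four control-type grid matrices
print(model(dummy).shape)             # torch.Size([1, 7])
```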
  • FIG. 14 is a schematic diagram of a system architecture applied by an embodiment of the present application.
  • the system architecture of the embodiment of the present application mainly includes three parts.
  • the first part includes Activity change listeners and decorView information extractors, located at the Android Framework layer.
  • the activity change listener may be located in the activity manager of FIG. 2 , and is mainly used to monitor the page change situation of the terminal.
  • the decorView information extractor can be located in the window manager of Figure 2, and is mainly used to obtain the decorView information of the current page.
  • the second part is Page Analysis, which is the core and includes decorView information filtering and classification processing, drawing of layout block diagrams, and model training of the CNN.
  • Page Analysis is used to process the decorView information obtained by the Framework layer: it selects the leaf controls visible to the user, classifies them, converts and maps the classified control information, and draws layout block diagrams for the different types of controls, which are then input as model parameters in order to arrive at the final classification result.
  • the Page Analysis layer also involves the pre-training of the classifier model, which is implemented mainly with a CNN.
  • the third part is Page Classification, including page classification and post-processing of classification results.
  • Page Classification is mainly used to classify pages in combination with some auxiliary perception capabilities, such as the usage of the microphone, speaker, and input method; it fuses these perceived signals with the page information to make auxiliary judgments on the classification results.
  • Page Analysis and Page Classification can be located in the application layer or the application framework layer in Figure 2.
  • FIG. 15 is a flowchart of another page classification method provided by an embodiment of the present application. As shown in FIG. 15 , the page classification method of the embodiment of the present application includes the following steps:
  • Step S1502 monitor page activity changes. For example, the activity changes of the page are monitored in real time through the Android framework layer.
  • Step S1504 after determining that a change has occurred, perform page perception based on the latest Activity page, and obtain the multi-way tree information of the foreground page from the decorView.
  • as mentioned above, each foreground page displayed to the user is rendered by the window processing the layout in the decorView, so the multi-way tree layout information corresponding to the foreground page can in turn be extracted based on a depth-first, level-by-level traversal.
  • in Figure 9, the left view is the foreground page, and the right view is the multi-way tree structure of that foreground page.
  • based on the multi-way tree structure, the detailed information corresponding to each node in the tree can be obtained, such as the control type, the control coordinates, and the semantic content of the control; then, combined with the bounds of the entire screen, the controls displayed on the foreground screen can be filtered. Since a parent node contains its child nodes but only the controls finally presented to the user are needed, only the bottommost leaf node controls need to be selected.
  • that is, in step S1504 the multi-way tree information is consolidated, unnecessary information is discarded, and only the corresponding layout information (that is, the frame information of the page) is retained, as shown in the layout block diagram on the right side of Figure 10; the resulting page looks similar to the foreground page view initially seen by the user, shown on the left side of Figure 9.
  • in addition to the visible leaf node control information, the corresponding multi-way tree hierarchy and the semantic content in the controls can also be used, so as to perceive the user's daily scenarios and behavior habits more comprehensively.
  • Step S1506 drawing page layout block diagrams for each type of leaf control, that is, for different types of leaf node controls (for example, button (Button) controls, text (text) controls, picture (image) controls, edit text (editText) controls, list (list) controls, etc.), drawing the layout block diagram of the corresponding controls on the page.
  • as shown in the view on the right side of Figure 11, four types of control views are included, namely button (button), text view (textView), image view (imageView), and edit text view (editTextView). Within the overall page layout, each type of control has its own distinct characteristics, so the control type can be used as a feature dimension along which to group the data.
  • using the user-visible leaf control information obtained in the previous step, each type of control can be extracted separately, and a layout map of that control type on the screen can be generated.
  • Step S1508 input the layout block diagrams of all types of controls into the pre-trained classifier model to classify the foreground page, as shown in FIG. 13 .
  • the classifier model may be a CNN convolutional neural network.
  • the page classification method of the embodiments of the present application is performed based on the classifier model: the multi-way tree information of the foreground page is obtained through the framework layer, the information of the leaf controls visible on the foreground page is extracted from the multi-way tree information, the layout block diagrams corresponding to the different types of controls are drawn according to the leaf control information, and those layout block diagrams are used as multi-channel inputs to the pre-trained classifier model, thereby realizing real-time multi-class classification of the page.
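  • Putting the earlier sketches together, an end-to-end inference step might look as follows; it reuses the hypothetical helpers and model defined above (visible_controls, controls_to_matrices, PageLayoutCNN, CONTROL_TYPES, CATEGORIES) and is only an illustrative approximation of the described flow.

```python
import numpy as np
import torch

# Sketch of the end-to-end inference step. It reuses the hypothetical helpers
# sketched earlier (visible_controls, controls_to_matrices, CONTROL_TYPES,
# CATEGORIES) and the PageLayoutCNN model; it is an illustration, not the
# patent's actual implementation.

def classify_foreground_page(view_tree, model, screen_w=1080, screen_h=1920):
    controls = visible_controls(view_tree, screen_w, screen_h)     # leaf filtering
    mats = controls_to_matrices(controls, screen_w, screen_h)      # per-type grids
    # stack the per-type matrices into a 1 x 4 x 192 x 108 input tensor
    x = np.stack([mats[t] for t in CONTROL_TYPES]).astype(np.float32)
    with torch.no_grad():
        scores = model(torch.from_numpy(x).unsqueeze(0))
    return CATEGORIES[int(scores.argmax(dim=1))]

# Example (reusing `page` and `model` from the earlier sketches):
# print(classify_foreground_page(page, model))
```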
  • in scenarios where the page changes, the type of the changed page is classified and counted in real time, and further intelligent suggestion services can be provided according to the different classification results and the aggregated statistics.
  • specifically, when a page change is detected through the framework layer, the multi-way tree information of the corresponding page is obtained from the framework layer, the data are preprocessed, and the page classification result is obtained through the multi-way tree page layout classifier model.
  • information such as the length of time the user stays on the page is recorded.
  • Users can set up daily usage time statistics for the seven categories, and accumulate the page stay time records into the time duration statistics of the corresponding categories.
  • the usage-duration statistics of the seven categories can then be used by services, for example, a bar chart showing mobile phone usage in real time.
  • as another example, a reminder rule can be set: a card reminder pops up when the rule threshold is reached.
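  • The per-category duration statistics and reminder rule can be sketched as follows; the threshold values and the card-reminder mechanism are illustrative assumptions.

```python
from collections import defaultdict

# Sketch of the per-category duration statistics and reminder rule described
# above. The threshold values and the card-reminder mechanism are illustrative
# assumptions.

daily_usage_seconds = defaultdict(float)
REMINDER_THRESHOLDS = {"video": 2 * 3600, "shopping": 1 * 3600}   # example rules

def record_page_stay(category, stay_seconds):
    """Accumulate the stay duration into the category's daily statistics and
    return a reminder message once a rule threshold is reached."""
    daily_usage_seconds[category] += stay_seconds
    limit = REMINDER_THRESHOLDS.get(category)
    if limit is not None and daily_usage_seconds[category] >= limit:
        return f"You have spent over {limit // 3600} h on {category} pages today."
    return None

print(record_page_stay("video", 5400))   # None: still under the 2-hour rule
print(record_page_stay("video", 2000))   # the reminder message is returned
```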
  • FIG. 16 is a statistical diagram of a classification operation duration according to an embodiment of the present application. As shown in Figure 16, the user's daily operation behavior time on seven categories is counted, so that the user can clearly see the operation behavior of the user using the mobile phone.
  • FIG. 17 is a diagram of a reminder for healthy use of a mobile phone according to an embodiment of the present application. As shown in Figure 17, the usage time of the user's different operations is analyzed, and relevant content or reminders corresponding to habitual operations are pushed at specific times; furthermore, intelligent advice services can be provided to users from a health perspective, for example, if the user reads articles or news on the phone for a long time, a card pops up to remind the user to take a break or use eye drops to protect their eyesight.
  • the above page classification method of the embodiments of the present application no longer relies on the type of App used to perform statistics and classification of user behavior; instead, it perceives the user's mobile phone usage more precisely through the page layout, and can therefore summarize the characteristics of the user's phone usage more accurately and provide better services.
  • for example, it enables users to understand their phone usage at a glance, so that they know how much time they spend each day on shopping, reading, videos, and so on, helping them arrange and use their time better.
  • health reminders are provided to users at appropriate times to prevent users from having health problems due to their addiction to mobile phones.
  • in addition, the present invention uses only the control information to classify pages, which greatly reduces the phone's power consumption compared with image recognition, and can therefore be applied in products and serve users more readily.
  • FIG. 18 is a schematic structural diagram of a page classification apparatus according to an embodiment of the present application.
  • as shown in FIG. 18, the page classification apparatus includes a detection module 1801, an acquisition module 1802, and a classification module 1803.
  • the detection module 1801 is configured to detect the switching of the foreground page of the terminal device, wherein the switching of the foreground page is triggered by a user operation.
  • the obtaining module 1802 is configured to obtain the attribute information of the target control of the switched foreground page, wherein the target control at least includes a visible control, and the visible control is a user-visible control.
  • the attribute information includes the type and coordinate position of the target control.
  • the classification module 1803 is used to classify the foreground page according to the type and coordinate position of the target control.
  • the type of target control may include at least one of a text control, an image control, an edit text control, and a list control.
  • the types of front pages may include communication, shopping, reading, video, game, music and others.
  • the CPU in the processor 110 of the aforementioned FIG. 1 can implement the functions of the detection module 1801 and the acquisition module 1802; the function of the classification module 1803 can be implemented by the CPU alone, or jointly by the CPU and the NPU integrated in the processor 110.
  • the CPU can be used to divide the target controls into multiple groups according to types and generate a layout block diagram according to the attribute information of the target controls.
  • the NPU can be used for the training and application of the classifier model.
  • the obtaining module 1802 can also be used to obtain auxiliary information related to the switched foreground page, and the auxiliary information includes semantic information of the target control, usage information of the physical device of the terminal device, and usage information of the software of the terminal device. At least one, wherein the physical device includes at least one of a microphone, a speaker, and a camera, and the software includes an input method.
  • the classification module 1803 is used to classify the foreground page according to the type and coordinate position of the target control and auxiliary information.
  • the classification module 1803 may be specifically configured to generate a layout block diagram of the foreground page based on the type and coordinate position of the target control, and to classify the foreground page according to the layout block diagram.
  • when the target controls of the foreground page include multiple types, the classification module 1803 can be specifically configured to divide the target controls into multiple groups according to type, each group including one or more types of target controls; based on the types and coordinate positions of the multiple groups of target controls, multiple layout block diagrams are generated respectively, and the foreground page is then classified according to the multiple layout block diagrams.
  • alternatively, when the target controls of the foreground page include multiple types, the classification module 1803 may be specifically configured to divide the target controls into multiple groups according to type, each group including one or more types of target controls; input the attribute information of the multiple groups of target controls into the multiple input channels of the pre-trained classifier model, the attribute information of the groups corresponding to the input channels one-to-one; and then classify the foreground page using the pre-trained classifier model.
  • the classification module 1803 can also be specifically configured to input the attribute information of each group of target controls into the channel of the pre-trained classifier model in the form of data. Alternatively, the classification module 1803 can also be specifically configured to generate a layout block diagram according to the coordinate position of the attribute information of each group of target controls.
  • the input unit 332 is configured to input the type of each group of target controls and the layout block diagram representing the coordinate position into the channel of the pre-trained classifier model.
  • the obtaining module 1802 is further configured to obtain auxiliary information related to the switched foreground page, and the auxiliary information includes semantic information of the target control, usage information of the physical device of the terminal device, and usage information of the software of the terminal device.
  • the physical device includes at least one of a microphone, a speaker, and a camera
  • the software includes an input method.
  • the classification module 1803 may also be specifically configured to input attribute information and auxiliary information of multiple groups of target controls into multiple input channels of the pre-trained classifier model, respectively.
  • the obtaining module 1802 can be specifically used to obtain the layout information of the decorView of the switched foreground page, where the layout information is a multi-way tree structure, and then obtain, from the layout information of the decorView, the attribute information of the leaf node controls of the multi-way tree structure. The leaf node controls include the visible controls and invisible controls of the foreground page and are located in the Nth layer from the bottom of the multi-way tree structure, where N is greater than or equal to 1. The obtaining module 1802 may then filter the leaf node controls to obtain the attribute information of the visible controls of the foreground page.
  • FIG. 19 is a schematic structural diagram of a terminal device according to an embodiment of the application.
  • the terminal device 1900 includes a processor 1901 and a memory 1902 .
  • Memory 1902 is used to store computer programs.
  • the processor 1901 is configured to execute the above-mentioned page classification method when invoking the computer program.
  • the terminal device may further include a bus 1903 , a microphone 1904 , a speaker 1905 , a display 1906 and a camera 1907 .
  • the processor 1901, the memory 1902, the microphone 1904, the speaker 1905, the display 1906 and the camera 1907 communicate through the bus 1903, and can also communicate through other means such as wireless transmission.
  • processor in the embodiments of the present application may be a central processing unit (central processing unit, CPU), and may also be other general-purpose processors, digital signal processors (digital signal processors, DSP), application-specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • a general-purpose processor may be a microprocessor or any conventional processor.
  • the method steps in the embodiments of the present application may be implemented in a hardware manner, or may be implemented in a manner in which a processor executes software instructions.
  • Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disks, removable hard disks, CD-ROMs, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, such that the processor can read information from, and write information to, the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage medium may reside in an ASIC.
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented in software, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted over a computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that includes an integration of one or more available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state disks (SSDs)), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephone Function (AREA)

Abstract

A page classification method, a page classification apparatus, and a terminal device, relating to the field of artificial intelligence and in particular to classification technology. The method includes: detecting a switch of the foreground page of a terminal device (S502), where the switch of the foreground page is triggered by a user operation; obtaining attribute information of target controls of the switched foreground page, the attribute information including the types and coordinate positions of the target controls (S504), where the target controls include at least visible controls; and classifying the foreground page according to the types and coordinate positions of the target controls (S506). The page is classified according to the layout information presented by the control types and coordinate positions of the page, so that the usage scenario of an App can be accurately identified and the pages of that usage scenario accurately classified, thereby perceiving the user's behavior and habits more comprehensively and better providing the user with intelligent suggestion services.

Description

一种页面分类方法、页面分类装置和终端设备
本申请要求于2021年01月29日提交中国国家知识产权局、申请号为202110130728.6、申请名称为“一种页面分类方法、页面分类装置和终端设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能(Artificial Intelligence,AI)领域的分类技术,具体涉及一种页面分类方法、页面分类装置和终端设备。
背景技术
随着科技的飞速发展,手机已然成为人们生活中必不可少的工具。早晨起床第一件事情是打开手机,查看是否有新消息;晚上睡觉前最后一件事情也是玩手机;吃饭的时候、等车的时候、无聊的时候人们都会选择拿出手机进行玩耍。实际上,手机提供给人们娱乐的同时,也在消耗人们大量的时间。因此很多防沉迷类型的功能出现,帮助用户统计他们在每个应用软件(Application,App)上花费的时间,展示各App自动分类生成的分类模块使用的时长结果。有些App甚至开设了按照约定的使用时长,超时不让进入App或手机功能失效的功能,以帮助用户摆脱沉迷手机的困扰,让用户享受更健康的数字生活。但目前对使用的App进行分类的方法不准确,无法精准感知用户行为。
发明内容
本申请实施例提供的页面分类方法、页面分类装置和终端设备,能够准确识别App的使用场景,对该使用场景的页面进行精准分类,从而更加全面地感知用户行为习惯,更好地为用户提供智能化建议服务。
第一方面,本申请实施例提供了一种页面分类方法,所述页面分类方法包括:检测到终端设备的前台页面切换,其中,所述前台页面的切换由用户操作触发;获取切换后的所述前台页面的目标控件的属性信息,其中,所述目标控件至少包括可见控件,所述属性信息包括目标控件的类型和坐标位置;根据所述目标控件的类型和坐标位置对所述前台页面进行分类。
也就是说,本申请实施例的页面分类方法不是根据App类型进行分类,而是根据页面的控件类型和坐标位置呈现出的布局信息对页面进行分类,页面可为网络页面或app的界面,从而能够准确识别使用场景,对该使用场景的页面进行精准分类,更加全面地感知用户行为习惯,更好地为用户提供智能化建议服务。
在一种可能的实现方式中,所述根据所述目标控件的类型和坐标位置对所述前台页面进行分类,包括:基于所述目标控件的类型和坐标位置生成所述前台页面的布局框图;根据所述布局框图对所述前台页面进行分类。
也就是说,在该实现方式中,可将前台页面转化为布局框图,在该布局框图中用矩形框表示前台页面的目标控件所在位置,由于相同类型的页面具有类似的布局结构,故可基于该布局框图对前台页面进行分类。
在一种可能的实现方式中,所述前台页面的目标控件包括多种类型,所述根据所述目标控件的类型和坐标位置对所述前台页面进行分类,包括:将所述目标控件按照类型划分为多组,每组包括一种或两种以上类型的目标控件;基于多组所述目标控件的类型和坐标位置分别生成多个布局框图;根据所述多个布局框图对所述前台页面进行分类。
也就是说,在该实现方式中,当前台页面的目标控件包括多种类型时,可先按照类型将目标控件分为多组,然后再将每组的目标控件根据坐标位置生成布局框图,这样可通过将每组目标控件生成的多个布局框图与已知类型的页面按照控件类型生成的多个布局框图进行对比,从而获知前台页面的类型。
在一种可能的实现方式中,所述页面分类方法还包括:获取与切换后的所述前台页面相关的辅助信息,所述辅助信息包括所述目标控件的语义信息、所述终端设备的物理器件的使用情况信息和所述终端设备的软件的使用情况信息中的至少一者,其中,所述物理器件包括麦克风、扬声器和摄像头中的至少一者,所述软件包括输入法;所述根据所述目标控件的类型和坐标位置对所述前台页面进行分类包括:根据所述目标控件的类型和坐标位置以及所述辅助信息对所述前台页面进行分类。
也就是说,在该实现方式中,除了根据前台页面的目标控件的类型和坐标位置对前台页面进行分类外,还可借助一些辅助信息对前台页面进行分类。辅助信息可为目标控件的语义信息。若通过前台页面的目标控件的类型和坐标位置判断前台页面可能为通讯类和购物类,当语义信息例如为“你吃饭了吗?”,则可判断该前台页面为通讯类。当语义信息例如为“价格是多少呢?”,则可判断该前台页面为购物类。辅助信息还可为物理器件的使用情况,例如,当麦克风和扬声器等物理器件处于使用状态中时,表示正在通话,该页面为通讯类。辅助信息还可为软件的使用情况,软件可为输入法,当输入法处于使用状态中时,表示正在聊天,该页面为通讯类。
在一种可能的实现方式中,所述前台页面的目标控件包括多种类型,所述根据所述目标控件的类型和坐标位置对所述前台页面进行分类,包括:将所述目标控件按照类型划分为多组,每组包括一种或两种以上类型的目标控件;将多组所述目标控件的属性信息分别输入预先训练的分类器模型的多个输入通道内,其中,多组所述目标控件的属性信息与所述多个输入通道一一对应;使用所述预先训练的分类器模型对所述前台页面进行分类。
也就是说,在该实现方式中,可将目标控件按照类型划分为多组,再将多组目标控件的属性信息输入分类器模型的多个输入通道内,这样每个通道处理一组目标控件的属性信息,有助于降低分类器模型处理数据的复杂程度,提高分类器模型的分类准确率。
在一种可能的实现方式中,所述将多组所述目标控件的属性信息分别输入预先训练的分类器模型的多个输入通道内,包括:将每组所述目标控件的属性信息按照数据形式输入预先训练的分类器模型的通道内;或,按照每组所述目标控件的属性信息的 坐标位置绘制布局框图;将每组所述目标控件的的类型和代表所述坐标位置的所述布局框图输入预先训练的分类器模型的通道内。
也就是说,在该实现方式中,分组后的目标控件的属性信息可以按照数据信息输入预先训练的模型的通道内,也可先按照坐标位置绘制出每组目标控件的布局框图,再将布局框图输入预先训练的分类器模型的通道内。
在一种可能的实现方式中,所述的页面分类方法还包括:获取与切换后的所述前台页面相关的辅助信息,所述辅助信息包括所述目标控件的语义信息、所述终端设备的物理器件的使用情况信息和所述终端设备的软件的使用情况信息中的至少一者,其中,所述物理器件包括麦克风、扬声器和摄像头中的至少一者,所述软件包括输入法;所述将多组所述目标控件的属性信息分别输入预先训练的分类器模型的多个输入通道内包括:将多组目标控件的属性信息和所述辅助信息分别输入预先训练的分类器模型的多个输入通道内。
也就是说,在该实现方式中,不仅可将目标控件的类型和坐标位置输入分类器模型,还可将辅助信息输入分类器模型,从而提高分类器模型的输出结果的准确率。具体地,当辅助信息包括目标控件的语义信息时,可将多组目标控件的属性信息和语义信息分别输入预先训练的分类器模型的多个输入通道内;当辅助信息包括终端设备的物理器件和软件的使用情况信息中的至少一者,可将多组目标控件的属性信息分别输入预先训练的分类器模型的多个输入通道内,物理器件的使用情况信息和软件的使用情况信息中的至少一者可输入分类器模型的特定通道内,该特定通道可与输入目标控件的属性信息和语义信息的多个输入通道不同。
在一种可能的实现方式中,所述目标控件的类型包括按钮控件、文本控件、图像控件和编辑文本控件中的至少一者。例如,目标控件的类型可仅包括文本控件,或者包括文本控件和图像控件。
在一种可能的实现方式中,所述前台页面的类型包括通讯类、购物类、阅读类、视频类、游戏类、音乐类和其他类。其中,“其他类”是指除通讯类、购物类、阅读类、视频类、游戏类、音乐类这六类以外的其他类别。
在一种可能的实现方式中,所述获取切换后的所述前台页面的目标控件的属性信息,包括:获取切换后的所述前台页面的decorView的布局信息,所述布局信息为多叉树结构;从所述decorView的布局信息中获取所述多叉树结构的叶子节点控件的属性信息,所述叶子节点控件包括所述前台页面的可见控件和不可见控件,其中,所述叶子节点控件为所述多叉树结构的倒数第N层,N大于或等于1。
也就是说,在该实现方式中,可借助decorView中的多叉树结构来获得控件的属性信息,即控件类型和坐标位置,以便对页面进行准确分类,更加全面地感知用户行为习惯,更好地为用户提供智能化建议服务。由于仅需获取用户可见的叶子节点控件信息,在实际操作中可以减少功耗,提高分类器模型的训练效率。
在一种可能的实现方式中,所述获取切换后的所述前台页面的目标控件的属性信息,还包括:对所述叶子节点控件进行筛选,以获取所述前台页面的可见控件的属性信息。
也就是说,在该实现方式中,由于多叉树结构的叶子节点控件包括可见控件和不 可见控件,而用户一般不会操作不可见控件,因此可仅筛选可见控件的属性信息,从而可以更加精准的感知用户的操作行为。
第二方面,本申请实施例提供一种页面分类装置,所述页面分类方法装置包括:检测模块,用于检测到终端设备的前台页面切换,其中,所述前台页面的切换由用户操作触发;获取模块,用于获取切换后的所述前台页面的目标控件的属性信息,其中,所述目标控件至少包括可见控件,所述属性信息包括目标控件的类型和坐标位置;分类模块,用于根据所述目标控件的类型和坐标位置对所述前台页面进行分类。
在一种可能的实现方式中,所述分类模块具体用于:基于所述目标控件的类型和坐标位置生成所述前台页面的布局框图;根据所述布局框图对所述前台页面进行分类。
在一种可能的实现方式中,所述前台页面的目标控件包括多种类型,所述分类模块具体用于:将所述目标控件按照类型划分为多组,每组包括一种或两种以上类型的目标控件;基于多组所述目标控件的类型和坐标位置分别生成多个布局框图;根据所述多个布局框图对所述前台页面进行分类。
在一种可能的实现方式中,所述获取模块,还用于获取与切换后的所述前台页面相关的辅助信息,所述辅助信息包括所述目标控件的语义信息、所述终端设备的物理器件的使用情况信息和所述终端设备的软件的使用情况信息中的至少一者,其中,所述物理器件包括麦克风、扬声器和摄像头中的至少一者,所述软件包括输入法;所述分类模块用于根据所述目标控件的类型和坐标位置以及所述辅助信息对所述前台页面进行分类。
在一种可能的实现方式中,所述前台页面的目标控件包括多种类型,所述分类模块具体用于:将所述目标控件按照类型划分为多组,每组包括一种或两种以上类型的目标控件;将多组所述目标控件的属性信息分别输入预先训练的分类器模型的多个输入通道内,其中,多组所述目标控件的属性信息与所述多个输入通道一一对应;使用所述预先训练的分类器模型对所述前台页面进行分类。
在一种可能的实现方式中,所述分类模块还具体用于:将每组所述目标控件的属性信息按照数据形式输入预先训练的分类器模型的通道内;或,按照每组所述目标控件的属性信息的坐标位置生成布局框图;将每组所述目标控件的的类型和代表所述坐标位置的所述布局框图输入预先训练的分类器模型的通道内。
在一种可能的实现方式中,所述获取模块,还用于获取与切换后的所述前台页面相关的辅助信息,所述辅助信息包括所述目标控件的语义信息、所述终端设备的物理器件的使用情况信息和所述终端设备的软件的使用情况信息中的至少一者,其中,所述物理器件包括麦克风、扬声器和摄像头中的至少一者,所述软件包括输入法;所述分类模块还具体用于将多组目标控件的属性信息和所述辅助信息分别输入预先训练的分类器模型的多个输入通道内。
在一种可能的实现方式中,所述目标控件的类型包括按钮控件、文本控件、图像控件和编辑文本控件中的至少一者。
在一种可能的实现方式中,所述前台页面的类型包括通讯类、购物类、阅读类、视频类、游戏类、音乐类和其他类。
在一种可能的实现方式中,所述获取模块具体用于:获取切换后的所述前台页面 的decorView的布局信息,所述布局信息为多叉树结构;从所述decorView的布局信息中获取所述多叉树结构的叶子节点控件的属性信息,所述叶子节点控件包括所述前台页面的可见控件和不可见控件,其中,所述叶子节点控件为所述多叉树结构的倒数第N层,N大于或等于1。
在一种可能的实现方式中,所述获取模块还具体用于:对所述叶子节点控件进行筛选,以获取所述前台页面的可见控件的属性信息。
第三方面,本申请实施例提供了一种终端设备,所述终端设备包括存储器和处理器,所述存储器用于存储计算机程序;所述处理器用于在调用所述计算机程序时执行上述第一方面或第一方面任一种可能实现方式中的方法。
第四方面,本申请实施例提供了一种计算机可读存储介质,用于存储计算机程序,当所述计算机程序被终端设备的处理器执行时,使得所述终端设备实现上述第一方面或第一方面任一种可能实现方式中的方法。
第五方面,本申请实施例提供了一种计算机程序产品,所述计算机程序产品包括计算机程序/指令,当所述计算机程序/指令在终端设备上运行时,使得所述终端设备实现上述第一方面或第一方面任一种可能实现方式中的方法。
本申请实施例的页面分类方法和页面分类装置,不是根据App类型进行分类,而是实时根据页面的控件的类型和坐标位置呈现的布局结构对页面进行分类,可将页面的布局结构输入CNN神经网络进行模型训练,即可应用训练好的分类器模型对用户的操作行为进行分类,能够准确识别App的使用场景,对该使用场景的页面进行精准分类,从而更加全面地感知用户行为习惯,更好地为用户提供智能化建议服务。相比于传统的基于图片的CNN识别算法,本申请实施例的方案仅需获取用户可见的叶子节点控件信息,在实际操作中可以减少功耗,提高模型训练效率。
附图说明
图1为一种手机的硬件结构示意图;
图2为图1的手机所采用的软件系统的结构示意图;
图3-1至图3-6为六种类型的页面的示例图;
图4为终端设备的页面的结构原理图;
图5为本申请实施例提供的一种页面分类方法的流程图;
图6是图5中的步骤S506的一种具体流程图;
图7是图5中的步骤S506的另一种具体流程图;
图8是图5中的步骤S506的又一种具体流程图;
图9-图11为由前台页面获得输入图像的具体过程图;
图12为将一种类型的控件的布局框图转化为方格矩阵的过程图;
图13为将前台页面输入分类器模型进行分类的过程图;
图14为本申请实施例应用的系统架构的示意图;
图15为本申请实施例提供的另一种页面分类方法的流程图;
图16为本申请实施例的分类操作时长统计图;
图17为本申请实施例的健康使用手机提醒图;
图18为本申请实施例提供的一种页面分类装置的结构示意图;
图19为本申请实施例提供的一种终端设备的结构示意图。
具体实施方式
下面将结合附图,对本申请实施例中的技术方案进行描述。显然,所描述的实施例仅是本说明书一部分实施例,而不是全部的实施例。
在本说明书的描述中“一个实施例”或“一些实施例”等意味着在本说明书的一个或多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。
其中,在本说明书的描述中,除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,在本申请实施例的描述中,“多个”是指两个或多于两个。
在本说明书的描述中,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。
图1为一种手机的硬件结构示意图。如图1所示,手机100可以包括处理器110、外部存储器接口120、内部存储器121、通用串行总线(universal serial bus,USB)接口130、充电管理模块140、电源管理模块141、电池142、天线1、天线2、射频模块150、通信模块160、音频模块170、扬声器170A、受话器170B、麦克风170C、耳机接口170D、传感器模块180、按键190、马达191、指示器192、摄像头193、屏幕194、以及用户标识模块(subscriber identification module,SIM)卡接口195等。
可以理解的是,本申请实施例示意的结构并不构成对手机100的具体限定。在本申请另一些实施例中,手机100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
例如,本申请实施例中的终端设备可包括处理器110、通信模块160、音频模块170、扬声器170A、受话器170B、麦克风170C、摄像头193以及屏幕194等。传感器模块180可包括压力传感器180A和触摸传感器180K等,可用于检测用户的按压和触摸操作来进行相应动作,例如切换页面。处理器110可以运行本申请实施例提供的页面分类方法,实现根据页面的控件类型和坐标位置呈现的布局信息对页面进行分类,以便准确识别App的使用场景,对该使用场景的页面进行精准分类,从而更加全面地感知用户行为习惯,更好地为用户提供智能化建议服务。处理器110可以包括不同的器件,比如集成CPU和NPU(AI芯片)时,CPU和NPU可以配合执行申请实施例的页面分类方法,比如检测前台页面切换和获取切换后的前台页面的目标控件的属性信 息等由CPU执行,例如分类器模型训练及应用等由NPU执行,以得到较快的处理效率。
当处理器110运行本申请实施例的页面分类方法后,终端设备可以控制屏幕194响应用户操作来切换前台页面(即用户可见页面),并显示该前台页面的分类结果。进一步地,屏幕194还可显示基于本申请实施例的页面分类方法的分类统计结果如图16所示,以及根据统计结果从健康角度对用户提供的智能化建议服务,如长时间使用手机进行文章或新闻阅读,弹出卡片提醒用户休息一下或者滴眼药水保护视力等,如图17所示。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP)、调制解调处理器110、图形处理器110(graphics processing unit,GPU)、图像信号处理器110(image signal processor,ISP)、控制器、存储器、视频编解码器、数字信号处理器110(digital signal processor,DSP)、基带处理器110和/或神经网络处理器110(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器110中。
其中,控制器可以是手机100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。
电源管理模块141用于连接电池142、充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110、内部存储器121、外部存储器、屏幕194、摄像头193、通信模块160等供电。电源管理模块141还可以用于监测电池容量、电池循环次数、电池健康状态(漏电,阻抗)等参数。手机100的无线通信功能可以通过天线1、天线2、射频模块150、通信模块160、调制解调处理器110以及基带处理器110等实现。
天线1和天线2用于发射和接收电磁波信号。手机100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。射频模块150可以提供应用在手机100上的包括2G/3G/4G/5G等无线通信的解决方案。射频模块150可以包括至少一个滤波器、开关、功率放大器、低噪声放大器(low noise amplifier,LNA)等。射频模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波、放大等处理,传送至调制解调处理器110进行解调。射频模块150还可以对经调制解调处理器110调制后的信号放大,经天线1转为电磁波辐射出去。
调制解调处理器110可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。 随后解调器将解调得到的低频基带信号传送至基带处理器110处理。低频基带信号经基带处理器110处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过屏幕194显示图像或视频。通信模块160可以提供应用在手机100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络)、蓝牙(Bluetooth,BT)、全球导航卫星系统(global navigation satellite system,GNSS)、调频(frequency modulation,FM)、近距离无线通信技术(near field communication,NFC)、红外技术(infrared,IR)等无线通信的解决方案。通信模块160可以是集成至少一个通信处理模块的一个或多个器件。通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,手机100的天线1和射频模块150耦合,天线2和通信模块160耦合,使得手机100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM)、通用分组无线服务(general packet radio service,GPRS)、码分多址接入(code division multiple access,CDMA)、宽带码分多址(wideband code division multiple access,WCDMA)、时分码分多址(time-division code division multiple access,TD-SCDMA)、长期演进(long term evolution,LTE)、5G、BT、GNSS、WLAN、NFC、FM、和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS)、全球导航卫星系统(global navigation satellite system,GLONASS)、北斗卫星导航系统(beidou navigation satellite system,BDS)、准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
手机100可以通过ISP、摄像头193、视频编解码器、GPU、屏幕194,以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头193感光元件上,光信号转换为电信号,摄像头193感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点、亮度和肤色进行算法优化。ISP还可以对拍摄场景的曝光、色温等参数优化。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,手机100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器110用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当手机100在频点选择时,数字信号处理器110用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。手机100可以支持一种或多种视频 编解码器。这样,手机100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1、MPEG2、MPEG3、MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。在本申请实施例中,NPU可用于训练分类器模型。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如 电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施 热保护。在另一些实施例中,当温度低于另一阈值时,电子设备100对电池142加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
图2为图1的手机所采用的软件系统的结构示意图。如图2所示,通常可以将Android系统分为四层,从上至下依次为应用程序层、应用程序框架层、系统库和安卓运行时(Android runtime)、内核层,每一层都有清晰的角色和分工,层与层之间通过软件接口进行通信。
应用程序层包括部署在手机100上的一系列应用程序。示例性的,应用程序层包括但不限于桌面启动器(Launcher)、设置模块、日历模块、相机模块、通话模块和短信模块。
应用程序框架层可以为应用程序层中的应用程序提供应用编程接口(application programming interface,API)和编程框架,还可以包括一些预先定义的功能模块/服务。示例性的,应用程序框架层中包括但不限于窗口管理器(Window manager)、活动管 理器(Activity manager)、包管理器(Package manager)、资源管理器(Resource manager)和电源管理器(Power manager)。其中,活动管理器用于管理应用程序的生命周期,以及实现各个应用程序的导航回退功能。示例性的,活动管理器可以负责活动(Activity)进程的创建和已经创建的Activity进程的生命周期的维护。窗口管理器用于管理窗口程序。可以理解,应用程序的图形用户界面通常由一个或多个Activity组成,而Activity又由一个或多个视图View组成;窗口管理器可以将需要显示的图形用户界面所包括的View添加到屏幕194上,或者用于从屏幕194上显示的图形用户界面中移除View。
位于应用程序框架层以下的安卓运行时和系统库、内核层等可称为底层系统,底层系统中包括用于提供显示服务的底层显示系统,底层显示系统可以包括但不限于位于系统库的表面管理器(surface manager)以及位于内核层的显示驱动。内核层是硬件和软件之间的层,内核层中包括若干硬件的驱动程序。示例性的,内核层可以包括显示驱动、摄像头驱动、音频驱动以及触控驱动。各个驱动程序可以各自搜集相应的硬件所采集的信息,并向系统库中的状态监测服务或其它功能模块上报相应的监测数据。
随着科技的飞速发展,手机等终端设备已然成为人们生活中必不可少的工具。为了帮助用户统计他们在每个App上花费的时间以及展示各个App自动分类生成的不同类型的分类模块的使用时长结果,有些App开设了按照约定的使用时长的功能,超时不让进入App或手机功能失效,以帮助用户摆脱沉迷手机的困扰,让用户享受更健康的数字生活。
各种防沉迷方案的初衷是很好的,但是真正的应用到人们的生活中就会出现一些问题。例如人们每天会使用很多App,单纯只统计每个App的使用时长,不能让人一眼发现用户的手机使用习惯。并且,对各App进行自动分类统计也存在一些问题。比如系统对App的分类方式并不准确,像浏览器这种综合性的应用,用户可能用浏览器进行购物、观看视频、查阅新闻等等,那么应该把浏览器按照什么类型的应用进行归类呢?再比如,目前各App都在扩展自己的业务范围,不再局限于App创建之初设立的App业务形象,例如短视频App已经不再单纯的只支持发布短视频、观看短视频、点赞、评论等功能,短视频中也增加了聊天的窗口,方便用户的聊天交友需求,例如抖音App既支持短视频也支持聊天。这样如果把短视频App归类为视频类App类型进行时长统计又会造成误差。
另外,基于页面截图识别图片内容的分类方法,主要使用卷积神经网络(convolution neural network,CNN)对图片进行分类。由于图片包含的信息过于丰富,例如包括图形、图像和文本等对于页面图片识别是冗余的信息,影响分类结果的准确率,且使功耗和训练成本增加。
综上所述,如何准确对用户使用手机等终端设备的情况进行统计和分类,以便更好地感知用户行为,绘制更加精准的用户画像,面临巨大挑战。
鉴于此,本申请实施例提供一种页面分类方法、页面分类装置和终端设备,不是根据App类型进行分类,而是实时根据页面的布局信息对页面进行分类,能够准确识别App的使用场景,对该使用场景的页面进行精准分类,从而更加全面地感知用户行为习惯,更好地为用户提供智能化建议服务。具体地,可根据页面布局将页面分成7 大类别,即通讯类、购物类、阅读类、视频类、游戏类、音乐类、其他类,当然也可根据实际需要将页面划分为更多类别或更少类别。同时,还可将页面的布局结构输入CNN神经网络进行模型训练,即可应用训练好的分类器模型对用户的操作行为进行分类。相比于传统的基于图片的CNN识别算法,本申请实施例的方案仅需获取用户可见的叶子节点控件信息,在实际操作中可以减少功耗,提高模型训练效率。另外,需说明的是,本申请实施例的页面分类方法适用于任何有页面的终端设备,包括但不限于手机、平板电脑(PAD)、智慧屏(电视)等日常使用的设备。
图3-1至图3-6为六种类型的页面的示例图。具体地,图3-1为通讯类页面,图3-2为购物类页面,图3-3为阅读类页面,图3-4为视频类页面,图3-5为游戏类页面,图3-6为音乐类页面。目前虽然无法对浏览器App进行分类,并且很多App也已经不再受限于自身最初的业务类型,如视频类app也可进行聊天,但不难发现相同业务场景的页面布局都是惊人的相似的。例如图3-1所示的通讯类页面,通常分为三部分,最上面是导航栏,表明聊天的对象;中间是聊天的主体内容部分,特点在于最左侧和右侧都是头像,以头像为起始点向左或向右增加消息,消息可以是文字也可以是图片等等;最下面是工具栏,提供了切换成语音输入的按钮、输入栏和表情与扩展功能按钮等等。再例如图3-2所示的购物类页面,通常由四部分组成,最上面是导航栏,提供搜索、返回、分享等按钮操作;下面一层是商品展示栏,配置各种商品的图片展示;再下面一层是关于商品的文字介绍;最下边是工具栏,提供客服、收藏、加入购物车、提交购买等按钮操作。基于此,提出了按照业务场景对操作页面进行分类统计的方法,不再受限于App类型和图像内容,能够更准确地感知用户的操作行为习惯。
图4为终端设备的页面的结构原理图。如图4所示,在Android系统中,打开一个应用,实际上会打开一个主Activity,用户可以通过触摸屏幕上不同的控件实现在多个Activity之间来回切换的操作。例如:可从菜单键打开一个菜单的小窗口;又或者点击一个按钮从一个页面跳转到另一个页面。Activity启动过程中实际上是首先初始化PhoneWindow,然后PhoneWindow中的内部类DecorView加载Activity中设置的布局。而WindowManager中的ViewRoot才是真正处理DecorView中的视图绘制以及其他事件的管理类。Window通过WindowManager与WindowManagerService进行交互,最终呈现给用户具体的页面视图。
也就是说,用户看到的页面视图都是处理了decorView中的布局展示出来的,相似的页面视图有着相似的布局结构。当页面的图片、文字不同时,整体基于图片的分类相似度很低,但是从页面的布局结构看,相似度却很高。因此只需提取出页面的布局结构,即可根据页面的布局结构来对前台页面进行分类。
图5为本申请实施例提供的一种页面分类方法的流程图。如图5所示,页面分类方法包括以下步骤:
步骤S502,检测到终端设备的前台页面切换,其中,前台页面的切换由用户操作触发。
步骤S504,获取切换后的前台页面的目标控件的属性信息,其中,目标控件至少包括可见控件,属性信息包括目标控件的类型和坐标位置。目标控件的类型包括按钮控件、文本控件、图像控件和编辑文本控件中的至少一者。例如,包括按钮控件,或 包括按钮控件和文本控件。当然,目标控件的类型还可包括更多种,例如列表控件。具体地,可先获取切换后的前台页面的decorView的布局信息,布局信息为多叉树结构。再从decorView的布局信息中获取多叉树结构的叶子节点控件的属性信息,叶子节点控件包括前台页面的可见控件和不可见控件,其中,叶子节点控件为多叉树结构的倒数第N层,N大于或等于1。
也就是说,在该实现方式中,可借助decorView中的多叉树结构来获得控件的属性信息,从而获得前台页面的控件类型和控件布局,以便对页面进行准确分类,更加全面地感知用户行为习惯,更好地为用户提供智能化建议服务。同时,由于仅需获取用户可见的叶子节点控件信息,在实际操作中可以减少功耗,提高分类器模型的训练效率。
接着,还可对叶子节点控件进行筛选,以获取前台页面的可见控件的属性信息。由于多叉树结构的叶子节点控件包括可见控件和不可见控件,而用户一般不会操作不可见控件,因此可仅筛选可见控件的属性信息,从而可以更加精准的感知用户的操作行为。
步骤S506,根据目标控件的类型和坐标位置对前台页面进行分类。前台页面的类型包括通讯类、购物类、阅读类、视频类、游戏类、音乐类和其他类。其中,“其他类”是指除通讯类、购物类、阅读类、视频类、游戏类、音乐类这六类以外的其他类别。
另外,除了根据目标控件的类型和坐标位置对页面进行分类外,还可结合页面的一些辅助信息进行判断。因此,页面分类方法还可包括以下步骤:
步骤S505,获取与切换后的前台页面相关的辅助信息,辅助信息包括目标控件的语义信息、终端设备的物理器件的使用情况信息和终端设备的软件的使用情况信息中的至少一者,其中,物理器件包括麦克风、扬声器和摄像头中的至少一者,软件包括输入法。
步骤S506’,根据目标控件的类型和坐标位置以及辅助信息对前台页面进行分类。
具体地,当辅助信息可为目标控件的语义信息时,若通过前台页面的目标控件的类型和坐标位置判断前台页面可能为通讯类和购物类,当语义信息例如为“你吃饭了吗?”,则可判断该前台页面为通讯类。当辅助信息为物理器件的使用情况时,例如麦克风和扬声器等物理器件处于使用状态中,表示正在通话状态,则该页面为通讯类。当辅助信息为软件的使用情况时,例如软件可为输入法,当输入法处于使用状态中时,表示正在聊天,该页面为通讯类。
图6是图5中的步骤S506的一种具体流程图。如图6所示,步骤S506可包括以下具体步骤:
步骤S5062,基于目标控件的类型和坐标位置生成前台页面的布局框图。
步骤S5064,根据布局框图对前台页面进行分类。
也就是说,可将前台页面转化为布局框图,在该布局框图中用矩形框表示前台页面的目标控件所在位置,由于相同类型的页面具有类似的布局结构,故可基于该布局框图对前台页面进行分类。
图7是图5中的步骤S506的另一种具体流程图。如图7所示,前台页面的目标控 件包括多种类型,步骤S506可包括以下具体步骤:
步骤S5062’,将目标控件按照类型划分为多组,每组包括一种或两种以上类型的目标控件。
步骤S5064’,基于多组目标控件的类型和坐标位置分别生成多个布局框图.
步骤S5066’,根据多个布局框图对前台页面进行分类。
也就是说,当前台页面的目标控件包括多种类型时,可先按照类型将目标控件分为多组,然后再将每组的目标控件根据坐标位置生成布局框图,这样可通过将每组目标控件生成的多个布局框图与已知类型的页面按照控件类型生成的多个布局框图进行对比,从而获知前台页面的类型。
本申请实施例的页面分类方法不是根据App类型进行分类,而是根据页面的控件类型和布局信息(即坐标位置)对页面进行分类,页面可为网络页面或app的界面,能够准确识别使用场景,对该使用场景的页面进行精准分类,从而更加全面地感知用户行为习惯,更好地为用户提供智能化建议服务。
另外,还可将页面的布局结构输入CNN神经网络进行模型训练,即可应用训练好的分类器模型对用户的操作行为进行分类。
图8是图5中的步骤S506的又一种具体流程图。如图8所示,前台页面的目标控件包括多种类型,步骤S506可包括以下具体步骤:
步骤S5062”,将目标控件按照类型划分为多组,每组包括一种或两种以上类型的目标控件。
步骤S5064”,将多组目标控件的属性信息分别输入预先训练的分类器模型的多个输入通道内,其中,多组目标控件的属性信息与多个输入通道一一对应。
具体地,可将每组目标控件的属性信息按照数据形式输入预先训练的分类器模型的通道内。或者,先按照每组目标控件的属性信息的坐标位置绘制布局框图。再将每组目标控件的的类型和代表坐标位置的布局框图输入预先训练的分类器模型的通道内。
步骤S5066”,使用预先训练的分类器模型对前台页面进行分类。
也就是说,可将目标控件按照类型划分为多组,再将多组目标控件的属性信息输入分类器模型的多个输入通道内,这样每个通道处理一组目标控件的属性信息,有助于降低分类器模型处理数据的复杂程度,提高分类器模型的分类准确率。
另外,在步骤S5066”前,还可先进行步骤S5065”,先获取与切换后的前台页面相关的辅助信息,辅助信息包括目标控件的语义信息、终端设备的物理器件的使用情况信息和终端设备的软件的使用情况信息中的至少一者,其中,物理器件包括麦克风、扬声器和摄像头中的至少一者,软件包括输入法,再将多组目标控件的属性信息和辅助信息分别输入预先训练的分类器模型的多个输入通道内。
也就是说,不仅可将目标控件的类型和坐标位置输入分类器模型,还可将辅助信息输入分类器模型,从而提高分类器模型的输出结果的准确率。具体地,当辅助信息包括目标控件的语义信息时,可将多组目标控件的属性信息和语义信息分别输入预先训练的分类器模型的多个输入通道内;当辅助信息包括终端设备的物理器件和软件的使用情况信息中的至少一者,可将多组目标控件的属性信息分别输入预先训练的分类器模型的多个输入通道内,物理器件的使用情况信息和软件的使用情况信息中的至少 一者可单独输入分类器模型的特定通道内,该特定通道可以与输入目标控件的属性信息的多个输入通道不同。
下面按照模型训练阶段和模型应用阶段对本申请实施例的页面分类方法进行介绍。
一、模型训练阶段
首先,尽可能多地收集各种APP在七大类别(通讯类、购物类、阅读类、视频类、游戏类、音乐类、其他类)上的页面信息,即收集训练数据。
图9-图11为由前台页面获得输入图像的具体过程图。如图9所示,获取前台页面对应的多叉树信息,层次遍历每一棵树,找到最底层的叶子节点,获取对应的叶子节点控件的属性信息。其中,属性信息包括控件的类型、控件的坐标位置和语义内容。需注意的是,针对不同类型的终端设备,由于对应的屏幕尺寸和App的风格模式不同,需要进行不同的模型训练。
其次,对收集到的控件的属性信息进行预处理,如图10所示,筛选出前台可见的控件。如图11所示,仅获取Button、textView、imageView、editTextView四种类型的控件;按照控件的类型对筛选出的控件进行分类;然后把整个屏幕进行分割,例如可以按照对应的分辨率进行分割,如果屏幕为1920x1080的分辨率,则可将整个屏幕分割成192x108的方格矩阵;接着,对于每一种类型的控件利用其坐标信息绘制对应的基于屏幕的方格矩阵。
图12为将一种类型的控件的布局框图转化为方格矩阵的过程图。如图12所示,如果矩阵对应位置被该类型的控件覆盖,对应矩阵位置的值为1,否则为0。若开始设定了四种类型的控件,则包括四种类型的控件的页面处理后会得到四个方格矩阵。
传统的基于CNN的图像识别分类算法利用图片的颜色特征信息来表示一张图片,基于组成图像的颜色三要素RGB信息,将一张图片分为3个通道进行输入,即分别以R、G、B三个维度来代表一张图像,并利用矩阵二维信息来表示对应颜色基于图片的位置。本申请的分类器模型主要用来对页面布局进行分类,可利用控件特征信息来表示一种类型的页面,可基于组成页面的控件信息,将一个页面分为多个通道进行输入,即分别以各种类型的控件作为维度来表示一种类型的页面,并利用矩阵二维信息来表示对应控件基于页面的位置。这样能够降低处理数据的复杂程度,有助于提高模型处理速度和分类准确率。
接着,将处理后的页面输入模型进行训练。图13为将前台页面输入分类器模型进行分类的过程图。如图13所示,选取CNN卷积神经网络进行模型训练,输入为四个方格矩阵,中间为卷积层、池化层、全连接层,滤波器个数等参数设置可在训练的时候进行调优,最后输出为对应页面的分类,即通讯类、购物类、阅读类、视频类、游戏类、音乐类、其他类共七大类别之一。经过模型训练,将可得到多叉树页面布局分类器模型,用于后续实例分析。
FIG. 14 is a schematic diagram of the system architecture applied in the embodiments of the present application. As shown in FIG. 14, the system architecture mainly includes three parts. The first part includes an Activity change listener and a decorView information extractor, located at the Android Framework layer. Specifically, the Activity change listener may be located in the activity manager of FIG. 2 and is mainly used to listen for page changes on the terminal; the decorView information extractor may be located in the window manager of FIG. 2 and is mainly used to obtain the decorView information of the current page. The second part, Page Analysis, is the core; it includes decorView information filtering and classification processing, layout block diagram drawing, and CNN neural network model training. Specifically, Page Analysis processes the decorView information obtained from the Framework layer, filters out the leaf controls visible to the user, classifies them, converts and maps the classified control information, draws layout block diagrams for the different control types, and inputs them as model parameters to obtain the final classification result. Page Analysis also covers the advance training of the CNN classifier model, which is mainly implemented with a CNN convolutional neural network. The third part, Page Classification, includes page classification and post-processing of the classification results. Specifically, Page Classification mainly performs the classification of pages and, in combination with auxiliary sensing capabilities such as the usage status of the microphone, the speaker, and the input method, fuses the perceived page status to assist in judging the classification result. Page Analysis and Page Classification may be located at the application layer or the application framework layer of FIG. 2.
II. Model application stage
FIG. 15 is a flowchart of another page classification method provided by an embodiment of the present application. As shown in FIG. 15, the page classification method of this embodiment includes the following steps:
Step S1502: listening for activity changes of pages, for example, listening for activity changes of pages in real time through the Android framework layer.
Step S1504: after a change is determined, performing page perception based on the latest active activity page, and obtaining the multi-way tree information of the foreground page from the decorView.
As described above, every foreground page presented to the user is displayed by the window processing the layout in the decorView; therefore, the multi-way tree layout information corresponding to the foreground page can be extracted in reverse based on a depth-search / level-order traversal method. In FIG. 9, the left view is the foreground page and the right view is the multi-way tree structure of that foreground page. Specifically, based on the multi-way tree structure, the detailed information corresponding to each node of the tree, such as the control type, the control coordinates, and the semantic content of the control, can be obtained, and the controls displayed on the foreground screen are then filtered by taking the extent of the whole screen into account. Although a parent node actually contains its child nodes, the solution of the present application does not need to consider overlap relationships; only the visible controls finally presented to the user need to be obtained, so only the bottom-most leaf-node controls need to be filtered out, such as the last layer of views of the multi-way tree structures on the right of FIG. 9 and the left of FIG. 10.
In other words, in step S1504 the multi-way tree information is consolidated: unnecessary information is removed and only the corresponding layout information (i.e., the frame information of the page) is retained, as in the layout block diagram on the right of FIG. 10; the whole page then looks similar to the foreground page originally seen by the user, as in the foreground page view on the left of FIG. 9. In addition to the information of the visible leaf-node controls, the corresponding multi-way tree hierarchy and the semantic content of the controls can also be used, so as to perceive the user's daily scenarios and behavior habits more comprehensively.
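The extraction of visible leaf controls from the multi-way tree can be sketched as follows. Node is a generic stand-in for a view in the decorView hierarchy; its fields, the screen bounds, and the helper names are illustrative assumptions rather than the actual framework API.

    # A minimal sketch of extracting visible, on-screen leaf-node controls from a
    # multi-way view tree by level-order traversal.
    from collections import deque
    from dataclasses import dataclass, field
    from typing import List, Tuple

    SCREEN = (0, 0, 1080, 1920)   # assumed screen bounds (left, top, right, bottom)

    @dataclass
    class Node:
        ctrl_type: str
        bounds: Tuple[int, int, int, int]
        visible: bool = True
        children: List["Node"] = field(default_factory=list)

    def on_screen(b, screen=SCREEN) -> bool:
        return b[0] < screen[2] and b[2] > screen[0] and b[1] < screen[3] and b[3] > screen[1]

    def visible_leaves(root: Node) -> List[Node]:
        """Level-order traversal keeping only visible leaf nodes that intersect the screen."""
        leaves, queue = [], deque([root])
        while queue:
            node = queue.popleft()
            if node.children:
                queue.extend(node.children)
            elif node.visible and on_screen(node.bounds):
                leaves.append(node)
        return leaves

    root = Node("FrameLayout", (0, 0, 1080, 1920), children=[
        Node("TextView", (40, 200, 1040, 320)),
        Node("Button", (900, 1700, 1080, 1820)),
        Node("ImageView", (0, -300, 1080, -20)),   # scrolled off screen, filtered out
    ])
    print([n.ctrl_type for n in visible_leaves(root)])   # ['TextView', 'Button']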
Step S1506: drawing page layout block diagrams for the various types of leaf controls, i.e., for each type of leaf-node control (for example, a Button control, a text control, an image control, an editText control, a list control, and so on), drawing the page-based layout block diagram of the corresponding controls. As shown in the right view of FIG. 11, views of four control types are included: button, textView, imageView, and editTextView. Within the overall page layout, each control type has its own characteristics; such distinctive features can be used as feature dimensions along which the data is grouped and summarized. Using the visible leaf control information obtained in the previous step, each control type can be separated out to generate a layout map of that type of control on the screen.
Step S1508: inputting the layout block diagrams of all control types into the pre-trained classifier model to classify the foreground page, as shown in FIG. 13. The classifier model may be a CNN convolutional neural network.
The page classification method of the embodiments of the present application is based on a classifier model: the multi-way tree information of the foreground page can be obtained through the framework layer, the information of the visible leaf controls of the foreground page is extracted from the multi-way tree information, layout block diagrams are drawn separately for the different control types according to the leaf control information, and the layout block diagrams of the different control types are fed as multiple channels into the pre-trained classifier model, thereby achieving real-time multi-class classification of pages.
In scenarios where pages change, the type of the changed page is classified and counted in real time, and further intelligent suggestion services can be provided based on the different classification results and the aggregated statistics. Specifically, when a page change is detected through the framework layer, the multi-way tree information of the corresponding page is obtained from the framework layer; after the data is preprocessed, the page classification result is obtained through the multi-way-tree page layout classifier model. Then, information such as the user's dwell time on that page is recorded. Daily usage-time statistics can be maintained for the user across the seven categories, with each recorded page dwell time accumulated into the time statistic of the corresponding category. The usage-time statistics of the seven categories can then be put to use in services, for example displaying a real-time bar chart of phone usage, or setting reminder rules such as popping up a reminder card when a rule threshold is reached.
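A minimal sketch of the dwell-time accumulation and threshold reminder just described is given below. The threshold values and the print-based reminder are assumptions for illustration; in a real deployment the reminder would be a card shown by the system.

    # A minimal sketch of accumulating per-category dwell time and popping a reminder
    # once a daily threshold is crossed.
    from collections import defaultdict

    class UsageTracker:
        def __init__(self, thresholds_minutes=None):
            self.totals = defaultdict(float)                   # minutes per category today
            self.thresholds = thresholds_minutes or {"reading": 120, "video": 180}
            self.reminded = set()

        def record(self, category: str, dwell_minutes: float):
            self.totals[category] += dwell_minutes
            limit = self.thresholds.get(category)
            if limit is not None and self.totals[category] >= limit and category not in self.reminded:
                self.reminded.add(category)
                print(f"Reminder card: {self.totals[category]:.0f} min spent on {category} today.")

    tracker = UsageTracker()
    tracker.record("reading", 90)
    tracker.record("reading", 40)   # crosses the 120-minute threshold, triggers the reminder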
FIG. 16 is a chart of per-category operation duration statistics according to an embodiment of the present application. As shown in FIG. 16, the user's daily operation time across the seven categories is counted, giving the user a clear view of how the phone is used. FIG. 17 is a diagram of a healthy phone-use reminder according to an embodiment of the present application. As shown in FIG. 17, the usage time of the user's different operations is analyzed, and content or reminders related to habitual operations are pushed at specific times. Further, intelligent suggestion services can be provided from a health perspective; for example, if the phone has been used for a long time to read articles or news, a card pops up reminding the user to take a break or apply eye drops to protect their eyesight.
The above page classification method of the embodiments of the present application no longer relies on the types of Apps used to count and classify user behavior; instead, it perceives the user's phone usage more precisely through page layouts, can more accurately summarize the characteristics of the user's phone usage, and thus provides better services. For example, it gives the user an at-a-glance view of how much time is spent each day on shopping, reading, video, and so on, helping the user better plan and use their time. As another example, it can issue health reminders at appropriate times to prevent health problems caused by excessive phone use. Moreover, because this solution uses only control information for page classification, it consumes far less power on the phone than image recognition, and is therefore easier to bring to products to serve users.
FIG. 18 is a schematic structural diagram of a page classification apparatus provided by an embodiment of the present application. As shown in FIG. 18, the page classification apparatus includes a detection module 1801, an acquisition module 1802, and a classification module 1803. The detection module 1801 is configured to detect a foreground page switch of the terminal device, where the switch of the foreground page is triggered by a user operation. The acquisition module 1802 is configured to obtain the attribute information of the target controls of the switched foreground page, where the target controls include at least visible controls, i.e., controls visible to the user, and the attribute information includes the types and coordinate positions of the target controls. The classification module 1803 is configured to classify the foreground page according to the types and coordinate positions of the target controls. The types of the target controls may include at least one of a text control, an image control, an edit-text control, and a list control. The types of the foreground page may include communication, shopping, reading, video, game, music, and other.
Specifically, in the embodiments of the present application, the CPU in the processor 110 of the aforementioned FIG. 1 may implement the functions of the detection module 1801 and the acquisition module 1802, and the function of the classification module 1803 may be implemented by the CPU or jointly by the CPU and the NPU integrated in the processor 110. Specifically, the CPU may be used to divide the target controls into groups by type and to generate layout block diagrams from the attribute information of the target controls, and the NPU may be used for the training and application of the classifier model.
Further, the acquisition module 1802 may also be configured to obtain auxiliary information related to the switched foreground page, the auxiliary information including at least one of the semantic information of the target controls, usage information of physical components of the terminal device, and usage information of software of the terminal device, where the physical components include at least one of a microphone, a speaker, and a camera, and the software includes an input method. The classification module 1803 is then configured to classify the foreground page according to the types and coordinate positions of the target controls together with the auxiliary information.
The classification module 1803 may be specifically configured to generate a layout block diagram of the foreground page based on the types and coordinate positions of the target controls, and to classify the foreground page according to the layout block diagram.
When the target controls of the foreground page include multiple types, the classification module 1803 may be specifically configured to divide the target controls into multiple groups by type, each group including target controls of one type or of two or more types; then generate multiple layout block diagrams based on the types and coordinate positions of the multiple groups of target controls, respectively; and then classify the foreground page according to the multiple layout block diagrams.
Alternatively, when the target controls of the foreground page include multiple types, the classification module 1803 may be specifically configured to divide the target controls into multiple groups by type, each group including target controls of one type or of two or more types; then input the attribute information of the multiple groups of target controls into multiple input channels of a pre-trained classifier model, respectively, where the attribute information of the multiple groups of target controls corresponds to the multiple input channels one to one; and then classify the foreground page using the pre-trained classifier model.
The classification module 1803 may also be specifically configured to input the attribute information of each group of target controls, in data form, into a channel of the pre-trained classifier model; or to generate a layout block diagram according to the coordinate positions in the attribute information of each group of target controls, and to input the type of each group of target controls together with the layout block diagram representing the coordinate positions into a channel of the pre-trained classifier model.
Further, the acquisition module 1802 is also configured to obtain auxiliary information related to the switched foreground page, the auxiliary information including at least one of the semantic information of the target controls, usage information of physical components of the terminal device, and usage information of software of the terminal device, where the physical components include at least one of a microphone, a speaker, and a camera, and the software includes an input method. The classification module 1803 may also be specifically configured to input the attribute information of the multiple groups of target controls and the auxiliary information into the multiple input channels of the pre-trained classifier model, respectively.
The acquisition module 1802 may be specifically configured to obtain the layout information of the decorView of the switched foreground page, the layout information being a multi-way tree structure; and then obtain, from the layout information of the decorView, the attribute information of the leaf-node controls of the multi-way tree structure, the leaf-node controls including the visible controls and the invisible controls of the foreground page, where the leaf-node controls are the N-th layer from the bottom of the multi-way tree structure and N is greater than or equal to 1. The acquisition module 1802 may then filter the leaf-node controls to obtain the attribute information of the visible controls of the foreground page.
FIG. 19 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. As shown in FIG. 19, the terminal device 1900 includes a processor 1901 and a memory 1902. The memory 1902 is configured to store a computer program, and the processor 1901 is configured to execute the above page classification method when invoking the computer program. Further, the terminal device may also include a bus 1903, a microphone 1904, a speaker 1905, a display 1906, and a camera 1907, where the processor 1901, the memory 1902, the microphone 1904, the speaker 1905, the display 1906, and the camera 1907 communicate through the bus 1903, or may communicate by other means such as wireless transmission.
It can be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor or any conventional processor.
The method steps in the embodiments of the present application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or by wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to the computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
It can be understood that the various numerals used in the embodiments of the present application are merely for ease of description and differentiation, and are not intended to limit the scope of the embodiments of the present application.

Claims (24)

  1. A page classification method, characterized by comprising:
    detecting a foreground page switch of a terminal device, wherein the switch of the foreground page is triggered by a user operation;
    obtaining attribute information of target controls of the switched foreground page, wherein the target controls comprise at least visible controls, and the attribute information comprises types and coordinate positions of the target controls; and
    classifying the foreground page according to the types and coordinate positions of the target controls.
  2. The page classification method according to claim 1, characterized in that the classifying the foreground page according to the types and coordinate positions of the target controls comprises:
    generating a layout block diagram of the foreground page based on the types and coordinate positions of the target controls; and
    classifying the foreground page according to the layout block diagram.
  3. The page classification method according to claim 1, characterized in that the target controls of the foreground page comprise multiple types, and the classifying the foreground page according to the types and coordinate positions of the target controls comprises:
    dividing the target controls into multiple groups by type, each group comprising target controls of one type or of two or more types;
    generating multiple layout block diagrams based on the types and coordinate positions of the multiple groups of target controls, respectively; and
    classifying the foreground page according to the multiple layout block diagrams.
  4. The page classification method according to any one of claims 1 to 3, characterized in that the page classification method further comprises: obtaining auxiliary information related to the switched foreground page, the auxiliary information comprising at least one of semantic information of the target controls, usage information of a physical component of the terminal device, and usage information of software of the terminal device, wherein the physical component comprises at least one of a microphone, a speaker, and a camera, and the software comprises an input method; and
    the classifying the foreground page according to the types and coordinate positions of the target controls comprises: classifying the foreground page according to the types and coordinate positions of the target controls and the auxiliary information.
  5. The page classification method according to claim 1, characterized in that the target controls of the foreground page comprise multiple types, and the classifying the foreground page according to the types and coordinate positions of the target controls comprises:
    dividing the target controls into multiple groups by type, each group comprising target controls of one type or of two or more types;
    inputting attribute information of the multiple groups of target controls into multiple input channels of a pre-trained classifier model, respectively, wherein the attribute information of the multiple groups of target controls corresponds to the multiple input channels one to one; and
    classifying the foreground page using the pre-trained classifier model.
  6. The page classification method according to claim 5, characterized in that the inputting attribute information of the multiple groups of target controls into multiple input channels of a pre-trained classifier model, respectively, comprises:
    inputting the attribute information of each group of target controls, in data form, into a channel of the pre-trained classifier model; or,
    drawing a layout block diagram according to the coordinate positions in the attribute information of each group of target controls; and
    inputting the type of each group of target controls and the layout block diagram representing the coordinate positions into a channel of the pre-trained classifier model.
  7. The page classification method according to claim 5 or 6, characterized in that the page classification method further comprises:
    obtaining auxiliary information related to the switched foreground page, the auxiliary information comprising at least one of semantic information of the target controls, usage information of a physical component of the terminal device, and usage information of software of the terminal device, wherein the physical component comprises at least one of a microphone, a speaker, and a camera, and the software comprises an input method; and
    the inputting attribute information of the multiple groups of target controls into multiple input channels of a pre-trained classifier model, respectively, comprises:
    inputting the attribute information of the multiple groups of target controls and the auxiliary information into the multiple input channels of the pre-trained classifier model, respectively.
  8. The page classification method according to any one of claims 1 to 7, characterized in that the types of the target controls comprise at least one of a button control, a text control, an image control, and an edit-text control.
  9. The page classification method according to any one of claims 1 to 8, characterized in that the type of the foreground page comprises communication, shopping, reading, video, game, music, and other.
  10. The page classification method according to any one of claims 1 to 9, characterized in that the obtaining attribute information of target controls of the switched foreground page comprises:
    obtaining layout information of a decorView of the switched foreground page, the layout information being a multi-way tree structure; and
    obtaining, from the layout information of the decorView, attribute information of leaf-node controls of the multi-way tree structure, the leaf-node controls comprising visible controls and invisible controls of the foreground page, wherein the leaf-node controls are the N-th layer from the bottom of the multi-way tree structure, and N is greater than or equal to 1.
  11. The page classification method according to claim 10, characterized in that the obtaining attribute information of target controls of the switched foreground page further comprises:
    filtering the leaf-node controls to obtain attribute information of the visible controls of the foreground page.
  12. A page classification apparatus, characterized by comprising:
    a detection module, configured to detect a foreground page switch of a terminal device, wherein the switch of the foreground page is triggered by a user operation;
    an acquisition module, configured to obtain attribute information of target controls of the switched foreground page, wherein the target controls comprise at least visible controls, and the attribute information comprises types and coordinate positions of the target controls; and
    a classification module, configured to classify the foreground page according to the types and coordinate positions of the target controls.
  13. The page classification apparatus according to claim 12, characterized in that the classification module is specifically configured to:
    generate a layout block diagram of the foreground page based on the types and coordinate positions of the target controls; and
    classify the foreground page according to the layout block diagram.
  14. The page classification apparatus according to claim 12, characterized in that the target controls of the foreground page comprise multiple types, and the classification module is specifically configured to:
    divide the target controls into multiple groups by type, each group comprising target controls of one type or of two or more types;
    generate multiple layout block diagrams based on the types and coordinate positions of the multiple groups of target controls, respectively; and
    classify the foreground page according to the multiple layout block diagrams.
  15. The page classification apparatus according to any one of claims 12 to 14, characterized in that:
    the acquisition module is further configured to obtain auxiliary information related to the switched foreground page, the auxiliary information comprising at least one of semantic information of the target controls, usage information of a physical component of the terminal device, and usage information of software of the terminal device, wherein the physical component comprises at least one of a microphone, a speaker, and a camera, and the software comprises an input method; and
    the classification module is configured to classify the foreground page according to the types and coordinate positions of the target controls and the auxiliary information.
  16. The page classification apparatus according to claim 12, characterized in that the target controls of the foreground page comprise multiple types, and the classification module is specifically configured to:
    divide the target controls into multiple groups by type, each group comprising target controls of one type or of two or more types;
    input attribute information of the multiple groups of target controls into multiple input channels of a pre-trained classifier model, respectively, wherein the attribute information of the multiple groups of target controls corresponds to the multiple input channels one to one; and
    classify the foreground page using the pre-trained classifier model.
  17. The page classification apparatus according to claim 16, characterized in that the classification module is further specifically configured to:
    input the attribute information of each group of target controls, in data form, into a channel of the pre-trained classifier model; or,
    generate a layout block diagram according to the coordinate positions in the attribute information of each group of target controls; and
    input the type of each group of target controls and the layout block diagram representing the coordinate positions into a channel of the pre-trained classifier model.
  18. The page classification apparatus according to claim 16 or 17, characterized in that the acquisition module is further configured to obtain auxiliary information related to the switched foreground page, the auxiliary information comprising at least one of semantic information of the target controls, usage information of a physical component of the terminal device, and usage information of software of the terminal device, wherein the physical component comprises at least one of a microphone, a speaker, and a camera, and the software comprises an input method; and
    the classification module is further specifically configured to input the attribute information of the multiple groups of target controls and the auxiliary information into the multiple input channels of the pre-trained classifier model, respectively.
  19. The page classification apparatus according to any one of claims 12 to 18, characterized in that the types of the target controls comprise at least one of a button control, a text control, an image control, and an edit-text control.
  20. The page classification apparatus according to any one of claims 12 to 19, characterized in that the type of the foreground page comprises communication, shopping, reading, video, game, music, and other.
  21. The page classification apparatus according to any one of claims 12 to 20, characterized in that the acquisition module is specifically configured to:
    obtain layout information of a decorView of the switched foreground page, the layout information being a multi-way tree structure; and
    obtain, from the layout information of the decorView, attribute information of leaf-node controls of the multi-way tree structure, the leaf-node controls comprising visible controls and invisible controls of the foreground page, wherein the leaf-node controls are the N-th layer from the bottom of the multi-way tree structure, and N is greater than or equal to 1.
  22. The page classification apparatus according to claim 21, characterized in that the acquisition module is further specifically configured to:
    filter the leaf-node controls to obtain attribute information of the visible controls of the foreground page.
  23. A terminal device, characterized by comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to perform the method according to any one of claims 1 to 11 when invoking the computer program.
  24. A computer-readable storage medium, characterized by being configured to store a computer program which, when executed by a computer, causes a terminal device to implement the method according to any one of claims 1 to 11.