CN113573132B - Multi-application screen-splicing method and device implemented based on voice, and storage medium - Google Patents

Multi-application screen-splicing method and device implemented based on voice, and storage medium

Info

Publication number
CN113573132B
CN113573132B (application CN202110841048.5A)
Authority
CN
China
Prior art keywords: application, determining, fuzzy, screen, voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110841048.5A
Other languages
Chinese (zh)
Other versions
CN113573132A (en)
Inventor
周胜杰
赵家宇
李涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Konka Electronic Technology Co Ltd
Original Assignee
Shenzhen Konka Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Konka Electronic Technology Co Ltd
Priority to CN202110841048.5A
Publication of CN113573132A
Application granted
Publication of CN113573132B
Legal status: Active (current)
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4316Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N21/4438Window management, e.g. event handling following interaction with the user interface
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a multi-application screen-splicing method, device and storage medium implemented based on voice. The method comprises the following steps: acquiring voice text information and determining several application domain categories corresponding to the voice text information; determining several target applications according to the voice text information and the several application domain categories; and displaying the several target applications in split-screen mode. The invention can identify, from the user's voice, the several applications that need to be opened and display the identified applications side by side, thereby effectively solving the problem that conventional voice interaction technology is ill-suited to the screen-splicing scenario of the smart screen.

Description

Multi-application screen-splicing method and device implemented based on voice, and storage medium
Technical Field
The invention relates to the field of voice interaction, and in particular to a multi-application screen-splicing method, device and storage medium implemented based on voice.
Background
With upgrades to smart-screen hardware and software, the traditional smart TV has evolved into the smart screen. Unlike a traditional TV, on which a single application occupies the whole system, a smart screen can open several applications at the same time and present them for interaction in separate on-screen windows. However, when opening applications, existing voice interaction technology generally works in only one dimension: it can open a single specified application or jump to a specified page within an application. Existing voice interaction technology is therefore difficult to apply to the screen-splicing scenario of the smart screen and needs further optimization and improvement.
Accordingly, there is a need for improvement and development in the art.
Disclosure of Invention
The technical problem to be solved by the invention is that, in view of the above defects in the prior art, a multi-application screen-splicing method, device and storage medium implemented based on voice are provided, with the aim of solving the problem that existing voice interaction technology is difficult to apply to the screen-splicing scenario of the smart screen.
The technical solution adopted by the invention to solve the above problem is as follows:
In a first aspect, an embodiment of the present invention provides a multi-application screen-splicing method implemented based on voice, where the method comprises:
acquiring voice text information and determining several application domain categories corresponding to the voice text information;
determining several target applications according to the voice text information and the several application domain categories;
and displaying the several target applications in split-screen mode.
In one embodiment, the acquiring of the voice text information comprises:
acquiring a voice instruction;
and performing text conversion on the voice instruction to obtain the voice text information.
In one embodiment, the determining of several target applications according to the voice text information and the several application domain categories comprises:
matching each of the several application domain categories according to the voice text information to obtain an application name corresponding to each application domain category;
and determining the several target applications according to the application names corresponding to the application domain categories.
In one embodiment, the determining of several target applications according to the voice text information and the several application domain categories comprises:
taking an application domain category that is not successfully matched among the several application domain categories as a fuzzy application domain;
acquiring the number of fuzzy application domains and historical operation data;
and determining the target application corresponding to each fuzzy application domain according to the number of fuzzy application domains and the historical operation data.
In one embodiment, the determining of the target application corresponding to the fuzzy application domain according to the number of fuzzy application domains and the historical operation data comprises:
when the number of fuzzy application domains is 1, determining, according to the historical operation data, the application opened most often within the application domain corresponding to the fuzzy application domain, to obtain the target application corresponding to the fuzzy application domain.
In one embodiment, the determining of the target application corresponding to the fuzzy application domain according to the number of fuzzy application domains and the historical operation data comprises:
when the number of fuzzy application domains is greater than 1, determining a category priority corresponding to each fuzzy application domain;
and determining the target application corresponding to each fuzzy application domain according to the category priorities and the historical operation data.
In one embodiment, the determining of the target application corresponding to each fuzzy application domain according to the category priorities and the historical operation data comprises:
when the category priority corresponding to a fuzzy application domain is the highest priority, taking that fuzzy application domain as first data;
determining, according to the historical operation data, the application opened most often within the application domain corresponding to the first data, to obtain the target application corresponding to the first data;
when the category priority corresponding to a fuzzy application domain is not the highest priority, taking that fuzzy application domain as second data;
and determining, according to the historical operation data, the application within the application domain corresponding to the second data that is opened most often in combination with the target application corresponding to the first data, to obtain the target application corresponding to the second data.
In one embodiment, the displaying of the several target applications in split-screen mode comprises:
acquiring a screen-splicing template and determining several windows according to the screen-splicing template, where the windows correspond one-to-one to the several target applications;
and opening the several target applications in the corresponding windows.
In a second aspect, an embodiment of the present invention further provides a multi-application screen-splicing device implemented based on voice, where the device comprises:
an acquisition module, configured to acquire voice text information and determine several application domain categories corresponding to the voice text information;
a determining module, configured to determine several target applications according to the voice text information and the several application domain categories;
and a screen-splicing module, configured to display the several target applications in split-screen mode.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium having stored thereon a plurality of instructions, where the instructions are adapted to be loaded and executed by a processor to implement the steps of any of the multi-application screen-splicing methods implemented based on voice described above.
The invention has the following beneficial effects: according to the embodiments of the invention, voice text information is acquired and several application domain categories corresponding to the voice text information are determined; several target applications are determined according to the voice text information and the several application domain categories; and the several target applications are displayed in split-screen mode. The invention can identify, from the user's voice, the several applications that need to be opened and display them side by side, thereby effectively solving the problem that conventional voice interaction technology is ill-suited to the screen-splicing scenario of the smart screen.
Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that other drawings may be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a schematic flowchart of a multi-application screen-splicing method implemented based on voice according to an embodiment of the present invention.
Fig. 2 is a detailed flowchart of a multi-application screen-splicing method implemented based on voice according to an embodiment of the present invention.
Fig. 3 is a diagram of the internal module connections of a multi-application screen-splicing device implemented based on voice according to an embodiment of the present invention.
Fig. 4 is a schematic block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the invention and are not intended to limit its scope.
It should be noted that, if directional indications (such as up, down, left, right, front and rear) are involved in the embodiments of the present invention, they are merely used to explain the relative positional relationships, movement conditions and the like between components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
With upgrades to smart-screen hardware and software, the traditional smart TV has evolved into the smart screen. Unlike a traditional TV, on which a single application occupies the whole system, a smart screen can open several applications at the same time and present them for interaction in separate on-screen windows. However, when opening applications, existing voice interaction technology generally works in only one dimension: it can open a single specified application or jump to a specified page within an application. Existing voice interaction technology is therefore difficult to apply to the screen-splicing scenario of the smart screen and needs further optimization and improvement.
In order to overcome the above defects in the prior art, the invention provides a multi-application screen-splicing method implemented based on voice, which comprises the following steps: acquiring voice text information and determining several application domain categories corresponding to the voice text information; determining several target applications according to the voice text information and the several application domain categories; and displaying the several target applications in split-screen mode. The invention can identify, from the user's voice, the several applications that need to be opened and display them side by side, thereby effectively solving the problem that conventional voice interaction technology is ill-suited to the screen-splicing scenario of the smart screen.
As shown in fig. 1, the method comprises the steps of:
step S100, acquiring voice text information and determining a plurality of application field categories corresponding to the voice text information.
Specifically, since the objective of the present embodiment is to implement multi-application screen sharing based on voice, the user voice may reflect several applications that need to be opened in the present embodiment. When the intelligent screen acquires the voice text information generated based on the voice of the user, a plurality of application field categories contained in the voice text information can be judged, and application programs in each application field category are not overlapped with each other, namely, one application program can only belong to one application field category.
For example, when the voice text information is "watch movie and play game", the smart screen can determine two application categories of "video application", "game application", based on the voice text information. Among these, the video application fields may include the following applications: tencel video, mango TV, aiqi art, etc. And the game application field can comprise the following application programs: peace elite, happy, etc.
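As a minimal illustrative sketch (not part of the claimed embodiment), the category determination can be pictured as a keyword lookup over the recognized text; the keyword table, category names and function names below are assumptions for illustration only:

```kotlin
// Hypothetical keyword-based mapping from voice text to application domain
// categories. A production system would more likely use an NLU / intent
// classifier, but the output (a set of domain categories) is the same.
enum class DomainCategory { VIDEO, GAME, SOCIAL }

val keywordToCategory = mapOf(
    "movie" to DomainCategory.VIDEO,
    "video" to DomainCategory.VIDEO,
    "game"  to DomainCategory.GAME,
    "chat"  to DomainCategory.SOCIAL
)

// Returns every domain category mentioned in the recognized voice text.
fun detectCategories(voiceText: String): Set<DomainCategory> =
    keywordToCategory
        .filterKeys { voiceText.contains(it, ignoreCase = true) }
        .values
        .toSet()

fun main() {
    println(detectCategories("watch a movie and play a game"))  // [VIDEO, GAME]
}
```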
In one implementation, the acquiring of the voice text information comprises the following steps:
Step S101: acquiring a voice instruction;
Step S102: performing text conversion on the voice instruction to obtain the voice text information.
Specifically, in this embodiment one or more voice sensors are provided on the smart screen in advance to detect in real time whether a user voice instruction is present. When the smart screen is running and receives a voice instruction from the user, it immediately performs text conversion on the instruction to obtain the corresponding voice text information.
As shown in fig. 1, the method further comprises the steps of:
step 200, determining a plurality of target application programs according to the voice text information and the application domain categories.
Specifically, since the application domain categories may reflect application domains to which the user desires the plurality of applications that the smart screen opens to be respectively affiliated, the embodiment may obtain the plurality of target applications according to determining one target application in each of the application domain categories. The target application programs are a plurality of application programs which need to be opened by the intelligent screen in the voice instruction of the user. Because the application names of the target application programs corresponding to the application domain categories may directly exist in the voice text information, the embodiment also needs to determine the target application programs corresponding to the application domain categories together with the voice text information.
For example, when the voice text information is "open and flat elite and Tencel video", it may be determined that the application domain categories corresponding to the voice text information are a game application domain category and a video application domain category, respectively. According to the voice text information, the target application program corresponding to the game application field category can be directly obtained as 'peace elite', and the target application program corresponding to the video application field category is 'Tencent video'.
In one implementation, step S200 specifically comprises the following steps:
Step S201: matching each of the several application domain categories according to the voice text information to obtain an application name corresponding to each application domain category;
Step S202: determining the several target applications according to the application names corresponding to the application domain categories.
Specifically, the voice text information may have explicit directivity, that is, it directly contains the names of the applications to be opened. This embodiment therefore provides a method of determining the target applications for this type of voice text information: according to the voice text information, the application name corresponding to each application domain category is matched, and the target application for each category is obtained directly from the matched name.
By way of example, if the user's voice text information is "open Tencent Video and Happy Xiaoxiaole", two application domain categories can be determined, namely a video application category and a game application category; according to the voice text information, the target application corresponding to the video application category can be directly matched as Tencent Video, and the target application corresponding to the game application category as Happy Xiaoxiaole.
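The explicit-name matching of steps S201 and S202 can be sketched as follows; the category-to-application registry is a hypothetical stand-in for whatever installed-application list the smart screen maintains:

```kotlin
// Hypothetical registry: application domain category -> installed applications.
val installedApps = mapOf(
    "video" to listOf("Tencent Video", "Mango TV", "iQIYI"),
    "game"  to listOf("Peace Elite", "Happy Xiaoxiaole")
)

// For each detected category, return the application name explicitly mentioned
// in the voice text, or null when the category remains "fuzzy" (no explicit name).
fun matchExplicitNames(
    voiceText: String,
    categories: Collection<String>
): Map<String, String?> = categories.associateWith { category ->
    installedApps[category].orEmpty()
        .firstOrNull { voiceText.contains(it, ignoreCase = true) }
}

fun main() {
    val result = matchExplicitNames(
        "open Tencent Video and Happy Xiaoxiaole",
        listOf("video", "game")
    )
    println(result)  // {video=Tencent Video, game=Happy Xiaoxiaole}
}
```

Categories whose value comes back null are the fuzzy application domains handled by the next implementation.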
In another implementation, the determining of several target applications according to the voice text information and the several application domain categories comprises the following steps:
Step S203: taking an application domain category that is not successfully matched among the several application domain categories as a fuzzy application domain;
Step S204: acquiring the number of fuzzy application domains and the historical operation data;
Step S205: determining the target application corresponding to each fuzzy application domain according to the number of fuzzy application domains and the historical operation data.
Specifically, the voice text information may have ambiguous directivity, that is, it only reflects which application domain categories the user wants to open applications in, without explicitly naming the applications. This embodiment therefore provides another method of determining the target applications for this type of voice text information: each application domain category that is not successfully matched is taken as a fuzzy application domain, and the user's historical operation data is acquired. Because the historical operation data reflects which applications the user commonly uses within each application domain category, the target application corresponding to each fuzzy application domain can be determined from the historical operation data and the number of fuzzy application domains.
In one implementation, step S205 specifically comprises the following step:
Step S2051: when the number of fuzzy application domains is 1, determining, according to the historical operation data, the application opened most often within the application domain corresponding to the fuzzy application domain, to obtain the target application corresponding to the fuzzy application domain.
Specifically, if only one of the application domain categories has no corresponding application name, that category is the only fuzzy application domain; the most commonly used application within the corresponding application domain is determined from the acquired historical operation data and taken as the target application corresponding to the fuzzy application domain.
For example, if the voice text information is "open Tencent Video and a game", the target application corresponding to the video application category can be directly matched as Tencent Video, but no target application can be matched for the game application category, so the game application category is treated as a fuzzy application domain. The user's historical operation data is then acquired. Since the game application category is the only fuzzy application domain, and the historical operation data shows that the game the user plays most often in that category is Peace Elite, Peace Elite is set as the target application corresponding to the game application category.
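A minimal sketch of step S2051 follows, assuming the historical operation data can be reduced to per-application open counts; the record shape is illustrative rather than defined by the patent:

```kotlin
// Assumed reduction of the historical operation data: one record per
// application with its domain category and how many times it was opened.
data class HistoryRecord(val app: String, val category: String, val openCount: Int)

// For the single fuzzy application domain, pick the application of that
// domain with the largest open count.
fun resolveSingleFuzzyDomain(
    fuzzyCategory: String,
    history: List<HistoryRecord>
): String? = history
    .filter { it.category == fuzzyCategory }
    .maxByOrNull { it.openCount }
    ?.app

fun main() {
    val history = listOf(
        HistoryRecord("Peace Elite", "game", 42),
        HistoryRecord("Happy Xiaoxiaole", "game", 17)
    )
    println(resolveSingleFuzzyDomain("game", history))  // Peace Elite
}
```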
In another implementation, step S205 specifically comprises the following steps:
Step S2052: when the number of fuzzy application domains is greater than 1, determining the category priority corresponding to each fuzzy application domain;
Step S2053: determining the target application corresponding to each fuzzy application domain according to the category priorities and the historical operation data.
If several of the application domain categories have no corresponding application name, there are several fuzzy application domains. This embodiment presets priority information containing the category priority corresponding to each application domain category. Specifically, when several fuzzy application domains exist, the category priority of each is determined from the preset priority information. Because the category priority reflects the importance of a fuzzy application domain, the target application of the most important fuzzy application domain is determined first on the basis of the category priorities, and the target applications of the remaining fuzzy application domains are then determined in turn, so that the target applications corresponding to all fuzzy application domains are obtained in an orderly manner.
In one implementation, the determining of the target application corresponding to each fuzzy application domain according to the category priorities and the historical operation data specifically comprises:
when the category priority corresponding to a fuzzy application domain is the highest priority, taking that fuzzy application domain as first data;
determining, according to the historical operation data, the application opened most often within the application domain corresponding to the first data, to obtain the target application corresponding to the first data;
when the category priority corresponding to a fuzzy application domain is not the highest priority, taking that fuzzy application domain as second data;
and determining, according to the historical operation data, the application within the application domain corresponding to the second data that is opened most often in combination with the target application corresponding to the first data, to obtain the target application corresponding to the second data.
Specifically, when there are several fuzzy application domains, they are divided into two groups according to their category priorities: first data and second data. The first data is the fuzzy application domain with the highest priority; the remaining fuzzy application domains form the second data. For the first data, this embodiment determines, from the historical operation data, the application most commonly used within the corresponding application domain and takes it as the target application corresponding to the first data. For the second data, taking the category priority of each fuzzy application domain in the second data as the processing order, the application within the corresponding application domain that is opened most often in combination with the target application of the first data is determined in turn from the historical operation data; that application is the target application corresponding to the second data.
In one implementation, when the number of second data is greater than 1, for each second data the application within its application domain that is opened most often in combination with the target application determined for the previous category priority is selected according to the historical operation data; that application is then the target application corresponding to that second data.
For example, when the user's voice instruction is "play a game, chat and watch a video", several fuzzy application domains can be determined: a game category, a social category and a video category. According to the preset priority information, the category priority of the game category is a, that of the social category is b and that of the video category is c. From the historical operation data, the application most commonly used in the game application domain is determined to be Peace Elite, so Peace Elite is the target application corresponding to the game category. Then, within the social application domain, the application opened most often in combination with Peace Elite is determined from the historical operation data to be WeChat, so WeChat is the target application corresponding to the social category. Finally, within the video application domain, the application opened most often in combination with WeChat is determined from the historical operation data to be Tencent Video, which is therefore the target application corresponding to the video category.
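The chained, priority-ordered resolution described above can be sketched as follows; the UsageStats shape, and in particular the pairwise co-open counts, is an assumed reduction of the historical operation data rather than a structure defined by the patent:

```kotlin
// Assumed usage statistics derived from the historical operation data.
data class UsageStats(
    val openCount: Map<String, Int>,                  // app -> times opened
    val coOpenCount: Map<Pair<String, String>, Int>,  // (app, app) -> times opened together
    val appsByCategory: Map<String, List<String>>     // category -> installed apps
)

// fuzzyByPriority lists the fuzzy application domains from highest to lowest
// category priority. The highest-priority domain gets its most-opened app;
// each following domain gets the app most often opened together with the app
// chosen for the previous priority.
fun resolveFuzzyDomains(
    fuzzyByPriority: List<String>,
    stats: UsageStats
): Map<String, String> {
    val chosen = linkedMapOf<String, String>()
    var anchor: String? = null  // target app chosen for the previous priority
    for (category in fuzzyByPriority) {
        val candidates = stats.appsByCategory[category].orEmpty()
        val pick = when (val a = anchor) {
            null -> candidates.maxByOrNull { stats.openCount[it] ?: 0 }
            else -> candidates.maxByOrNull { stats.coOpenCount[a to it] ?: 0 }
        }
        if (pick != null) {
            chosen[category] = pick
            anchor = pick
        }
    }
    return chosen
}

fun main() {
    val stats = UsageStats(
        openCount = mapOf("Peace Elite" to 42, "Happy Xiaoxiaole" to 17),
        coOpenCount = mapOf(
            ("Peace Elite" to "WeChat") to 30,
            ("WeChat" to "Tencent Video") to 25
        ),
        appsByCategory = mapOf(
            "game" to listOf("Peace Elite", "Happy Xiaoxiaole"),
            "social" to listOf("WeChat"),
            "video" to listOf("Tencent Video", "iQIYI")
        )
    )
    println(resolveFuzzyDomains(listOf("game", "social", "video"), stats))
    // {game=Peace Elite, social=WeChat, video=Tencent Video}
}
```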
In one implementation, as shown in fig. 2, before any application is finally determined as a target application, a voice prompt may be generated for that application to confirm with the user whether it should indeed be opened. For example, the smart screen may ask the user by voice "Opening Douyin for you; if Douyin is not what you want, please say it again". The user may then reply "do not open Douyin, open Douyu", and if no reply from the user is received, Douyin is automatically selected as the target application. In short, this embodiment improves the accuracy of voice instruction recognition through multiple rounds of questioning.
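A rough sketch of this confirmation round, reduced to the final decision logic (the voice prompt, reply capture and timeout handling are omitted, and the reply parsing is deliberately simplified):

```kotlin
// If the user's reply names a different installed application, that application
// wins; if no reply arrives before the timeout (reply == null), the proposed
// application is kept as the target.
fun confirmTargetApp(
    proposed: String,
    userReply: String?,
    installedApps: List<String>
): String {
    if (userReply == null) return proposed  // silence: keep the default choice
    val mentioned = installedApps.filter { userReply.contains(it, ignoreCase = true) }
    return mentioned.lastOrNull { !it.equals(proposed, ignoreCase = true) } ?: proposed
}

fun main() {
    val apps = listOf("Douyin", "Douyu")
    println(confirmTargetApp("Douyin", null, apps))                              // Douyin
    println(confirmTargetApp("Douyin", "do not open Douyin, open Douyu", apps))  // Douyu
}
```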
In one implementation, if a target application is not installed, it is automatically downloaded and installed from the application store.
As shown in fig. 1, the method further comprises the steps of:
and step S300, carrying out split screen display on the target application programs.
Specifically, in order to realize the screen sharing scene of the intelligent screen, after determining a plurality of target application programs corresponding to the voice instructions of the user, the embodiment also needs to carry out screen sharing display on the target application programs, so that a plurality of application programs can be opened on the screen of the intelligent screen at the same time, and the requirements of the user on different application programs are met.
In one implementation, the step S300 specifically includes the following steps:
step S301, a screen splicing template is obtained, and a plurality of windows are determined according to the screen splicing template, wherein the windows are in one-to-one correspondence with the target application programs;
step S302, the target application programs are opened according to the windows.
Specifically, the screen spelling template in this embodiment may be a preset screen spelling template, or may be an appropriate screen spelling template determined by the intelligent screen according to the number of several target application programs. The screen spelling template usually comprises windows with the same number as the target application programs, and then each window opens one target application program, so that the screen spelling display of the target application programs can be realized.
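A minimal sketch of the screen-splicing template and window assignment, assuming a template can be expressed as window rectangles in normalized screen coordinates keyed by the number of target applications; the concrete layouts are illustrative only:

```kotlin
// One window rectangle in normalized (0..1) screen coordinates.
data class WindowRect(val x: Double, val y: Double, val w: Double, val h: Double)

// Pick a template (set of windows) according to how many apps must be shown.
fun templateFor(appCount: Int): List<WindowRect> = when (appCount) {
    1 -> listOf(WindowRect(0.0, 0.0, 1.0, 1.0))
    2 -> listOf(WindowRect(0.0, 0.0, 0.5, 1.0), WindowRect(0.5, 0.0, 0.5, 1.0))
    else -> List(appCount) { i ->  // fall back to equal-width columns
        WindowRect(i.toDouble() / appCount, 0.0, 1.0 / appCount, 1.0)
    }
}

// One-to-one assignment of target applications to template windows.
fun assignWindows(targetApps: List<String>): Map<String, WindowRect> =
    targetApps.zip(templateFor(targetApps.size)).toMap()

fun main() {
    println(assignWindows(listOf("Peace Elite", "WeChat", "Tencent Video")))
}
```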
In one implementation, when the smart screen is already in a screen-splicing scenario and receives a new voice instruction from the user, a prompt voice is automatically generated asking the user whether to rebuild a new screen-splicing scenario or to add a new window to the existing one. If the user replies no, the original screen-splicing scenario is closed and a new screen-splicing scenario is re-established according to the newly received voice instruction.
In one implementation, in an already open screen-splicing scenario the user may replace one of the target applications by voice, for example by directly saying "change the XX application to the XXX application" or "stop watching XX, watch XXX".
In one implementation, the user may also exit the currently established screen-splicing scenario by voice, or return by voice from the current screen-splicing scenario to any one of the applications or to a default window.
In one implementation, the method further comprises:
step S400, determining that the number of the application field categories is 1 according to the voice command, determining a target application program corresponding to the application field category, and opening the target application program by adopting a single window.
Specifically, when the voice data of the user is identified to only include one application field category, the number of the application programs which need to be opened currently by the user is 1, so that the target application program does not need to be opened by adopting a screen splicing scene, and only the target application program needs to be opened by adopting a single window, namely, the method is equivalent to that the screen splicing scene is not constructed, but a conventional single scene is constructed.
Based on the above embodiments, the present invention further provides a multi-application screen-splicing device implemented based on voice, as shown in fig. 3, where the device comprises:
an acquisition module 01, configured to acquire voice text information and determine several application domain categories corresponding to the voice text information;
a determining module 02, configured to determine several target applications according to the voice text information and the several application domain categories;
and a screen-splicing module 03, configured to display the several target applications in split-screen mode.
Based on the above embodiments, the present invention further provides a terminal, whose functional block diagram may be as shown in fig. 4. The terminal comprises a processor, a memory, a network interface and a display screen connected by a system bus. The processor of the terminal provides computing and control capabilities. The memory of the terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the terminal is used to communicate with external terminals over a network connection. The computer program, when executed by the processor, implements the multi-application screen-splicing method implemented based on voice. The display screen of the terminal may be a liquid crystal display or an electronic ink display.
It will be appreciated by those skilled in the art that the block diagram shown in fig. 4 is merely a diagram of some of the structures related to the solution of the present invention and does not limit the terminals to which the solution may be applied; a particular terminal may comprise more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one implementation, one or more programs are stored in the memory of the terminal and configured to be executed by one or more processors, the one or more programs comprising instructions for performing the multi-application screen-splicing method implemented based on voice.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-volatile computer-readable storage medium, which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
In summary, the invention discloses a multi-application screen-splicing method, device and storage medium implemented based on voice, where the method comprises: acquiring voice text information and determining several application domain categories corresponding to the voice text information; determining several target applications according to the voice text information and the several application domain categories; and displaying the several target applications in split-screen mode. The invention can identify, from the user's voice, the several applications that need to be opened and display them side by side, thereby effectively solving the problem that conventional voice interaction technology is ill-suited to the screen-splicing scenario of the smart screen.
It is to be understood that the application of the invention is not limited to the examples described above; those of ordinary skill in the art may make improvements or modifications in light of the above description, and all such improvements and modifications shall fall within the scope of protection of the appended claims.

Claims (7)

1. A multi-application screen-splicing method implemented based on voice, the method comprising:
acquiring voice text information and determining several application domain categories corresponding to the voice text information;
determining several target applications according to the voice text information and the several application domain categories;
displaying the several target applications in split-screen mode;
wherein the determining of several target applications according to the voice text information and the several application domain categories comprises:
taking an application domain category that is not successfully matched among the several application domain categories as a fuzzy application domain; acquiring the number of fuzzy application domains and historical operation data, and, when the number of fuzzy application domains is greater than 1, determining the category priority corresponding to each fuzzy application domain;
when the category priority corresponding to a fuzzy application domain is the highest priority, taking that fuzzy application domain as first data; determining, according to the historical operation data, the application opened most often within the application domain corresponding to the first data, to obtain the target application corresponding to the first data;
when the category priority corresponding to a fuzzy application domain is not the highest priority, taking that fuzzy application domain as second data; and determining, according to the historical operation data, the application within the application domain corresponding to the second data that is opened most often in combination with the target application corresponding to the first data, to obtain the target application corresponding to the second data.
2. The multi-application screen-splicing method implemented based on voice of claim 1, wherein the acquiring of voice text information comprises:
acquiring a voice instruction;
and performing text conversion on the voice instruction to obtain the voice text information.
3. The multi-application screen-splicing method implemented based on voice of claim 1, wherein the determining of several target applications according to the voice text information and the several application domain categories comprises:
matching each of the several application domain categories according to the voice text information to obtain an application name corresponding to each application domain category;
and determining the several target applications according to the application names corresponding to the application domain categories.
4. The multi-application screen-splicing method implemented based on voice of claim 1, wherein the determining of the target application corresponding to the fuzzy application domain according to the number of fuzzy application domains and the historical operation data comprises:
when the number of fuzzy application domains is 1, determining, according to the historical operation data, the application opened most often within the application domain corresponding to the fuzzy application domain, to obtain the target application corresponding to the fuzzy application domain.
5. The multi-application screen-splicing method implemented based on voice of claim 1, wherein the displaying of the several target applications in split-screen mode comprises:
acquiring a screen-splicing template and determining several windows according to the screen-splicing template, where the windows correspond one-to-one to the several target applications;
and opening the several target applications in the corresponding windows.
6. A multi-application screen-splicing device implemented based on voice, the device comprising:
an acquisition module, configured to acquire voice text information and determine several application domain categories corresponding to the voice text information;
a determining module, configured to determine several target applications according to the voice text information and the several application domain categories;
and a screen-splicing module, configured to display the several target applications in split-screen mode;
wherein the determining of several target applications according to the voice text information and the several application domain categories comprises:
taking an application domain category that is not successfully matched among the several application domain categories as a fuzzy application domain; acquiring the number of fuzzy application domains and historical operation data, and, when the number of fuzzy application domains is greater than 1, determining the category priority corresponding to each fuzzy application domain;
when the category priority corresponding to a fuzzy application domain is the highest priority, taking that fuzzy application domain as first data; determining, according to the historical operation data, the application opened most often within the application domain corresponding to the first data, to obtain the target application corresponding to the first data;
when the category priority corresponding to a fuzzy application domain is not the highest priority, taking that fuzzy application domain as second data; and determining, according to the historical operation data, the application within the application domain corresponding to the second data that is opened most often in combination with the target application corresponding to the first data, to obtain the target application corresponding to the second data.
7. A computer-readable storage medium having stored thereon a plurality of instructions, wherein the instructions are adapted to be loaded and executed by a processor to implement the steps of the multi-application screen-splicing method implemented based on voice of any one of claims 1 to 5.
CN202110841048.5A 2021-07-23 2021-07-23 Multi-application screen-splicing method and device implemented based on voice, and storage medium Active CN113573132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110841048.5A CN113573132B (en) 2021-07-23 2021-07-23 Multi-application screen-splicing method and device implemented based on voice, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110841048.5A CN113573132B (en) 2021-07-23 2021-07-23 Multi-application screen-splicing method and device implemented based on voice, and storage medium

Publications (2)

Publication Number Publication Date
CN113573132A CN113573132A (en) 2021-10-29
CN113573132B (en) 2023-08-11

Family

ID=78167114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110841048.5A Active CN113573132B (en) 2021-07-23 2021-07-23 Multi-application screen-splicing method and device implemented based on voice, and storage medium

Country Status (1)

Country Link
CN (1) CN113573132B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114968012A (en) * 2022-05-10 2022-08-30 深圳康佳电子科技有限公司 Control method and related equipment for infinite screen splicing window combination

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218133A (en) * 2013-03-28 2013-07-24 东莞宇龙通信科技有限公司 Startup method of associated application program and terminal
CN104168515A (en) * 2014-08-21 2014-11-26 三星电子(中国)研发中心 Intelligent television terminal and screen control method thereof
CN104202649A (en) * 2014-08-27 2014-12-10 四川长虹电器股份有限公司 Method for operating multiple applications of intelligent television synchronously
CN104572001A (en) * 2015-01-27 2015-04-29 深圳市中兴移动通信有限公司 Split screen starting method and mobile terminal
CN106201427A (en) * 2016-07-15 2016-12-07 东莞酷派软件技术有限公司 A kind of application program launching method and terminal unit
CN107346182A (en) * 2016-05-05 2017-11-14 北京搜狗科技发展有限公司 A kind of method for building user thesaurus and the device for building user thesaurus
CN107396154A (en) * 2011-08-05 2017-11-24 三星电子株式会社 Electronic equipment and the method that its user interface is provided
CN107423063A (en) * 2017-07-25 2017-12-01 北京小米移动软件有限公司 Multiwindow processing method, device and equipment
CN107643870A (en) * 2017-09-27 2018-01-30 努比亚技术有限公司 Multi-screen display method, mobile terminal and computer-readable recording medium
CN107783705A (en) * 2017-10-20 2018-03-09 珠海市魅族科技有限公司 Show method, apparatus, computer installation and the storage medium of application program
CN108735212A (en) * 2018-05-28 2018-11-02 北京小米移动软件有限公司 Sound control method and device
CN108984258A (en) * 2018-07-09 2018-12-11 Oppo广东移动通信有限公司 Using multi-screen display method, device, storage medium and electronic equipment
CN109918040A (en) * 2019-03-15 2019-06-21 百度在线网络技术(北京)有限公司 Phonetic order distribution method and device, electronic equipment and computer-readable medium
CN110060679A (en) * 2019-04-23 2019-07-26 诚迈科技(南京)股份有限公司 A kind of exchange method and system of whole process voice control
CN113110911A (en) * 2021-05-13 2021-07-13 北京字节跳动网络技术有限公司 Information display method and device, electronic equipment and storage medium
WO2021139701A1 (en) * 2020-01-06 2021-07-15 宇龙计算机通信科技(深圳)有限公司 Application recommendation method and apparatus, storage medium and electronic device
CN113129887A (en) * 2019-12-31 2021-07-16 华为技术有限公司 Voice control method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6944594B2 (en) * 2001-05-30 2005-09-13 Bellsouth Intellectual Property Corporation Multi-context conversational environment system and method

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107396154A (en) * 2011-08-05 2017-11-24 三星电子株式会社 Electronic equipment and the method that its user interface is provided
CN103218133A (en) * 2013-03-28 2013-07-24 东莞宇龙通信科技有限公司 Startup method of associated application program and terminal
CN104168515A (en) * 2014-08-21 2014-11-26 三星电子(中国)研发中心 Intelligent television terminal and screen control method thereof
CN104202649A (en) * 2014-08-27 2014-12-10 四川长虹电器股份有限公司 Method for operating multiple applications of intelligent television synchronously
CN104572001A (en) * 2015-01-27 2015-04-29 深圳市中兴移动通信有限公司 Split screen starting method and mobile terminal
CN107346182A (en) * 2016-05-05 2017-11-14 北京搜狗科技发展有限公司 A kind of method for building user thesaurus and the device for building user thesaurus
CN106201427A (en) * 2016-07-15 2016-12-07 东莞酷派软件技术有限公司 A kind of application program launching method and terminal unit
CN107423063A (en) * 2017-07-25 2017-12-01 北京小米移动软件有限公司 Multiwindow processing method, device and equipment
CN107643870A (en) * 2017-09-27 2018-01-30 努比亚技术有限公司 Multi-screen display method, mobile terminal and computer-readable recording medium
CN107783705A (en) * 2017-10-20 2018-03-09 珠海市魅族科技有限公司 Show method, apparatus, computer installation and the storage medium of application program
CN108735212A (en) * 2018-05-28 2018-11-02 北京小米移动软件有限公司 Sound control method and device
CN108984258A (en) * 2018-07-09 2018-12-11 Oppo广东移动通信有限公司 Using multi-screen display method, device, storage medium and electronic equipment
CN109918040A (en) * 2019-03-15 2019-06-21 百度在线网络技术(北京)有限公司 Phonetic order distribution method and device, electronic equipment and computer-readable medium
CN110060679A (en) * 2019-04-23 2019-07-26 诚迈科技(南京)股份有限公司 A kind of exchange method and system of whole process voice control
CN113129887A (en) * 2019-12-31 2021-07-16 华为技术有限公司 Voice control method and device
WO2021139701A1 (en) * 2020-01-06 2021-07-15 宇龙计算机通信科技(深圳)有限公司 Application recommendation method and apparatus, storage medium and electronic device
CN113110911A (en) * 2021-05-13 2021-07-13 北京字节跳动网络技术有限公司 Information display method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113573132A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
US20220221959A1 (en) Annotations in software applications for invoking dialog system functions
US11861346B2 (en) Online marketplace of plugins for enhancing dialog systems
CN111400518B (en) Method, device, terminal, server and system for generating and editing works
US8868428B2 (en) Integration of embedded and network speech recognizers
CN110704594A (en) Task type dialogue interaction processing method and device based on artificial intelligence
CN111626452B (en) Intelligent government affair processing method, device, terminal and medium
US9400633B2 (en) Methods and apparatus for voiced-enabling a web application
US10157612B2 (en) Methods and apparatus for voice-enabling a web application
US20140039885A1 (en) Methods and apparatus for voice-enabling a web application
CN113573132B (en) Multi-application screen spelling method and device based on voice realization and storage medium
US9292252B2 (en) Methods and apparatus for voiced-enabling a web application
CN112652302A (en) Voice control method, device, terminal and storage medium
CN110866105A (en) Semantic decision method, mobile terminal and storage medium
CN114387376A (en) Rendering method and device of three-dimensional scene, electronic equipment and readable storage medium
CN108920211A (en) The starting method and device of Launcher application program, electronic equipment, storage medium
CN109976872B (en) Data processing method and device, electronic equipment and storage medium
US11621000B2 (en) Systems and methods for associating a voice command with a search image
CN111241249A (en) Man-machine conversation method, device, computer equipment and storage medium
CN110837616A (en) Method, system and storage medium for managing browser kernel
CN112269473B (en) Man-machine interaction method and system based on flexible scene definition
US10831441B2 (en) Method for controlling and accessing application based on active voice input
CN110895924B (en) Method and device for reading document content aloud, electronic equipment and readable storage medium
US20200219508A1 (en) Method for commanding a plurality of virtual personal assistants and associated devices
CN110363610B (en) Method and device for determining attributes of target product and electronic equipment
EP4161031A1 (en) Method for reminder object operation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant