CN116841672A - Method and system for determining visible-and-speakable information - Google Patents

Method and system for determining visible-and-speakable information

Info

Publication number
CN116841672A
CN116841672A
Authority
CN
China
Prior art keywords
information
key
voice
determining
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310698696.9A
Other languages
Chinese (zh)
Inventor
魏玉玲 (Wei Yuling)
祝小平 (Zhu Xiaoping)
袁志伟 (Yuan Zhiwei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FAW (Beijing) Software Technology Co., Ltd.
FAW Group Corp.
Original Assignee
FAW (Beijing) Software Technology Co., Ltd.
FAW Group Corp.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FAW (Beijing) Software Technology Co., Ltd. and FAW Group Corp.
Priority to CN202310698696.9A
Publication of CN116841672A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F 2203/01 Indexing scheme relating to G06F3/01
    • G06F 2203/011 Emotion or mood input determined on the basis of sensed human body parameters such as pulse, heart rate or beat, temperature of skin, facial expressions, iris, voice pitch, brain activity patterns
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics

Abstract

The application discloses a method, a system, an electronic device, a storage medium and an intelligent cockpit for determining visible-and-speakable ("what you see is what you can say") information. The method comprises: acquiring voice key information of a user and user portrait information corresponding to the voice key information; determining key functional elements in a key page based on the voice key information; and determining, based on the key functional elements, the voice key information and the user portrait information, the effect information and corresponding voice prompt information displayed on the key page. By this method, the presentation of the visible-and-speakable function is diversified and the user experience is improved.

Description

Method and system for determining visible-and-speakable information
Technical Field
The application relates to the technical field of vehicles, and in particular to a method, a system, an electronic device, a storage medium and an intelligent cockpit for determining visible-and-speakable information.
Background
When a user speaks text displayed on an interface, the voice system can simulate a click on the screen region containing that text (technically called a view), realizing the "visible-and-speakable" interaction; when the click is simulated, an animation effect (such as a finger or scattered flowers) is displayed around the view to enhance the interaction. At present, however, this animation is relatively uniform.
The existing visible-and-speakable function has the following problems:
1. the effect shown for the simulated click is unrelated to the page;
2. control is performed by voice alone, without reference to the user portrait;
3. the voice broadcast played during the simulated click is unrelated to both the page and the user portrait.
In order to solve these technical problems in the prior art, the application provides a method, a system, an electronic device, a storage medium and an intelligent cockpit for determining visible-and-speakable information.
Disclosure of Invention
In order to solve the technical problem in the prior art that the visible-and-speakable function is presented in a single, uniform way, resulting in a poor user experience, the application provides a method, a system, an electronic device, a storage medium and an intelligent cockpit for determining visible-and-speakable information.
To this end, the method for determining visible-and-speakable information provided by the application comprises the following steps:
acquiring voice key information of a user and user portrait information corresponding to the voice key information;
determining key functional elements in a key page based on the voice key information;
and determining effect information and corresponding voice prompt information displayed on the key page based on the key function element, the voice key information and the user portrait information.
In some embodiments, acquiring the voice key information of the user specifically comprises:
acquiring voice information of a user in real time;
and when the voice information is acquired, extracting the key information in the voice information.
In some embodiments, acquiring the user portrait information corresponding to the voice key information specifically comprises:
acquiring facial information of a user;
the user representation information is determined based on the face information and the voice key information.
In some embodiments, determining key functional elements in a key page based on voice key information specifically includes:
determining the key page and the key function elements in the key page based on the voice key information;
wherein the key page comprises a functional page, and the key functional elements comprise corresponding functional elements in the functional page.
In some embodiments, determining the effect information displayed on the key page based on the key function element, the voice key information and the user portrait information specifically includes:
acquiring a user emotion value based on the voice key information and the user face image;
and determining the effect information displayed on the key page based on the user emotion value and the user portrait.
In some embodiments, determining the effect information and the corresponding voice prompt information displayed on the key page specifically includes:
acquiring text information of the displayed effect information;
determining a prompt tone corresponding to the displayed effect information based on the text information;
and determining the voice prompt information based on the prompt tone and the text information.
Based on the same conception, the application also provides a system for determining visible-and-speakable information, comprising:
the information acquisition module is used for acquiring voice key information of a user and user portrait information corresponding to the voice key information;
the functional element determining module is used for determining key functional elements in a key page based on the voice key information;
and the information determining module is used for determining the effect information and the corresponding voice prompt information displayed on the key page based on the key functional elements, the voice key information and the user portrait information.
Based on the same conception, the application also provides an electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the above method for determining visible-and-speakable information.
Based on the same idea, the present application also provides a computer-readable storage medium storing a computer program executable by an electronic device; when the program runs on the electronic device, it causes the electronic device to perform the steps of the above method for determining visible-and-speakable information.
Based on the same idea, the application also provides an intelligent cockpit provided with the system for determining visible-and-speakable information as described above.
Compared with the prior art, the application has the following beneficial effects:
the application discloses a method, a system, electronic equipment, a storage medium and an intelligent cabin for determining visible and speaking information, which comprise the steps of obtaining voice key information of a user and user portrait information corresponding to the voice key information; determining key functional elements in a key page based on the voice key information; and determining effect information and corresponding voice prompt information displayed on the key page based on the key function element, the voice key information and the user portrait information. By the method, the visible and namely functional presentation is diversified, and the user experience is improved.
Drawings
FIG. 1 is a schematic diagram of a method for determining visible-and-speakable information according to the present application in some embodiments;
FIG. 2 is a schematic diagram of a method for determining visible-and-speakable information according to the present application in some applications;
FIG. 3 is a schematic structural diagram of a system for determining visible-and-speakable information according to the present application in some embodiments;
FIG. 4 is a schematic structural diagram of an electronic device according to some embodiments of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "plurality" generally means at least two.
It should be understood that the term "and/or" as used herein is merely one relationship describing the association of the associated objects, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application, the described items are not limited by these terms; the terms are only used to distinguish one item from another. For example, a "first" may also be referred to as a "second" and, similarly, a "second" may be referred to as a "first", without departing from the scope of the embodiments of the application.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a product or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a product or apparatus comprising that element.
In particular, symbols and/or reference numerals appearing in the description that are not marked in the description of the figures are not separately numbered.
Referring to fig. 1, a method for determining visible-and-speakable information includes:
s101, acquiring voice key information of a user and user portrait information corresponding to the voice key information;
Specifically, this step acquires the voice key information of a user and the corresponding user portrait information.
In some of these applications, the voice key information may be an instruction about a function the user wants to control, and the user portrait information may be a preference or emotion analyzed from the user's voice key information.
In some of these applications, obtaining voice key information of a user includes obtaining user voice information in real time; and when the voice information is acquired, extracting the key information in the voice information.
It can be understood that the vehicle-mounted system acquires the user's voice information in real time and identifies and extracts the key information in it. For example, if the user controls the vehicle air conditioner by voice, the part of the utterance describing what to control and how is the key information.
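As a concrete illustration of this extraction step, the following Kotlin sketch pulls a control target and an action out of a recognized utterance with simple keyword rules. The function name, keyword vocabulary and rules are illustrative assumptions, not taken from the patent:

```kotlin
// Hypothetical sketch: extract key information (control target and action)
// from a recognized utterance using simple keyword matching.
fun extractKeyInfo(utterance: String): Map<String, String> {
    val info = mutableMapOf<String, String>()
    val text = utterance.lowercase()
    // Which in-vehicle function is being addressed (illustrative vocabulary).
    if ("air conditioner" in text) info["target"] = "air conditioner"
    if ("music" in text) info["target"] = "music"
    // What the user wants done with it.
    if ("turn on" in text) info["action"] = "on"
    if ("turn off" in text) info["action"] = "off"
    if ("pause" in text) info["action"] = "pause"
    return info
}

fun main() {
    // Prints {target=air conditioner, action=on}
    println(extractKeyInfo("Please turn on the air conditioner"))
}
```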
In some applications, acquiring the user portrait information corresponding to the voice key information includes acquiring the user's facial information; the user portrait information is then determined based on the facial information and the voice key information.
It can be understood that the user's facial information can be obtained through the vehicle-mounted camera, and the user portrait information can be determined from the facial information together with the voice key information. For example, the user's age range is estimated from the facial information and the user's preferences are learned from the voice key information; combining the two yields the user portrait. The portrait information can be collected and updated in real time, so that, especially for a returning user, the user's preferences can be judged accurately.
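The following Kotlin sketch shows one way such a combination could look; the age thresholds, preference vocabulary and type names are hypothetical and only illustrate the idea of merging face-derived and voice-derived attributes:

```kotlin
// Hypothetical sketch: merge an age bracket estimated from facial
// information with preferences inferred from voice key information.
data class FaceInfo(val estimatedAge: Int)
data class VoiceKeyInfo(val keywords: List<String>)
data class UserPortrait(val ageBracket: String, val preferences: Set<String>)

fun determinePortrait(face: FaceInfo, voice: VoiceKeyInfo): UserPortrait {
    // Illustrative age brackets; a real system would use a trained model.
    val bracket = when {
        face.estimatedAge < 12 -> "child"
        face.estimatedAge < 30 -> "young adult"
        else -> "adult"
    }
    // Illustrative preference vocabulary.
    val known = setOf("rock", "children's songs", "navigation", "air conditioner")
    val prefs = voice.keywords.filter { it in known }.toSet()
    return UserPortrait(bracket, prefs)
}
```

Because the portrait is collected and updated in real time, repeated calls for the same user can refine the stored preferences over time.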
S102, determining key functional elements in a key page based on the voice key information;
Specifically, this step determines the key functional elements in a key page based on the voice key information.
In some of these applications, the key page and the key functional elements in the key page are determined based on the voice key information;
wherein the key page comprises a functional page, and the key functional elements comprise the corresponding functional elements in the functional page.
It will be appreciated that the voice key information (for example, the specific function the user wants to control) determines the key page (for example, the page of that function) and the key functional element in it (for example, a specific control on that page). The functional page may be an air-conditioner, music or entertainment page, and the corresponding functional elements may be the air-conditioner switch, or the selection, pause or play controls of the music.
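A minimal sketch of this resolution step, with a hypothetical static lookup standing in for whatever matching logic a real system would run against the reported page content:

```kotlin
// Hypothetical sketch: map voice key information to the key page and
// the key functional element inside it. Entries are illustrative.
data class KeyTarget(val page: String, val element: String)

private val targets = mapOf(
    "turn on the air conditioner" to KeyTarget("air conditioner", "power switch"),
    "pause the music" to KeyTarget("music", "pause button"),
    "play the music" to KeyTarget("music", "play button"),
)

fun resolveTarget(voiceKeyInfo: String): KeyTarget? =
    targets[voiceKeyInfo.lowercase()]
```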
S103, determining effect information and corresponding voice prompt information displayed on the key page based on the key function elements, the voice key information and the user portrait information.
Specifically, the method determines effect information and corresponding voice prompt information displayed on a key page through key functional elements, voice key information and user portrait information.
In some of these applications, determining the effect information displayed on the key page based on the key functional elements, the voice key information and the user portrait information includes: obtaining a user emotion value based on the voice key information and the user face image; and determining the effect information displayed on the key page based on the user emotion value and the user portrait.
It can be appreciated that the user emotion value is obtained from the voice key information and the user's face image. The emotion value reflects the emotion and its amplitude, such as happiness or anger: the face image indicates which emotional state the user is in, and combining it with the user's tone in the voice key information gives a more accurate reading of that state. The effect information displayed on the key page is then determined from the emotion value and the user portrait; for example, when the user is angry, a pacifying effect can be chosen, and when the user is happy, a cheerful display effect can be chosen, personalized to the key page.
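The mapping from emotion value and portrait to a display effect might look like the following Kotlin sketch; the numeric scale, thresholds and effect names are assumptions made for illustration:

```kotlin
// Hypothetical sketch: pick the effect information shown on the key page
// from an emotion value (negative = angry, positive = happy) and the
// user's age bracket from the portrait.
fun chooseEffect(emotionValue: Double, ageBracket: String): String = when {
    emotionValue < -0.5 -> "soothing animation"    // pacify an angry user
    emotionValue > 0.5 -> "celebratory animation"  // match a happy user
    ageBracket == "child" -> "playful animation"
    else -> "neutral highlight"
}
```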
In some of these applications, determining the effect information and corresponding voice prompt information displayed on the key page includes: obtaining the text information of the displayed effect information; determining a prompt tone corresponding to the displayed effect information based on the text information; and determining the voice prompt information based on the prompt tone and the text information.
It can be understood that, after the displayed effect information is determined, its text information is obtained; a prompt tone is then determined for that text according to the displayed effect information; finally, the voice prompt information is determined from the prompt tone and the text information.
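One hedged reading of this step in Kotlin; the tone labels and the keyword test are illustrative assumptions, not the patent's own rules:

```kotlin
// Hypothetical sketch: derive a prompt tone from the text of the displayed
// effect information and combine the two into the voice prompt information.
data class VoicePrompt(val tone: String, val text: String)

fun buildVoicePrompt(effectText: String): VoicePrompt {
    val tone = when {
        "soothing" in effectText -> "gentle"
        "celebratory" in effectText -> "upbeat"
        else -> "neutral"
    }
    return VoicePrompt(tone, effectText)
}
```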
In some applications, after the displayed effect information and the voice prompt information are determined, they are presented to the user visually and audibly, respectively, to enhance the user experience.
An application example of the method for determining visible-and-speakable information will now be described with reference to fig. 2:
the first step: implementing visible and so-to-speak functions
The voice side provides an SDK for integration at the service end. The SDK monitors the currently displayed page and reports the page content to the voice side; the voice side judges, from the user's voice input, whether the page content is hit; the SDK then receives the hit feedback from the voice side and can process the hit result visually.
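The hit test at the heart of this loop can be pictured with the following Kotlin sketch; the types and the simple substring matching are illustrative stand-ins for the SDK's actual protocol, which the patent does not detail:

```kotlin
// Hypothetical sketch: the SDK reports the texts currently visible on the
// page; the voice side checks whether the recognized utterance hits one.
data class HitResult(val hit: Boolean, val elementText: String? = null)

fun matchHit(visibleTexts: List<String>, utterance: String): HitResult {
    val matched = visibleTexts.firstOrNull { utterance.contains(it, ignoreCase = true) }
    return HitResult(matched != null, matched)
}

fun main() {
    val page = listOf("Play", "Pause", "Next song")
    // Prints HitResult(hit=true, elementText=Pause)
    println(matchHit(page, "pause the music"))
}
```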
The second step: processing the visible-and-speakable hit result
Through the first step, the voice side returns the hit result to the SDK; before the SDK simulates the click, the hit result is processed as follows (see the sketch after this list):
1. the characteristics of the hit page (media, navigation, air conditioning, etc.) are judged, and a click effect related to those characteristics is displayed, such as a musical-note, automobile or air-conditioner gesture effect;
2. the hit result delivered by the voice side to the SDK includes the user portrait collected by the voice side; for example, by analyzing the characteristics of the user's audio input and combining them with the facial expression captured by a camera, it can be judged that the user is very happy at the moment, and a correspondingly cheerful animation effect is displayed;
3. the hit result delivered to the SDK also indicates which style of music the hit page belongs to: for rock music the click effect can use a rock gesture, while for children's songs the click effect tends toward a cute interaction effect.
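The three rules above could be condensed into something like the following Kotlin sketch; the page types, style labels and effect names are illustrative assumptions:

```kotlin
// Hypothetical sketch: choose a click effect from the characteristics of
// the hit page (rules 1 and 3) and the collected user mood (rule 2).
fun clickEffect(pageType: String, musicStyle: String? = null, happy: Boolean = false): String = when {
    happy -> "cheerful animation"
    pageType == "media" && musicStyle == "rock" -> "rock-gesture effect"
    pageType == "media" && musicStyle == "children" -> "cute interaction effect"
    pageType == "navigation" -> "automobile effect"
    pageType == "air conditioner" -> "air-conditioner gesture effect"
    else -> "default click effect"
}
```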
The third step: combining the click effect with TTS (text-to-speech)
On the basis of showing the click effect, the voice side plays a matching TTS broadcast. For example, if the current media page shows a song by Zhou Jielun (Jay Chou), the voice side switches the speaker and broadcasts in a Zhou Jielun-style voice; when a children's song list is displayed, a child voice is used for the broadcast.
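A sketch of this speaker selection in Kotlin; the voice identifiers are invented for illustration, and the Zhou Jielun case simply mirrors the example given above:

```kotlin
// Hypothetical sketch: pick a TTS speaker that matches the page content.
fun chooseTtsSpeaker(pageContent: String): String = when {
    "children" in pageContent.lowercase() -> "child-voice"
    "Zhou Jielun" in pageContent -> "zhou-jielun-style-voice"
    else -> "default-voice"
}
```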
For simplicity of description, the method steps disclosed in the above embodiments are depicted as a series of combined actions, but those skilled in the art should understand that the embodiments of the present application are not limited by the order of the actions described, since some steps may be performed in another order or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the actions involved are not necessarily all required by the embodiments of the application.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code comprising executable instructions for implementing one or more steps of a particular logical function or procedure. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in reverse order, or executed in loop, branch and other program structures with the corresponding functions implemented, depending on the functions involved, as will be understood by those skilled in the art practicing the embodiments of the present application.
As shown in fig. 3, the present application further provides a system for determining visible-and-speakable information, comprising:
an information acquisition module 201, configured to acquire voice key information of a user and user portrait information corresponding to the voice key information;
a function element determining module 202, configured to determine key function elements in a key page based on the voice key information;
and the information determining module 203 is configured to determine effect information and corresponding voice prompt information displayed on the key page based on the key function element, the voice key information and the user portrait information.
Specifically, the system for determining visible-and-speakable information provided in this embodiment includes an information acquisition module 201, a functional element determination module 202 and an information determination module 203. The information acquisition module 201 is configured to acquire voice key information of a user and user portrait information corresponding to the voice key information; the functional element determination module 202 is configured to determine key functional elements in a key page based on the voice key information; and the information determination module 203 is configured to determine the effect information and corresponding voice prompt information displayed on the key page based on the key functional elements, the voice key information and the user portrait information.
It should be noted that, although only some basic functional modules are disclosed in this embodiment, the composition of the system is not limited to these modules. Rather, one skilled in the art may add one or more functional modules to the basic modules to form any number of embodiments or technical solutions; the system is open rather than closed, and the protection scope of the claims is not limited to the disclosed basic functional modules. Meanwhile, for convenience of description, the above devices are described as being divided functionally into various units and modules. Of course, the functions of the units and modules may be implemented in one or more pieces of software and/or hardware when implementing the application.
The embodiments of the system described above are merely illustrative. The functional modules, units and subsystems of the system may or may not be physically separate; that is, they may be located in one place or distributed over a plurality of different systems, subsystems or modules. Those skilled in the art may select some or all of the functional modules, units or subsystems according to actual needs to achieve the purposes of the embodiments of the present application, and may understand and implement them without inventive effort.
As shown in fig. 4, the present application further provides an electronic device, comprising: a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the above method for determining visible-and-speakable information.
Specifically, fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application and shows a block diagram of an exemplary electronic device suitable for implementing an embodiment of the present application. The electronic device shown in fig. 4 is only an example and should not limit the functionality or scope of use of the embodiments of the present application.
As shown in fig. 4, the electronic device 500 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processing units or processors 516, a memory 528, and a bus 518 that connects the various system components (including the memory 528 and the processor 516). Bus 518 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
Electronic device 500 typically includes a variety of computer-system-readable media. Such media can be any available media accessible by electronic device 500, including volatile and nonvolatile media and removable and non-removable media. Memory 528 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 530 and/or cache memory 532. Electronic device 500 may further include other removable/non-removable, volatile/nonvolatile computer-system storage media. By way of example only, storage system 534 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in the figures, commonly referred to as a "hard disk drive"). Although not shown, a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a floppy disk, removable hard disk or hot-swappable storage medium) and an optical disk drive for reading from and writing to a removable nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM or other optical medium) may be provided. In such cases, each drive may be connected to bus 518 through one or more data media interfaces. Memory 528 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the application.
A program/utility 540 having a set (at least one) of program modules 542 may be stored, for example, in memory 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 542 generally perform the functions and/or methods of the embodiments described herein.
The electronic device 500 may also communicate with one or more external devices 514 (e.g., a keyboard, a pointing device, a display 524, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any device (e.g., a network card, modem, etc.) that enables the electronic device 500 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 522. The electronic device 500 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet, through a network adapter 520. As shown in fig. 4, the network adapter 520 communicates with the other modules of the electronic device 500 over the bus 518. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems and the like. The processor 516 executes programs stored in the memory 528 to perform various functional applications and data processing, such as the methods provided by any one or more embodiments of the present application.
The present application also provides a computer-readable storage medium storing a computer program executable by an electronic device; when the program runs on the electronic device, it causes the electronic device to perform the steps of the above method for determining visible-and-speakable information.
In particular, the computer storage media of embodiments of the present application may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The application also provides an intelligent cockpit provided with the system for determining visible-and-speakable information as described above.
Specifically, the system for determining visible-and-speakable information is arranged in the intelligent cockpit. Regarding the intelligent cockpit: human-vehicle interaction is the core of the user experience. In a traditional automobile cockpit, the fragmented layout of functional areas and information overload create barriers to human-vehicle interaction, and the value of the automobile as an interaction entrance has been underestimated. As electronic information technology moves into the automobile, the intelligent cockpit has emerged; it can meet the different needs of different occupants through various intelligent means, bringing a more intelligent and safer interaction experience, and it is also a key interface for new-generation technologies such as advanced driver assistance, automated driving and artificial intelligence.
By applying the technical solution, the application provides a method, a system, an electronic device, a storage medium and an intelligent cockpit for determining visible-and-speakable information. The method comprises: acquiring voice key information of a user and user portrait information corresponding to the voice key information; determining key functional elements in a key page based on the voice key information; and determining, based on the key functional elements, the voice key information and the user portrait information, the effect information and corresponding voice prompt information displayed on the key page. By this method, the presentation of the visible-and-speakable function is diversified and the user experience is improved.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are within the scope of the application and form different embodiments. For example, any of the claimed embodiments may be used in any combination.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
In addition, the technical solutions of the embodiments of the present application may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the technical solutions, and when the technical solutions are contradictory or cannot be implemented, the combination of the technical solutions should be considered as not existing, and not falling within the scope of protection claimed by the present application.
All of the features disclosed in this specification, or all of the steps in a method or process disclosed, may be combined in any combination, except for mutually exclusive features and/or steps. Any feature disclosed in this specification may be replaced by alternative features serving the same or equivalent purpose, unless expressly stated otherwise. That is, each feature is one example only of a generic series of equivalent or similar features, unless expressly stated otherwise. Like reference numerals refer to like elements throughout the specification.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including the corresponding claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including the corresponding claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. A method for determining visible-and-speakable information, comprising:
acquiring voice key information of a user and user portrait information corresponding to the voice key information;
determining key functional elements in a key page based on the voice key information;
and determining effect information and corresponding voice prompt information displayed on the key page based on the key function element, the voice key information and the user portrait information.
2. The method for determining visible-and-speakable information according to claim 1, wherein acquiring the voice key information of the user specifically comprises:
acquiring voice information of a user in real time;
and when the voice information is acquired, extracting the key information in the voice information.
3. The method for determining visible-and-speakable information according to claim 1, wherein acquiring the user portrait information corresponding to the voice key information specifically comprises:
acquiring facial information of a user;
the user representation information is determined based on the face information and the voice key information.
4. The method for determining visible-and-speakable information according to claim 1, wherein determining the key functional elements in a key page based on the voice key information specifically comprises:
determining the key page and the key function elements in the key page based on the voice key information;
wherein the key page comprises a functional page, and the key functional elements comprise corresponding functional elements in the functional page.
5. The method for determining visible-and-speakable information according to claim 3, wherein determining the effect information displayed on the key page based on the key functional element, the voice key information and the user portrait information specifically comprises:
acquiring a user emotion value based on the voice key information and the user face image;
and determining the effect information displayed on the key page based on the user emotion value and the user portrait.
6. The method for determining visible-and-speakable information according to claim 5, wherein determining the effect information and the corresponding voice prompt information displayed on the key page specifically comprises:
acquiring text information of the displayed effect information;
determining a prompt tone corresponding to the displayed effect information based on the text information;
and determining the voice prompt information based on the prompt tone and the text information.
7. A system for determining visible-and-speakable information, comprising:
the information acquisition module is used for acquiring voice key information of a user and user portrait information corresponding to the voice key information;
the functional element determining module is used for determining key functional elements in a key page based on the voice key information;
and the information determining module is used for determining the effect information and the corresponding voice prompt information displayed on the key page based on the key functional elements, the voice key information and the user portrait information.
8. An electronic device, comprising: the device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; the memory has stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 6.
9. A computer readable storage medium, characterized in that it stores a computer program executable by an electronic device, which, when run on the electronic device, causes the electronic device to perform the steps of the method of any one of claims 1 to 6.
10. An intelligent cockpit provided with the system for determining visible-and-speakable information according to claim 7.
CN202310698696.9A 2023-06-13 2023-06-13 Method and system for determining visible-and-speakable information Pending CN116841672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310698696.9A CN116841672A (en) Method and system for determining visible-and-speakable information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310698696.9A CN116841672A (en) Method and system for determining visible-and-speakable information

Publications (1)

Publication Number Publication Date
CN116841672A true CN116841672A (en) 2023-10-03

Family

ID=88169680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310698696.9A Pending CN116841672A (en) 2023-06-13 2023-06-13 Method and system for determining visible and speaking information

Country Status (1)

Country Link
CN (1) CN116841672A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090313020A1 (en) * 2008-06-12 2009-12-17 Nokia Corporation Text-to-speech user interface control
US20100312565A1 (en) * 2009-06-09 2010-12-09 Microsoft Corporation Interactive tts optimization tool
CN108307037A (en) * 2017-12-15 2018-07-20 努比亚技术有限公司 Terminal control method, terminal and computer readable storage medium
CN115729418A (en) * 2021-08-23 2023-03-03 上海擎感智能科技有限公司 Control method and control device for user interface
CN116069374A (en) * 2021-11-04 2023-05-05 博泰车联网科技(上海)股份有限公司 Configuration method, configuration device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
霸道流氓 (Badao Liumang): "Using the speak-tts plugin in Vue to implement voice broadcast after a button click (TTS / text-to-speech)", pages 1 - 4, Retrieved from the Internet <URL:https://www.cnblogs.com/badaoliumangqizhi/p/15822975.html> *

Similar Documents

Publication Publication Date Title
JP6952184B2 (en) View-based voice interaction methods, devices, servers, terminals and media
CN110517689B (en) Voice data processing method, device and storage medium
CN110090444B (en) Game behavior record creating method and device, storage medium and electronic equipment
CN105551498A (en) Voice recognition method and device
US11511200B2 (en) Game playing method and system based on a multimedia file
US11238858B2 (en) Speech interactive method and device
US10540451B2 (en) Assisted language learning
CN108012173A (en) A kind of content identification method, device, equipment and computer-readable storage medium
CN112230838A (en) Article processing method, article processing device, article processing equipment and computer readable storage medium
CN113190156A (en) Music playing control method and device, storage medium and electronic equipment
CN114023301A (en) Audio editing method, electronic device and storage medium
US8019591B2 (en) Rapid automatic user training with simulated bilingual user actions and responses in speech-to-speech translation
CN116841672A (en) Method and system for determining visible-and-speakable information
US20130179165A1 (en) Dynamic presentation aid
JPWO2020247590A5 (en)
CN111914115A (en) Sound information processing method and device and electronic equipment
US11775070B2 (en) Vibration control method and system for computer device
CN111091821B (en) Control method based on voice recognition and terminal equipment
CN108845788B (en) Analog manipulation method and portable terminal
WO2021056735A1 (en) Page playback method and apparatus, electronic device and storage medium
CN111859006A (en) Method, system, electronic device and storage medium for establishing voice entry tree
JP2960029B2 (en) Presentation support device
WO2022201955A1 (en) Information processing device and information processing method
CN116704159A (en) Vehicle-mounted augmented reality control method for dancing, dancing machine and vehicle with vehicle-mounted augmented reality control method
CN110289010B (en) Sound collection method, device, equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination