CN109671438A - Device and method for providing auxiliary service by using voice

Info

Publication number: CN109671438A
Application number: CN201910082510.0A
Authority: CN
Inventors: Inventor not disclosed
Original assignee: Wuhan Entra Information Technology Co Ltd
Current assignee: Wuhan Entra Information Technology Co Ltd
Priority date: 2019-01-28
Filing date: 2019-01-28
Publication date: 2019-04-23

Abstract

The present invention relates to a device and a method for providing auxiliary service by using voice. The device includes: a voice information acquisition module, which acquires audio content within its receiving range; an identity recognition module, which, in response to voiceprints of multiple persons appearing in the acquired audio content, obtains the voiceprint information of the multiple persons, determines the identity type of each person based on that voiceprint information and a pre-formed scene voiceprint data set, and gathers the identity types of the individual persons into an identity type combination of the multiple persons; a voice analysis module, which performs speech recognition on the acquired audio content to obtain voice content and performs semantic recognition on the voice content to obtain key information; and a service content providing module, which provides service content for display based on the authority level corresponding to the identity type combination of the multiple persons, the key information, and a preset user preference. With the technical solution proposed by the embodiments of the present invention, the user does not need to interact specially with an intelligent terminal.

Description

Device and method for providing auxiliary service by using voice

Technical Field

The invention belongs to the technical field of human-computer interaction, and particularly relates to a device and a method for providing auxiliary service by using voice.

Background

With the development of face recognition and speech recognition technology, their application scenarios continue to expand. In current human-computer interaction scenarios, a user and an intelligent robot interact in a one-to-one conversation. First, the intelligent robot uses face recognition to verify that the user matches the identity card information the user provides. After the verification passes, the user states a requirement by voice instruction; the intelligent robot recognizes the voice information through speech recognition and presents the products the user requires through visualization and voice. The user then confirms by voice whether the product meets the requirement, which completes the interaction. In other words, with current face recognition and speech recognition technology, the intelligent robot recognizes the voice instruction issued by the client and acts on that instruction. The applicant has found that this way of interacting requires the user to interact specially with the intelligent terminal.

Disclosure of Invention

In order to solve the technical problem that the current interaction mode requires a user to interact with an intelligent terminal specially, the embodiment of the invention provides a device and a method for providing auxiliary services by using voice.

In a first aspect of the present invention, a device for providing auxiliary service by using voice is provided. The device includes a voice information acquisition module, an identity recognition module, a voice analysis module, and a service content providing module, wherein:

the voice information acquisition module acquires audio contents in a receiving range;

the identity recognition module is used for, in response to voiceprints of multiple persons appearing in the audio content collected by the voice information collection module, performing voiceprint recognition on that audio content and obtaining the voiceprint information of the multiple persons; determining the identity type of each person based on the acquired voiceprint information of the multiple persons and a pre-formed scene voiceprint data set, and gathering the identity types of the individual persons into the identity type combination of the multiple persons; the scene voiceprint data set represents the association between the voiceprint information of a person and the identity type;

the voice analysis module is used for carrying out voice recognition on the audio content collected by the voice information collection module to obtain voice content, carrying out semantic recognition on the obtained voice content and obtaining key information; and

the service content providing module is used for providing service content for display based on the authority level corresponding to the identity type combination of the multiple persons, the key information, and the preset user preference.

In some embodiments, the service content providing module determines the authority level possessed by the identity type combination according to the identity type combination of the multiple persons determined by the identity recognition module; provides alternative service content meeting the authority level according to the key information acquired by the voice analysis module; and determines service content from the alternative service content according to the preset user preference, and provides the determined service content for display.

In some embodiments, the apparatus further includes an external control interface, where the external control interface is configured to receive a voice command or a hardware command issued by a user, and change the provided service content or change the displayed service content based on the voice command or the hardware command.

In some embodiments, the voice information collection module continuously collects audio content within its reception range, or starts or stops collecting audio content within its reception range as instructed by a user.

In some embodiments, the service content providing module includes an electronic device having an input-output interaction function, outputs the provided service content through the electronic device, and is capable of receiving information input by a user to the electronic device, and operates the output service content according to the input information.

In some embodiments, if the obtained voiceprint information of the person is already stored in the scene voiceprint dataset, the identity type stored in the scene voiceprint dataset and associated with the obtained voiceprint information of the person is determined as the identity type to which the person belongs; and if the acquired voiceprint information of the person is not stored in the scene voiceprint data set, determining that the identity type of the person is a stranger.

In some embodiments, the provided service content includes a function that can be executed only after the identity type is verified again, and the function is executed in response to the identity type passing verification;

and/or,

the provided service content is assigned a privacy level, and when the privacy level requires the identity of the relevant person to be checked, only the voiceprint information of that person is confirmed.

In a second aspect of the present invention, a method for providing auxiliary service by using voice is provided. The method comprises the following steps:

responding to voiceprints of multiple persons appearing in the collected audio content, and carrying out voiceprint recognition on the collected audio content to acquire voiceprint information of the multiple persons;

determining the identity type of each person based on the acquired voiceprint information of the multiple persons and a pre-formed scene voiceprint data set, and gathering the identity types of the individual persons into the identity type combination of the multiple persons; the scene voiceprint data set represents the association between the voiceprint information of a person and the identity type;

carrying out voice recognition on the collected audio content, carrying out semantic recognition on the recognized voice content, and acquiring key information; and

providing service content for display based on the authority level corresponding to the identity type combination of the multiple persons, the key information, and the preset user preference.

In some embodiments, the providing service content for display based on the permission level corresponding to the identity type combination of the plurality of people, the key information, and the preset user preference includes: determining the authority level of the identity type combination according to the identity type combination of the multiple persons; providing alternative service contents meeting the authority level according to the acquired key information; and determining service content from the alternative service content according to the preset user preference, and providing the determined service content for display.

In certain embodiments, the method further comprises: receiving a voice instruction or a hardware instruction issued by a user, and changing the provided service content or changing the displayed service content accordingly.

The beneficial effects of the invention are as follows. The device and method for providing auxiliary service by using voice provided by the embodiments of the invention target the application scenario of a multi-person conversation: the identity type combination of the multiple persons is formed through voiceprint recognition, the key information in the conversation is obtained, and service content is provided according to the authority level corresponding to the identity type combination, the obtained key information, and the preset user preference. The user does not need to interact specially with an intelligent terminal, no video content needs to be captured, and no face recognition needs to be performed; the auxiliary service function can be realized from audio information alone. In addition, the technical solution does not need to provide service in a question-and-answer mode with the user, so the normal progress of the conversation is not affected. Furthermore, the technical solution provides an external control interface for the user, so that auxiliary service can be provided to the user quickly.

Drawings

Fig. 1 is a schematic structural diagram of a device for providing auxiliary service by using voice according to an embodiment of the present invention;

Fig. 2 is a flowchart of a method for providing auxiliary service by using voice according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a device for providing auxiliary service by using voice according to an embodiment of the present invention, in an application scenario.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. Those skilled in the art will appreciate that the present invention is not limited to the drawings and the following examples.

An embodiment of the present invention provides a device for providing auxiliary service by using voice which, as shown in Fig. 1, includes a voice information acquisition module, an identity recognition module, a voice analysis module, and a service content providing module.

The voice information acquisition module acquires audio content within its receiving range.

In one embodiment, the voice information collection module includes a microphone. Those skilled in the art will appreciate that other devices for capturing audio content may also be used, for example a voice information collection module that includes a sound pickup.

The identity recognition module is used for responding to voiceprints of a plurality of persons appearing in the audio content collected by the voice information collection module, carrying out voiceprint recognition on the audio content collected by the voice information collection module and obtaining the voiceprint information of the plurality of persons; and determining the identity type of each person based on the acquired voiceprint information of the multiple persons and a pre-formed scene voiceprint data set, and integrating the identity types of the people into the identity type combination of the multiple persons.

In one embodiment, the identity recognition module comprises a voiceprint recognition unit and an identity type determination unit, wherein the voiceprint recognition unit responds to voiceprints of multiple persons in audio contents acquired by the voice information acquisition module, performs voiceprint recognition on the audio contents acquired by the voice information acquisition module, and acquires voiceprint information of the multiple persons; the identity type determining unit determines the identity types of the multiple persons based on the acquired voiceprint information of the multiple persons and a pre-formed scene voiceprint data set, and the identity types of the multiple persons are integrated into the identity type combination of the multiple persons.

In this embodiment, the scene voiceprint data set represents the association between a person's voiceprint information and identity type. Before the identity type of a recognized voiceprint is determined, the pre-formed scene voiceprint data set already stores voiceprint information of persons, each with an assigned identity type. It can be understood that the voiceprint information stored in the pre-formed scene voiceprint data set may be obtained from audio content collected on site, for example by the voice information collection module or by other voice information collection modules (for example, a sound pickup), or from uploaded audio material. In one embodiment, a time period of audio content may be selected from audio content already collected, or collected in real time by triggering start and stop operations, and the voiceprint information of a person is obtained by recognizing that period of audio content. In one embodiment, a time period of video content may be selected from video content already collected, or collected in real time by triggering start and stop operations; the voiceprint information of a person in that period of video content is identified and the person is assigned an identity type; the person's voiceprint information and identity type are then stored in association in the scene voiceprint data set.

In this embodiment, the identity types represent different posts, jobs and/or positions. In one embodiment, the identity types include identity types divided by post, such as board directors, general managers, department managers, regular employees, interns, support staff, and the like. In another embodiment, the identity types include identity types divided by job, such as project A principal, project A development technician, project A test technician, project A market developer, and the like.

In one embodiment, if the obtained voiceprint information of the person is already stored in the scene voiceprint dataset, the identity type stored in the scene voiceprint dataset and associated with the obtained voiceprint information of the person is determined as the identity type to which the person belongs; and if the acquired voiceprint information of the person is not stored in the scene voiceprint data set, determining that the identity type of the person is a stranger.
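The lookup rule above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the voiceprint IDs, the identity type names, and the helper names `identity_type_of` and `identity_type_combination` are hypothetical, and an opaque string ID stands in for real voiceprint matching, which would compare acoustic features by similarity rather than by exact key.

```python
from typing import Dict, List, Tuple

STRANGER = "stranger"

# Hypothetical scene voiceprint data set: voiceprint ID -> identity type.
# A string ID stands in for the result of real voiceprint matching.
SCENE_VOICEPRINTS: Dict[str, str] = {
    "vp-001": "department manager",
    "vp-002": "regular employee",
}


def identity_type_of(voiceprint_id: str, dataset: Dict[str, str]) -> str:
    """Return the stored identity type, or STRANGER if the voiceprint is unknown."""
    return dataset.get(voiceprint_id, STRANGER)


def identity_type_combination(
    voiceprint_ids: List[str], dataset: Dict[str, str]
) -> Tuple[str, ...]:
    """Gather each speaker's identity type into one sorted combination."""
    return tuple(sorted(identity_type_of(vp, dataset) for vp in voiceprint_ids))


# Three speakers: two stored in the data set, one not stored.
combination = identity_type_combination(
    ["vp-001", "vp-002", "vp-999"], SCENE_VOICEPRINTS
)
print(combination)
```

Sorting the combination is a design choice of this sketch, so that the same group of identity types yields the same combination regardless of speaking order.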

In an embodiment, the scene voiceprint data set includes a first scene voiceprint sub data set, for voiceprints whose identity type has been determined, and a second scene voiceprint sub data set, for voiceprints whose identity type is still to be supplemented. The first sub data set stores, in association, the voiceprint information of persons who have been assigned an identity type and the identity type assigned to each of them. The second sub data set stores the voiceprint information of strangers as needed, so that the strangers' identity types can be supplemented later. The pre-formed scene voiceprint data set belongs to the first scene voiceprint sub data set.
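One possible way to organize the two sub data sets is sketched below in Python. The class name `SceneVoiceprintDataset`, its methods, the voiceprint IDs, and the "client" identity type are all assumptions made for illustration; the embodiment does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SceneVoiceprintDataset:
    # First sub data set: voiceprint ID -> assigned identity type.
    determined: Dict[str, str] = field(default_factory=dict)
    # Second sub data set: voiceprint IDs of strangers awaiting an identity type.
    pending: Set[str] = field(default_factory=set)

    def identify(self, voiceprint_id: str, remember_strangers: bool = True) -> str:
        """Look up the identity type; optionally remember unknown voiceprints."""
        if voiceprint_id in self.determined:
            return self.determined[voiceprint_id]
        if remember_strangers:
            self.pending.add(voiceprint_id)
        return "stranger"

    def assign(self, voiceprint_id: str, identity_type: str) -> None:
        """Supplement an identity type and move the voiceprint to the first sub data set."""
        self.pending.discard(voiceprint_id)
        self.determined[voiceprint_id] = identity_type


ds = SceneVoiceprintDataset(determined={"vp-001": "general manager"})
first = ds.identify("vp-777")      # not stored yet, so treated as a stranger
ds.assign("vp-777", "client")      # identity type supplemented afterwards
second = ds.identify("vp-777")     # now resolved from the first sub data set
```

The `remember_strangers` flag reflects the "as needed" wording: storing a stranger's voiceprint is optional in this sketch.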

The voice analysis module performs voice recognition on the audio content collected by the voice information collection module and performs semantic recognition on the recognized voice content to obtain key information. In one embodiment, the voice analysis module includes a voice recognition unit and a semantic recognition unit, wherein the voice recognition unit performs voice recognition on the audio content collected by the voice information collection module, and the semantic recognition unit performs semantic recognition on the voice content recognized by the voice recognition unit to obtain the key information.
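As a rough illustration of what the key information might look like once voice content is available as text, the Python sketch below spots keywords from a fixed vocabulary in a transcript. Plain keyword matching is only a stand-in for semantic recognition, which the embodiment does not specify in detail; the vocabulary, the sample transcript, the company name "Acme", and the function name `extract_key_information` are hypothetical.

```python
from typing import List, NamedTuple


class KeyInfo(NamedTuple):
    keyword: str
    count: int        # how many times the keyword occurred
    first_index: int  # character offset of its first occurrence


def extract_key_information(transcript: str, vocabulary: List[str]) -> List[KeyInfo]:
    """Return the vocabulary keywords found in the transcript, in vocabulary order."""
    text = transcript.lower()
    found = []
    for keyword in vocabulary:
        needle = keyword.lower()
        count = text.count(needle)
        if count:
            found.append(KeyInfo(keyword, count, text.index(needle)))
    return found


# Hypothetical recognized voice content and keyword vocabulary.
transcript = (
    "Acme started its wireless service last year. "
    "Acme now plans a Hubei branch."
)
key_info = extract_key_information(
    transcript, ["Acme", "wireless", "Hubei", "finance"]
)
for item in key_info:
    print(item.keyword, item.count)
```

Recording the count and position of each keyword keeps the information that a later ordering step could use.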

The user may be one of the multiple persons, or may be another person who is not among them, for example the principal of one of the multiple persons, who delegates or assigns that person to appear in the multi-person scene.

In one embodiment, the service content providing module determines the authority level possessed by the identity type combination according to the identity type combination of the multiple persons determined by the identity recognition module, and provides service content for display according to the key information acquired by the voice analysis module, the authority level, and the preset user preference. In an embodiment, the service content providing module may provide alternative service content meeting the authority level according to the key information obtained by the voice analysis module, determine service content from the alternative service content according to the preset user preference, and provide the determined service content for display. In an embodiment, when multiple items of service content are provided, they are displayed in sequence according to the ordering of the corresponding key information or according to the preset user preference. It is understood that the ordering of the key information may relate to the frequency of occurrence or the time of occurrence of the key information; the preset user preference may relate to the personalized settings of one of the persons, who may be the most important of the persons or the one acting as host.
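The sequential display of multiple items can be sketched as a sort over the key information that triggered each item. The Python example below assumes one particular rule, most frequent keyword first with ties going to the more recently mentioned keyword; the embodiment only says the ordering may relate to frequency and time of occurrence, so this rule, the field names, and the sample items are assumptions.

```python
from typing import List, NamedTuple


class ServiceContent(NamedTuple):
    title: str
    keyword: str
    frequency: int    # times the keyword occurred in the conversation
    last_seen: float  # seconds since the conversation started


def display_order(contents: List[ServiceContent]) -> List[str]:
    """Most frequent keyword first; ties go to the more recently mentioned one."""
    ranked = sorted(contents, key=lambda c: (-c.frequency, -c.last_seen))
    return [c.title for c in ranked]


# Hypothetical service content items and keyword statistics.
contents = [
    ServiceContent("Train timetable query", "Hubei", 1, 95.0),
    ServiceContent("Company introduction slides", "Acme", 3, 60.0),
    ServiceContent("Wireless business summary", "wireless", 1, 120.0),
]
order = display_order(contents)
print(order)
```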

In one embodiment, the device further includes an external control interface for receiving a voice instruction or a hardware instruction issued by a user. A voice instruction may be received through the voice information collection module or through another voice information collection module; a hardware instruction may be received through a virtual key or a physical key provided by the device. The voice instruction may be triggered by a specific term, for example the name of some real object or an invented word, as long as the term cannot be confused with the language used in the conversation scene. The hardware instruction may be triggered by a specific key, for example a floating key that the device places on the display screen or a key provided on the device, which serves as the external control interface while service content is displayed. In some application scenarios, for example while the provided service content is displayed, the displayed content may be switched in response to the user's voice instruction or key instruction, or new service content may be provided for display in response to a keyword the user enters through the voice instruction or key instruction.
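Triggering a voice instruction by a specific term can be sketched as a check on the first word of a recognized utterance. In the Python example below, the trigger term "zephyrbox" is an invented word chosen for illustration, and the function name `parse_voice_command` and the rule that the trigger must come first are assumptions rather than part of the embodiment.

```python
from typing import Optional

TRIGGER_TERM = "zephyrbox"  # hypothetical invented word, unlikely in normal talk


def parse_voice_command(utterance: str, trigger: str = TRIGGER_TERM) -> Optional[str]:
    """Return the command text after the trigger term, or None for ordinary speech."""
    words = utterance.strip().split()
    # Ignore punctuation attached to the first word, e.g. "Zephyrbox,".
    if not words or words[0].lower().strip(",.!?") != trigger:
        return None
    return " ".join(words[1:]) or None


print(parse_voice_command("Zephyrbox, next page"))
print(parse_voice_command("Let us look at the next page"))
```

Ordinary conversation that merely contains words such as "next page" is left alone, which matches the requirement that the trigger not be confused with the language of the conversation scene.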

In one embodiment, the service content providing module may include a display screen through which the provided service content is displayed. In another embodiment, the service content providing module includes an electronic device having an input-output interactive function, outputs the provided service content through the electronic device, and is capable of receiving information (e.g., a voice instruction or a hardware instruction) input by a user to the electronic device, and operates the output service content according to the input information. The input information may be, for example, a page operation such as an instruction to enlarge, reduce, slide, and rotate the output service content, an activation operation such as an instruction to click a link provided in the output service content, an operation to reply to a selection or confirmation query provided in the output service content, or the like.

In one embodiment, the scene voiceprint data set further stores an account of a contact tool associated with the voiceprint information of the person. The contact tool may include at least one of a mobile phone, WeChat, or QQ. It is understood that the contact tool may also include other means of exchanging information. Further, in an embodiment, the provided service content is sent to the account of the contact tool and the user is prompted to view it, so that the service content can be displayed separately on the user's device. In another embodiment, the provided service content is sent to the account of the contact tool in the form of topics; when there are multiple topics, they are displayed in sequence (for example, according to the ordering of the key information or according to the preset user preference) for the user's reference.

In one embodiment, the service content may be obtained from an internal server of the organization or may be obtained from an external server via a network. Further, in one embodiment, the service content providing module may include an internal server storing a repository of service content for selection. The service content providing module can automatically update the service content in the internal server and mark the updated service content for verification by the user, and can also update the service content in the internal server according to the instruction of the user.

In one embodiment, the identity recognition module, the voice analysis module and the service content providing module may be integrated in the service terminal.

In some application scenarios, the voice information collection module may continuously collect audio content within its reception range, or may start or stop collecting audio content within its reception range according to an instruction of a user.

In addition, in an embodiment, considering that a mode for providing service content is mainly based on a mechanism of key information triggering and authority level screening, a keyword interface, an authority level interface, an alternative service content display interface and the like can be independently set, and the alternative service content is enriched by continuously updating a database subsequently, so that the auxiliary service function of the embodiment of the invention is continuously strengthened.

It is understood that the provided service content may be presented in a circular playing manner, may be presented in an automatic playing manner according to a preset or external instruction, and may be continuously updated according to new key information or a new identity type combination. In one embodiment, the related service content can be inserted in the provided service content according to the new key information or the new identity type combination.

It should be understood that the persons currently present should have the right to view any service content that is presented.

In addition, if the displayed service content includes a function which needs a specific identity type to be executed, the identity type verification can be performed again before the function is executed. For example, if the ticket ordering service function is called by the displayed service content, before the ticket ordering service is executed, voiceprint recognition is performed again, and the relevant person is prompted to confirm the identity type, and if necessary, the relevant person may be requested to provide name information.
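The re-verification step can be sketched as a guard around the function call. In the Python example below, the table `REQUIRED_IDENTITY`, the function name "order_ticket", the allowed identity types, and the `reverify` callback (which stands in for a fresh voiceprint recognition and confirmation prompt) are all hypothetical.

```python
from typing import Callable, Dict, Set

# Hypothetical table: function name -> identity types allowed to execute it.
REQUIRED_IDENTITY: Dict[str, Set[str]] = {
    "order_ticket": {"department manager", "general manager"},
}


def execute_with_reverification(
    function_name: str,
    action: Callable[[], str],
    reverify: Callable[[], str],
) -> str:
    """Run `action` only if a fresh identity check yields an allowed identity type."""
    allowed = REQUIRED_IDENTITY.get(function_name)
    if allowed is None:
        return action()  # this function needs no re-verification
    identity_type = reverify()  # stands in for a new voiceprint recognition pass
    if identity_type in allowed:
        return action()
    return "denied"


granted = execute_with_reverification(
    "order_ticket", lambda: "ticket ordered", lambda: "general manager"
)
refused = execute_with_reverification(
    "order_ticket", lambda: "ticket ordered", lambda: "stranger"
)
print(granted, refused)
```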

Moreover, in an embodiment, a security level may be set for the displayed service content. When the security level requires identity verification of the relevant person, only that person's voiceprint information is confirmed again; other persons are not checked. For example, if enterprise financing data involving the enterprise's business secrets is to be displayed, a prompt asks the relevant person to verify identity, and the corresponding service content is provided in response to the voiceprint information of the specific person being collected by a designated voice information collection module. In this embodiment, multiple voice information collection modules may be installed at different locations. In a conference room, for example, they may be installed at the entrance, the exit, the corners, and so on, and the voiceprint information identified in the audio content they collect is gathered into the same scene voiceprint data set. For identity checking, the voice information collection module used is one with a fixed position, for example one placed at a specific position on the conference table; it may even be placed somewhere outside the conference room, or the check may be completed through the mobile devices of the relevant persons (for example, their mobile phones, notebook computers, desktop computers, or tablet computers).
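A minimal Python sketch of this check follows, assuming each captured voice sample carries the identifier of the collection module that recorded it. The names `VoiceSample` and `may_show_private_content`, the module identifiers, and the voiceprint IDs are hypothetical.

```python
from typing import NamedTuple, Set


class VoiceSample(NamedTuple):
    voiceprint_id: str
    module_id: str  # which voice information collection module captured it


def may_show_private_content(
    sample: VoiceSample,
    authorized_voiceprints: Set[str],
    checking_module_id: str,
) -> bool:
    """Confirm only the relevant person, and only via the designated module."""
    return (
        sample.module_id == checking_module_id
        and sample.voiceprint_id in authorized_voiceprints
    )


authorized = {"vp-cfo"}  # hypothetical voiceprint of the relevant person
at_table = VoiceSample("vp-cfo", "table-mic")
at_door = VoiceSample("vp-cfo", "door-mic")
print(may_show_private_content(at_table, authorized, "table-mic"))
print(may_show_private_content(at_door, authorized, "table-mic"))
```

Other persons in the room are never queried in this sketch; only the sample from the designated module is examined.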

An embodiment of the present invention further provides a method for providing an auxiliary service by using voice, as shown in fig. 2, including:

In an embodiment, performing voiceprint recognition on the collected audio content in response to voiceprints of multiple persons appearing in the collected audio content, and acquiring the voiceprint information of the multiple persons, can be realized by a voice information acquisition module; determining the identity type of each person based on the acquired voiceprint information of the multiple persons and a pre-formed scene voiceprint data set can be realized by an identity recognition module; performing voice recognition on the collected audio content, performing semantic recognition on the recognized voice content, and acquiring the key information can be realized by a voice analysis module; and providing service content for display based on the authority level corresponding to the identity type combination of the multiple persons, the key information, and the preset user preference can be realized by a service content providing module.

The audio content can be collected by a voice information collection module. In one embodiment, the voice information collection module includes a microphone. Those skilled in the art will appreciate that other devices for capturing audio content may also be used, for example a voice information collection module that includes a sound pickup.

Providing service content for display based on the authority level corresponding to the identity type combination of the multiple persons, the key information, and the preset user preference includes:

determining the authority level of the identity type combination according to the identity type combination of the multiple persons;

providing alternative service contents meeting the authority level according to the acquired key information;

and determining service content from the alternative service content according to the preset user preference, and providing the determined service content for display.
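The three steps above can be sketched end to end in Python. Everything concrete in the example is an assumption made for illustration: the lookup table from identity type combination to authority level, the numeric levels, the catalog entries, the preference expressed as a set of categories, and all function names.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Set


class Candidate(NamedTuple):
    title: str
    keyword: str
    min_level: int  # lowest authority level allowed to see this content
    category: str


# Hypothetical table: identity type combination -> authority level (higher = more).
LEVELS: Dict[FrozenSet[str], int] = {
    frozenset({"employee", "stranger"}): 1,
    frozenset({"employee"}): 2,
}
DEFAULT_LEVEL = 0  # unknown combinations get the lowest level

# Hypothetical catalog of stored service content.
CATALOG = [
    Candidate("Company introduction slides", "Acme", 1, "presentation"),
    Candidate("Recent financial data", "Acme", 3, "finance"),
    Candidate("Train timetable query", "Hubei", 1, "travel"),
]


def authority_level(identity_types: Iterable[str]) -> int:
    """Step 1: look up the authority level of the identity type combination."""
    return LEVELS.get(frozenset(identity_types), DEFAULT_LEVEL)


def alternatives(keywords: Set[str], level: int) -> List[Candidate]:
    """Step 2: keep content that matches the key information and the level."""
    return [c for c in CATALOG if c.keyword in keywords and c.min_level <= level]


def apply_preference(candidates: List[Candidate], preferred: Set[str]) -> List[str]:
    """Step 3: keep only the categories the preset user preference selects."""
    return [c.title for c in candidates if c.category in preferred]


def provide_service_content(
    identity_types: Iterable[str], keywords: Set[str], preferred: Set[str]
) -> List[str]:
    level = authority_level(identity_types)
    return apply_preference(alternatives(keywords, level), preferred)


shown = provide_service_content(
    ["employee", "employee", "stranger"], {"Acme", "Hubei"}, {"presentation"}
)
print(shown)
```

An explicit lookup table is used for step 1 because the text speaks of the authority level corresponding to a combination without stating a rule for deriving it; any derivation rule would be a further assumption.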

In an embodiment, when the number of the provided service contents is multiple, the multiple service contents are sequentially displayed according to the ordering of the corresponding key information or according to the preset user preference. It is understood that the ordering of the key information may relate to the frequency of occurrence, the time of occurrence of the key information; the preset user preference may relate to a personalized setting of one of the persons, which may be the most important person of the persons, or a person of the persons who is a host.

In an embodiment, the method further comprises: receiving a voice instruction or a hardware instruction issued by a user, and changing the provided service content or changing the displayed service content accordingly.

For the parts of the method for providing auxiliary service by using voice that correspond to the device for providing auxiliary service by using voice, refer to the foregoing description of the device; they are not described again here.

The following describes an exemplary technical solution proposed by the embodiment of the present invention with reference to a specific application scenario and taking a device that provides an auxiliary service by using voice as an example.

In this embodiment, the service content providing module is a service terminal disposed in a public meeting place of an enterprise, and the service terminal includes at least one touch-enabled display screen for interaction with a user.

When the service terminal determines that the current identity type combination is enterprise employee A (as host), enterprise employee B, and stranger C, it determines that the authority level of the combination is the general authority. Then, according to the obtained key information, for example when the obtained keywords include the enterprise name, the name of the wireless communication business (one of the several main businesses the enterprise engages in), and the name of a certain province, the service content providing module retrieves the data stored in the database that corresponds to those keywords as first-type alternative service content, for example the enterprise's external promotional PPT, an enterprise introduction, and a summary of the enterprise's wireless communication business for the current year. At the same time, the service content providing module retrieves content stored in the database, such as query entries for trains and flights from the enterprise's location to that province, as second-type alternative service content. Finally, according to the user preference preset by the host, namely enterprise employee A, part or all of the first-type and/or second-type alternative service content is selected as the determined service content and provided for display.

From the perspective of enterprise employee A (the host), the scenario is as follows: enterprise employee A, together with enterprise employee B, introduces the enterprise profile to client C in order to promote the enterprise's business. While enterprise employee A and enterprise employee B talk openly in front of the service terminal, the main interface of the service terminal, for example, plays display information about the enterprise in a loop; meanwhile, as the conversation among enterprise employee A, enterprise employee B, and client C continues, service content options that can be clicked open appear naturally on the operable interface of the service terminal. For example, when enterprise employee A mentions the enterprise name and its recent development, the presentation can be naturally enhanced by the enterprise's external promotional PPT shown automatically on the touch-enabled display screen; and when the name of a certain province comes up in the conversation with client C, the introduction of the enterprise's branch company in that province, shown automatically on the touch-enabled display screen, can naturally lead into the next topic. Once it is determined that an on-site business visit to that province is needed, the travel time can be queried and settled through the train and flight schedule query entries automatically provided by the service terminal, and tickets can even be booked on the spot.

It will be appreciated that if the voiceprint of a higher-level person, such as a director of the enterprise, appears in the captured audio content, the authority level corresponding to the current identity type combination may be the highest level. It will also be appreciated that when the current identity type combination includes the CFO (chief financial officer) of the enterprise, the alternative service content provided may also include data on the recent financial status of the enterprise.
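
One possible way to map an identity type combination to an authority level is a preset lookup table, sketched below. The table entries, the fallback to the general authority for unlisted combinations, and the name `combination_level` are assumptions made for illustration; the embodiment does not specify how the mapping is stored, and whether a combination that contains both a director and a stranger receives the highest level is left to whoever configures the table.

```python
GENERAL, HIGHEST = "general", "highest"

# Hypothetical preset table: each key is the set of identity types present.
LEVEL_BY_COMBINATION = {
    frozenset({"employee", "stranger"}): GENERAL,
    frozenset({"director", "employee"}): HIGHEST,
}


def combination_level(identity_types):
    """Look up the authority level of a combination of identity types.

    Duplicates are ignored (two employees count as "employee"), and
    combinations missing from the table fall back to the general
    authority, which is an assumption of this sketch.
    """
    return LEVEL_BY_COMBINATION.get(frozenset(identity_types), GENERAL)
```

For the scenario above, `combination_level(["employee", "employee", "stranger"])` returns the general authority.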

In an application scenario, the apparatus for providing an auxiliary service using voice according to an embodiment of the present invention is a terminal which, as shown in fig. 3, includes: a housing, a first voice information collecting device, a second voice information collecting device, a voiceprint recognition device, an identity verification device, and a display device.

The first voice information collecting device is installed inside the housing near the front of the housing and collects first audio content within its receiving range; a plurality of first through holes are formed in the housing at the position corresponding to the first voice information collecting device. The receiving range of the first voice information collecting device is located near the mouth of an adult, so that the audio content of a multi-person conversation can be collected as conveniently and comprehensively as possible. There may be one or more first voice information collecting devices, and their number can be determined according to the size of the space in which the terminal is located.

The second voice information collecting device is arranged inside the housing near the back of the housing, and a plurality of second through holes are formed in the housing at the position corresponding to the second voice information collecting device. The second voice information collecting device collects second audio content within its receiving range. In one embodiment, a sound receiving part is arranged around the periphery of the second through holes and extends from the second through holes to the second voice information collecting device, forming a tapered conical structure. There is one second voice information collecting device.

In one embodiment, the first voice information collecting device and the second voice information collecting device each include a microphone. Those skilled in the art will appreciate that the first and second voice information collecting devices may also use other devices for collecting audio content; for example, each may include a sound pickup.

The voiceprint recognition device is arranged inside the housing and is connected to the first voice information collecting device and the second voice information collecting device. In response to the first voice information collecting device collecting first audio content in which the voiceprints of multiple persons appear, the voiceprint recognition device performs voiceprint recognition on the first audio content and obtains first voiceprint information of the multiple persons. In response to the second voice information collecting device collecting second audio content, the voiceprint recognition device performs voiceprint recognition on the second audio content and obtains second voiceprint information from it. In this embodiment, voiceprint recognition technology belongs to the prior art and is therefore not described in detail.

The identity verification device is arranged inside the housing, is connected to the voiceprint recognition device, and performs identity verification according to the second voiceprint information. In this embodiment, the technology for performing identity verification based on voiceprint information belongs to the prior art and is therefore not described in detail.

The display device is arranged on the front of the housing and is connected to the voiceprint recognition device and the identity verification device. It displays the service content corresponding to the first voiceprint information and, in response to the second voiceprint information passing identity verification, displays the service content corresponding to the first voiceprint information and the second voiceprint information.

In an embodiment, the terminal further includes an external control interface connected to the display device, and the external control interface is configured to receive a voice command or a hardware command issued by a user. A voice command may be received through the voice information collecting device or through another voice information collecting device, and a hardware command may be received through a virtual key or a physical key provided by the terminal. The voice command can be triggered by a specific term, which can be, for example, the name of some real object or an invented word, as long as the term cannot be confused with the language used in the conversation scene. The hardware command can be triggered by a specific key, which can be, for example, a floating key on the display screen of the terminal or a key provided on the terminal; the key serves as the external control interface while service content is displayed. In some application scenarios, for example while service content is displayed, the displayed content may be switched in response to a voice command or key command of the user, or service content may be provided again for display in response to a keyword input by the user through the voice command or key command.
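
As a hedged illustration of triggering a voice command by a specific term, the sketch below treats an utterance as a command only when its first word is the trigger term. The invented word `zorblat`, the function name `parse_command`, and the convention that the trigger must be the first word of the recognized text are assumptions of this sketch, not details given in the embodiment.

```python
# Hypothetical invented trigger word that is unlikely to occur in conversation.
TRIGGER = "zorblat"


def parse_command(utterance, trigger=TRIGGER):
    """Return the command text if the utterance starts with the trigger term.

    Returns None for ordinary conversation, so that normal speech is
    never mistaken for a command.
    """
    words = utterance.strip().split()
    if not words or words[0].lower().strip(",.!?") != trigger:
        return None
    return " ".join(words[1:])
```

With this convention, `parse_command("Zorblat, next page")` returns the command text `"next page"`, while a sentence from the conversation itself returns `None`.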

In one embodiment, the display device may include a display screen through which the service content is displayed. In another embodiment, the display device includes an electronic device having an input-output interactive function, the service content is displayed through the electronic device, and information (e.g., a voice instruction or a hardware instruction) input to the electronic device by a user can be received, and the displayed service content is operated according to the input information. The input information may be, for example, a page operation such as an instruction to enlarge, reduce, slide, and rotate the displayed service content, an activation operation such as an instruction to click a link provided in the displayed service content, an operation to reply to a selection or confirmation query provided in the displayed service content, or the like. In some application scenarios, the display screen is a touch screen.

In one embodiment, the service content may be obtained from an internal server included in the device or from an external server through a network. In one embodiment, the terminal includes an internal server storing a library of service content for display, in which the service content is stored in association with the first voiceprint information, or with the first and second voiceprint information. In another embodiment, the display device further includes a transceiver. The transceiver sends the first voiceprint information to an external server or, when the second voiceprint information passes identity verification, sends the first voiceprint information and the second voiceprint information to the external server; it then acquires the corresponding service content from the external server, and the service content is displayed on the display screen of the display device. The service content is stored in the external server in association with the first voiceprint information, or with the first and second voiceprint information. In this other embodiment, the first voice information collecting device is connected to the display device (not shown); the display device obtains the first audio content from the first voice information collecting device and sends it to the external server, and the external server provides the service content according to the first audio content and the first voiceprint information (or the first voiceprint information and the second voiceprint information).
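
The association between stored service content and voiceprint information can be sketched as a keyed lookup. In this sketch the library is a plain dictionary, voiceprint information is reduced to speaker identifiers, and the function name `lookup_service_content` is hypothetical; the embodiment does not specify the storage format of the internal or external server.

```python
def lookup_service_content(library, first_ids, second_id=None, second_verified=False):
    """Return the service content stored for the given voiceprint information.

    library maps (frozenset of first voiceprint ids, second voiceprint id or None)
    to a list of service content titles. The second voiceprint is used as part
    of the key only after it has passed identity verification.
    """
    first_key = frozenset(first_ids)
    if second_id is not None and second_verified:
        return library.get((first_key, second_id), [])
    return library.get((first_key, None), [])
```

A library entry keyed by `(frozenset({"A", "B"}), None)` is returned for the conversation alone, while an entry keyed by `(frozenset({"A", "B"}), "A")` is returned only once speaker A has passed identity verification through the second voiceprint information.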

In some application scenarios, the first voice information collecting device and the second voice information collecting device may continuously collect audio content within their receiving ranges, or may start or stop collecting audio content within their receiving ranges according to an instruction from a user or a display device.

It should be understood that the displayed service content must be content that is permitted to be played to the persons currently present.

In some application scenarios, the service content includes a function that can be executed only after identity verification. Before the function is executed, identity verification is performed by means of the second voice information collecting device, the voiceprint recognition device, and the identity verification device, as follows. First, the display device prompts the relevant person to go to the position of the second voice information collecting device and sends an instruction to the second voice information collecting device. On receiving the instruction, the second voice information collecting device collects second audio content. The voiceprint recognition device then performs voiceprint recognition on the second audio content to obtain the second voiceprint information in it, and the identity verification device verifies the second voiceprint information. After the verification is passed, the second voiceprint information is sent to the display device.
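
The verification sequence described above can be sketched as follows. Each device is modelled as a callable passed in by the caller, so the sketch makes no claim about how collection, voiceprint recognition, or verification are actually performed; the function name `run_gated_function` and the prompt text are assumptions for illustration only.

```python
from typing import Callable, Optional


def run_gated_function(
    prompt: Callable[[str], None],               # display device notifying the person
    collect_second_audio: Callable[[], bytes],   # second voice information collecting device
    extract_voiceprint: Callable[[bytes], str],  # voiceprint recognition device
    verify: Callable[[str], bool],               # identity verification device
    function: Callable[[], str],                 # function that requires verification
) -> Optional[str]:
    """Run the protected function only if the second voiceprint is verified."""
    prompt("Please go to the second voice information collecting device and speak.")
    audio = collect_second_audio()
    voiceprint = extract_voiceprint(audio)
    if not verify(voiceprint):
        return None  # verification failed, so the function is not executed
    return function()
```

Passing the devices in as callables keeps the order of the steps explicit and allows each step to be replaced by a stub when the flow is tested.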

The terminal provided by the embodiment of the present invention can collect the content of a normal multi-person conversation by means of the first voice information collecting device, so that the user does not need to interact specially with an intelligent terminal. By means of the second voice information collecting device, identity can be verified without affecting the normal collection of the multi-person voiceprint information. Because identity verification uses the same voiceprint recognition approach as the human-computer interaction, no additional hardware architecture such as face recognition needs to be introduced, and the overall architecture is simple and practical.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program for executing the foregoing method.

An embodiment of the present invention further provides a computer device, which includes a processor and the above computer-readable storage medium operatively connected to the processor, where the processor executes a computer program in the computer-readable storage medium.

Those of skill in the art will understand that the logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be viewed as implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The embodiments of the present invention have been described above. However, the present invention is not limited to the above embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An apparatus for providing an auxiliary service using voice, comprising: a voice information acquisition module, an identity recognition module, a voice analysis module, and a service content providing module; wherein,

2. The apparatus of claim 1, wherein the service content providing module determines the permission level possessed by the identity type combination according to the identity type combination of the plurality of persons determined by the identity recognition module; providing alternative service contents meeting the authority level according to the key information acquired by the voice analysis module; and determining service content from the alternative service content according to the preset user preference, and providing the determined service content for display.

3. The apparatus according to claim 1, further comprising an external control interface, wherein the external control interface is configured to receive a voice command or a hardware command issued by a user, and change the provided service content or change the displayed service content based on the voice command or the hardware command.

4. The device of claim 1 or 3, wherein the voice information collection module continuously collects the audio content in the receiving range thereof, or starts or stops collecting the audio content in the receiving range thereof according to the instruction of the user.

5. The apparatus of claim 1, wherein the service content providing module comprises an electronic device having an input-output interaction function, and is configured to output the provided service content through the electronic device, and is capable of receiving information input by a user to the electronic device, and operate the output service content according to the input information.

6. The apparatus according to claim 1, wherein if the obtained voiceprint information of the person is already stored in the scene voiceprint dataset, the identity type stored in the scene voiceprint dataset and associated with the obtained voiceprint information of the person is determined as the identity type to which the person belongs; and if the acquired voiceprint information of the person is not stored in the scene voiceprint data set, determining that the identity type of the person is a stranger.

7. The apparatus of claim 1, wherein the provided service content comprises a function that requires re-authentication of an identity type to be granted, and wherein the function is performed in response to authentication of the identity type;

and/or,

8. A method for providing supplementary services using voice, comprising:

9. The method of claim 8, wherein providing service content for display based on the permission level corresponding to the identity type combination of the plurality of people, the key information and the preset user preference comprises:

10. The method of claim 8, further comprising:

receiving a voice instruction or a hardware instruction issued by a user, and changing the provided service content or changing the displayed service content.