CN212135954U

CN212135954U - Voice control device and intelligent terminal

Info

Publication number: CN212135954U
Application number: CN202020630856.8U
Authority: CN
Inventors: 彭春艳
Original assignee: Individual
Current assignee: Individual
Priority date: 2020-04-23
Filing date: 2020-04-23
Publication date: 2020-12-11
Anticipated expiration: 2030-04-23

Abstract

The utility model discloses a voice control device and an intelligent terminal. The voice control device comprises a housing; a controller arranged in the housing; a voice acquisition component and an image acquisition component arranged on the housing and both connected to the controller; a voice noise reducer arranged in the housing and connected to the voice acquisition component; and a data transmission component arranged in the housing, connected to the controller, the voice acquisition component and the image acquisition component, and further connected to external equipment. This effectively solves the problem that a television cannot be controlled by voice to process its image.

Description

Voice control device and intelligent terminal

Technical Field

The utility model relates to the technical field of voice acquisition equipment, and in particular to a voice control device and an intelligent terminal.

Background

Voice devices currently on the market have only a voice acquisition component, a voice recognition component and a data transmission component. Their main function is to recognize the user's words and convert the sound signal into an electrical signal that controls the operation of the terminal. However, for terminals without an image acquisition component, such as televisions, which are very popular with the general public, a voice device installed in the television cannot provide voice-controlled screen capture and screen recording, which seriously affects the user experience. For example, when a user watching television suddenly sees an interesting picture and wants to share it with friends, the picture has already flashed past before the user has had time to get a mobile phone ready (unlocking the phone, opening a social application, opening the camera and so on are cumbersome operations). Voice-controlled screen capture and recording is therefore the best choice in this situation. However, because the prior-art voice device has only a voice acquisition component and a voice recognition component, and the television itself has no image acquisition function, installing a prior-art voice device in a television does not enable voice-controlled processing of the television image.

The prior art therefore still needs to be improved and developed.

Summary of the Utility Model

The technical problem to be solved by the utility model is, in view of the above defects of the prior art, to provide a voice control device and an intelligent terminal, aiming to solve the problem that a television screenshot cannot be taken under voice control.

The technical solution adopted by the utility model to solve this problem is as follows:

In a first aspect, an embodiment of the present invention provides a voice control device, wherein the voice control device includes: a housing; a controller arranged in the housing; a voice acquisition component and an image acquisition component arranged on the housing and both connected to the controller; a voice noise reducer arranged in the housing and connected to the voice acquisition component; and a data transmission component arranged in the housing, connected to the controller, the voice acquisition component and the image acquisition component, and further connected to external equipment.

In one embodiment, a first sound receiving hole is formed in one side face of a PCB, a sound adjusting hole matching the first sound receiving hole is formed in an isolating piece, and a second sound receiving hole is formed in a mounting frame; when the PCB is mounted on the mounting frame of a loudspeaker box device through the isolating piece, the first sound receiving hole and the sound adjusting hole are aligned with the second sound receiving hole.

In one embodiment, the voice acquisition component is disposed on the same side of the housing as the image acquisition component.

In one embodiment, the voice acquisition component is a microphone array disposed in a predetermined area on the housing, and the microphone array includes a plurality of microphones uniformly arranged on the side of the housing.

In one embodiment, the microphone array is any one of a circular array, a rectangular array, and an elliptical array.

In one embodiment, the microphones of the microphone array are arranged with a predetermined spacing therebetween.

In one embodiment, the image acquisition component is a camera disposed on the housing, and there are one or more cameras.

In one embodiment, the voice control apparatus further includes a voice processing module and an image processing module, and both the voice processing module and the image processing module are connected to the controller.

In one embodiment, the external device is a cloud server or a local terminal device.

In a second aspect, the embodiment of the present invention further provides an intelligent terminal, wherein the intelligent terminal includes any one of the above-mentioned voice control devices.

The beneficial effects of the utility model are as follows: the voice control device provided by the utility model collects sound and images by means of a voice acquisition component and an image acquisition component. The voice acquisition component is also connected to the controller and to the voice noise reducer. The controller can perform the corresponding operation on the acquired image according to the collected voice, and the voice noise reducer can reduce the noise in the collected sound so that the corresponding operation is more easily carried out according to that sound. Therefore, the problem that a television cannot be controlled by voice to process its image is effectively solved.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic view of an overall structure of a voice control device according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a connection relationship between components of the voice control apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.

It should be noted that, if directional indications (such as upper, lower, left, right, front and rear …) are involved in the embodiments of the present invention, they are only used to explain the relative positions and movements of the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.

The voice device in the prior art comprises only a voice acquisition component, a voice recognition component and a data transmission component, and is mainly used for recognizing the user's words and converting the voice signal into an electrical signal that controls the operation of the terminal. However, for terminals without an image acquisition component, such as televisions, which are very popular with the public, a prior-art voice device installed in the television cannot provide voice-controlled screen capture and screen recording, which seriously affects the user experience. For example, when a user watching television suddenly sees an interesting picture and wants to share it with friends, the picture has already flashed past before the user has had time to get a mobile phone ready (unlocking the phone, opening a social application, opening the camera and so on are cumbersome operations). Voice-controlled screen capture and recording is therefore the best choice in this situation. However, because the prior-art voice device has only a voice acquisition component, a voice recognition component and a data transmission component, and the television itself has no image acquisition function, installing a prior-art voice device in a television does not enable screen capture or recording of the television.

In order to solve these problems of the prior art, the present embodiment provides a voice control device. As shown in fig. 1, the voice control device includes a housing 10, a controller 20 and a data transmission component 30 disposed in the housing 10, and a voice acquisition component 40 and an image acquisition component 50 disposed on the housing 10. As shown in fig. 2, the data transmission component 30 is connected to both the controller 20 and the image acquisition component 50 for data exchange between the components.

Specifically, the voice acquisition component 40 is used for collecting voice signals in the environment, and is connected to both the data transmission component 30 and the controller 20 (as shown in fig. 2). When the voice acquisition component 40 receives a voice signal, the signal is fed to the controller 20 through the data transmission component 30, and the controller 20 outputs a corresponding instruction according to the signal. In this embodiment, the controller 20 may be a controller fitted with a microchip for voice and image processing; in a specific application, a TMS320C54xx chip may be used. In one implementation, the voice acquisition component 40 is a microphone array disposed in a preset area on the housing 10, that is, the voice acquisition component 40 is composed of a certain number of microphones, and the microphone array is used for receiving voice signals. Compared with a single microphone, a plurality of microphones can filter the sound waves by exploiting the phase differences between the sound waves received at the different microphones, removing the environmental background sound as far as possible and leaving only the wanted sound waves. The microphone array can therefore reduce noise when used in noisy environments.
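The phase-difference filtering described above can be illustrated with delay-and-sum beamforming, which is one common technique of this kind; the text does not say which algorithm the device actually uses. The following is a minimal sketch under stated assumptions: the 16 kHz sample rate, the four-microphone scene, the tone frequencies, and the names `sine`, `rms` and `delay_and_sum` are all hypothetical and chosen only for illustration.

```python
import math

RATE_HZ = 16000  # assumed sample rate, not specified in the text


def sine(freq_hz, n, delay=0):
    """n samples of a unit sine wave, delayed by `delay` samples."""
    return [math.sin(2 * math.pi * freq_hz * (i - delay) / RATE_HZ) for i in range(n)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average the channels."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]


# Hypothetical scene: the wanted 500 Hz "voice" reaches four microphones
# 0, 2, 4 and 6 samples late; a 2000 Hz interferer reaches all of them at once.
delays = [0, 2, 4, 6]
n = 320
noise = sine(2000, n)
channels = [[v + w for v, w in zip(sine(500, n, d), noise)] for d in delays]

out = delay_and_sum(channels, delays)
voice = sine(500, len(out))
residual = rms([o - v for o, v in zip(out, voice)])

print("noise RMS at one microphone:", round(rms(noise), 3))
print("residual RMS after delay-and-sum:", residual)
```

In this contrived scene the aligned voice adds coherently, while the four time-shifted copies of the interferer end up a quarter period apart and cancel. Real rooms are far less tidy, so the sketch shows the principle rather than achievable performance.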

Specifically, the microphone array is used for collecting sound signals, and when the voice control device performs voice recognition, the microphone array collects voice signals. In one implementation, when the terminal is detected to be started, the microphone array starts to collect voice signals in real time so as to collect voice instructions of a user in time. In one implementation, the microphone array may be a plurality of microphones uniformly arranged on the side of the housing 10 to achieve uniform sound reception around the housing.

When the microphones are arranged too close together, the spatial noise they receive becomes highly correlated, and the desired speech signal cannot be obtained. Therefore, in one implementation, a preset spacing is set between the microphones in the microphone array to better suppress spatial noise and achieve a better sound pickup effect.

The geometry in which the plurality of microphones is arranged on the housing 10 also closely affects the sound pickup effect of the voice acquisition component 40. In one implementation, the microphone array may be any one of a circular array, a rectangular array, and an elliptical array. The specific type of microphone array can be determined according to the spatial characteristics of the sound field in which the voice control device is located. For example, a circular microphone array may be preferred if the voice control device needs to pick up sound over a full 360° range.
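The link between spacing and noise correlation can be made concrete with a standard acoustics model that is not part of the original text: in an ideal diffuse noise field, the coherence between two omnidirectional microphones a distance d apart is sin(kd)/(kd), with k = 2πf/c. The sketch below assumes that model and a speed of sound of 343 m/s; the function name `diffuse_coherence` and the spacings tried are illustrative only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def diffuse_coherence(freq_hz, spacing_m):
    """Coherence of ideal diffuse noise between two omnidirectional microphones."""
    x = 2 * math.pi * freq_hz * spacing_m / SPEED_OF_SOUND_M_S
    return 1.0 if x == 0 else math.sin(x) / x


# Coherence of 1 kHz diffuse noise for a few hypothetical spacings.
for spacing_cm in (1, 2, 5, 10):
    value = diffuse_coherence(1000, spacing_cm / 100)
    print(f"{spacing_cm:>2} cm spacing: coherence {value:.3f}")
```

Under this model the noise at closely spaced microphones is almost identical, so combining the channels cannot average it away; widening the spacing lowers the coherence, which is consistent with the preset spacing described above.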
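As an illustration of the circular layout, the sketch below places microphones at equal angles on a circle and reports the spacing between neighbours. The microphone count, the radius and the function names `circular_array` and `neighbour_spacing` are hypothetical; the text gives no dimensions.

```python
import math


def circular_array(num_mics, radius_m):
    """(x, y) positions of microphones spaced at equal angles on a circle."""
    step = 2 * math.pi / num_mics
    return [
        (radius_m * math.cos(k * step), radius_m * math.sin(k * step))
        for k in range(num_mics)
    ]


def neighbour_spacing(num_mics, radius_m):
    """Straight-line distance between adjacent microphones (chord length)."""
    return 2 * radius_m * math.sin(math.pi / num_mics)


# Hypothetical example: six microphones on a 4 cm radius circle.
positions = circular_array(6, 0.04)
for x, y in positions:
    print(f"({x:+.3f}, {y:+.3f}) m")
print("neighbour spacing:", round(neighbour_spacing(6, 0.04), 4), "m")
```

Because every microphone sees the same geometry when the array is rotated by one angular step, a circular layout has no preferred direction, which is why it suits pickup over a full 360° range.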

In order to realize the voice recognition function of the voice control device, as shown in fig. 2, the voice control device further includes a voice processing module 60 connected to the controller 20. The voice processing module 60 is configured to convert the voice signal received by the voice acquisition component 40 into a control instruction and send the control instruction to the controller 20. So that users with different accents can use the voice control device, the device needs to record one or more pieces of voice information preset by the user. Specifically, the voice control device is provided with a plurality of control instructions, such as "I want to capture a screen" and "I want to record a screen", and the user records voice information into the corresponding control instruction entries through the voice acquisition component 40 according to his or her needs. When the voice control device is started, the voice acquisition component 40 monitors the user's current environment in real time and judges whether the voice information preset by the user is present in the ambient sound. When the preset voice information appears, the voice acquisition component 40 acquires it and sends it to the voice processing module 60, which converts the received voice signal into a control instruction and sends it to the controller 20. The controller 20 then controls the other components to perform the operation the user expects according to the control instruction.
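The mapping from preset voice information to control instructions can be sketched as a lookup table. This is a simplification under a loud assumption: the speech is taken to be already transcribed to text, whereas the text above describes matching against voice samples recorded by the user. The names `Instruction`, `PRESET_PHRASES` and `to_instruction` are hypothetical.

```python
from enum import Enum


class Instruction(Enum):
    CAPTURE_SCREEN = "capture_screen"
    RECORD_SCREEN = "record_screen"


# Preset phrases taken from the examples in the text, stored in lower case.
PRESET_PHRASES = {
    "i want to capture a screen": Instruction.CAPTURE_SCREEN,
    "i want to record a screen": Instruction.RECORD_SCREEN,
}


def to_instruction(transcript):
    """Return the control instruction for a transcript, or None if it is not preset."""
    key = " ".join(transcript.lower().split())  # normalise case and whitespace
    return PRESET_PHRASES.get(key)


print(to_instruction("I want to capture a screen"))
print(to_instruction("what is on tonight"))
```

Returning None for anything that is not a preset phrase mirrors the behaviour described above, where the device acts only when the user's preset voice information appears.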

The utility model can be applied to terminals such as televisions, and a television usually produces loud sound while playing a program, which makes it difficult for the voice acquisition component 40 to receive the user's preset voice information. Therefore, in one implementation, in order to reduce the background noise received by the voice acquisition component 40, the voice control device is further provided with a voice noise reducer 70. The voice noise reducer 70 is connected to the voice acquisition component 40 and is used for extracting the useful user voice information from the noise background and suppressing the interference of the background noise. In one implementation, an intelligent voice noise reduction chip is mounted in the voice noise reducer 70; in a specific application, a chip of model SGM8541XC5/TR is selected. In one implementation, in order to reduce as far as possible the background noise generated by the terminal in which the voice control device is located, when the voice acquisition component 40 detects the voice information preset by the user, the controller 20 controls the terminal to temporarily enter a mute mode (in mute mode the television stops emitting all sound), so that the environment of the terminal is relatively quiet and the user's voice can be clearly collected by the voice acquisition component 40.
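The temporary mute behaviour can be sketched as a small piece of state kept by the controller. The text says only that the terminal temporarily enters a mute mode; restoring the previous mute state once the command has been collected is an assumption of this sketch, as are the names `FakeTerminal` and `MuteOnWake`.

```python
class FakeTerminal:
    """Stand-in for the television; only tracks whether it is muted."""

    def __init__(self):
        self.muted = False

    def set_mute(self, muted):
        self.muted = muted


class MuteOnWake:
    """Mutes the terminal while a command is being collected, then restores it."""

    def __init__(self, terminal):
        self.terminal = terminal
        self.listening = False
        self.was_muted = False

    def on_preset_voice_detected(self):
        if not self.listening:
            self.was_muted = self.terminal.muted  # remember the prior state
            self.terminal.set_mute(True)
            self.listening = True

    def on_command_finished(self):
        if self.listening:
            self.terminal.set_mute(self.was_muted)  # assumed restore step
            self.listening = False


tv = FakeTerminal()
control = MuteOnWake(tv)
control.on_preset_voice_detected()
print("muted while listening:", tv.muted)
control.on_command_finished()
print("muted afterwards:", tv.muted)
```

Remembering the prior state means a television the user had already muted stays muted afterwards, rather than being switched back on by the device.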

In order to enable the voice control device to acquire images under voice control, as shown in fig. 1, the voice control device is further provided with the image acquisition component 50, which is connected to both the data transmission component 30 and the controller 20 (as shown in fig. 2). When it receives the voice signal, the controller 20 sends a control instruction to the image acquisition component 50 through the data transmission component 30 and controls the image acquisition component 50 to acquire an image. In one implementation, the voice acquisition component 40 and the image acquisition component 50 may be disposed on the same side of the housing to ensure that the direction of the sound source is consistent with the direction in which the image acquisition component 50 acquires images.

Specifically, the image acquisition component 50 is a camera disposed on the housing. There may be one or more cameras, and using a plurality of cameras helps to obtain a picture with a high pixel count and excellent detail. When the voice control device is installed on a mobile phone, the user only needs to say "I want to take a picture" while using the phone: the voice acquisition component 40 receives the user's voice information, which is converted and then sent to the controller 20, and the controller 20 controls the camera to take the picture at that moment. Once the image acquisition component 50 has acquired the image the user requires, image data must be output in the form the user requires. For example, if the user needs image data in video form, the voice control device needs to synthesize the frames acquired one by one by the image acquisition component 50 into a video for output. Therefore, in one implementation, the voice control device further includes an image processing module 80 (as shown in fig. 2) connected to the controller. The image processing module 80 can process the images acquired by the image acquisition component 50 into image data in the form required by the user, and can perform a superposition calculation on the image data obtained by the plurality of cameras, through an algorithm such as AI, so as to synthesize a picture with a high pixel count and fine detail.
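The text does not specify the superposition calculation. As a deliberately simple stand-in, the sketch below takes the pixel-wise mean of equally sized grayscale frames that are assumed to be already aligned; it is not the AI-based method the text alludes to. The function name `superpose` and the sample frames are hypothetical.

```python
def superpose(frames):
    """Pixel-wise mean of equally sized grayscale frames (each a list of rows)."""
    if not frames:
        raise ValueError("at least one frame is required")
    rows, cols = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != rows or any(len(row) != cols for row in frame):
            raise ValueError("all frames must have the same size")
    return [
        [sum(frame[r][c] for frame in frames) / len(frames) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2x2 frames from two cameras viewing the same scene.
camera_a = [[10, 20], [30, 40]]
camera_b = [[14, 16], [34, 36]]
merged = superpose([camera_a, camera_b])
print(merged)
```

Averaging aligned frames reduces independent sensor noise but does not by itself raise resolution; a real multi-camera pipeline would also need registration between the cameras.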

In one implementation, the data transmission component 30 is further connected to an external device, which may be a cloud server or a local terminal device. Specifically, when the voice control device is installed in a television and a user watching live television or a video-on-demand source finds a favorite image or video, the user can capture the image or video of the content currently playing by saying "I want to capture an image" or "I want to record a video". The captured image or video can then be sent through the data transmission component 30 to an external device (such as a mobile phone) connected to the data transmission component 30, so that the captured image or video can be shared.

Based on the above embodiments, the utility model further provides an intelligent terminal, which includes any one of the above voice control devices.

To sum up, the utility model discloses a voice control device and an intelligent terminal. The voice control device comprises a housing; a controller arranged in the housing; a voice acquisition component and an image acquisition component arranged on the housing and both connected to the controller; a voice noise reducer arranged in the housing and connected to the voice acquisition component; and a data transmission component arranged in the housing, connected to the controller, the voice acquisition component and the image acquisition component, and further connected to external equipment. The problem that television screen capture cannot be controlled by voice is thereby effectively solved.

It is to be understood that the invention is not limited to the above-described embodiments, and that modifications and variations may be made by those skilled in the art in light of the above teachings, and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims

1. A voice control device, characterized in that the voice control device comprises: a housing; a controller disposed within the housing; a voice acquisition component and an image acquisition component arranged on the housing and both connected to the controller; a voice noise reducer arranged in the housing and connected to the voice acquisition component; and a data transmission component arranged in the housing, connected to the controller, the voice acquisition component and the image acquisition component, and further connected to external equipment.

2. The voice control device according to claim 1, wherein the voice acquisition component is disposed on the same side as the image acquisition component.

3. The voice control device of claim 2, wherein the voice acquisition component is a microphone array disposed in a predetermined area on the housing, and the microphone array comprises a plurality of microphones uniformly arranged on a side of the housing.

4. The voice control device of claim 3, wherein the microphone array is any one of a circular array, a rectangular array, and an elliptical array.

5. The voice control device of claim 4, wherein the microphone array has a predetermined spacing between the microphones.

6. The voice control device according to claim 2, wherein the image acquisition component is a camera provided on the housing, and there are one or more cameras.

7. The voice control device according to claim 3, further comprising a voice processing module and an image processing module, both of which are connected to the controller.

8. The voice control device according to claim 3, wherein the external equipment is a cloud server or a local terminal device.

9. An intelligent terminal, characterized in that the intelligent terminal comprises the voice control device of any one of the preceding claims 1-8.