CN114157905A

CN114157905A - Television sound adjusting method and device based on image recognition and television

Info

Publication number: CN114157905A
Application number: CN202111382511.0A
Authority: CN
Inventors: 苏运全; 李蛟龙
Original assignee: Shenzhen Konka Electronic Technology Co Ltd
Current assignee: Shenzhen Konka Electronic Technology Co Ltd
Priority date: 2021-11-22
Filing date: 2021-11-22
Publication date: 2022-03-08
Anticipated expiration: 2041-11-22
Also published as: CN114157905B

Abstract

The embodiment of the invention provides a television sound adjusting method, a device and a television based on image recognition, wherein the method comprises the following steps: acquiring an image of a user watching a television at present, and acquiring user information based on the image; determining a target sound parameter of a sound system of the television according to the user information; and adjusting the corresponding sound parameters of the sound system according to the target sound parameters. The invention can automatically adjust the sound parameters of the television in real time according to the user information of the user watching the television at present without manual adjustment of the user, thereby simplifying the operation of the user and improving the use experience of the user.

Description

Television sound adjusting method and device based on image recognition and television

Technical Field

The invention relates to the field of televisions, in particular to a television sound adjusting method and device based on image recognition and a television.

Background

The television is one of the main devices for living-room entertainment, and by offering a variety of programs it serves household members of different ages.

However, because hearing varies with age, users of different ages place different demands on the sound system, and the television's sound system therefore needs to accommodate users of different ages.

Television manufacturers conventionally tune three to five sound modes for the overall listening performance of the sound system, based on the frequency response and arrangement of the built-in speakers, and fix these modes in a television menu so that users can switch among them manually. This is inconvenient. Users must understand the sound modes before using them, yet some elderly users and young children may not understand them correctly; and whenever the viewers change, the sound mode must be adjusted manually again, which makes operation cumbersome and the user experience poor.

Disclosure of Invention

In view of the above, the present invention provides a television sound adjusting method and device based on image recognition, and a television, so as to address the above problem.

The embodiment of the invention provides a television sound adjusting method based on image recognition, which comprises the following steps:

acquiring an image of a user watching a television at present, and acquiring user information based on the image;

determining a target sound parameter of a sound system of the television according to the user information;

and adjusting the corresponding sound parameters of the sound system according to the target sound parameters.

Preferably, the user information includes a total number of users and location information of each user; the position information is expressed by a coordinate, the origin of a coordinate system where the coordinate is located is the central point of the image, the transverse axis of the coordinate system is along the width direction of the image, and the longitudinal axis of the coordinate system is along the height direction of the image;

determining a target sound parameter of a sound system of the television according to the user information, specifically comprising:

determining a target sound channel balance value of a sound system of the television according to the position information of each user and the adjusting range of the sound channel balance value of the sound system;

and determining the left channel gain and the right channel gain of the sound system according to the target channel balance value.

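Preferably, the target channel balance value S is calculated by the following formula:

$$S = s \cdot \frac{x_1 + x_2 + \cdots + x_N}{N \cdot W}$$

wherein:

N is the total number of users; W is half of the resolution width of the image; $x_1, x_2, \ldots, x_N$ are the abscissa values of the coordinates of the respective users; and (-s, s) is the adjusting range of the channel balance value of the sound system.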

Preferably, each target channel balance value corresponds to a set of left channel gain and right channel gain, and when the target channel balance value is greater than 0, the right channel gain is greater than the left channel gain, and when the target channel balance value is less than 0, the right channel gain is less than the left channel gain.

Preferably, the user information further includes an age group of each user;

determining a target sound parameter of a sound system of the television according to the user information, further comprising:

acquiring the proportion of each age group according to the age group of each user;

the gains at high, mid and low frequencies of the sound system are determined according to the proportions of the respective age groups.

Preferably, the age groups include the elderly, the middle-aged and young, and the children;

determining the gains of the sound system at high pitch, mid pitch and low pitch according to the proportions of the age groups, specifically comprising:

determining the current dominant age bracket according to the proportion of each age bracket;

acquiring an adjustment value corresponding to the dominant age group;

the gain of the sound system at high pitch, mid pitch and low pitch is determined based on the adjustment value.

Preferably, when the dominant age group is the elderly group, the treble gain is greater than the middle gain and greater than the bass gain;

when the dominant age group is the children group, the treble gain is smaller than the middle gain and smaller than the bass gain.

The embodiment of the invention also provides a television sound adjusting device based on image recognition, which comprises:

a user information acquisition unit, configured to acquire an image of a user currently watching the television and to acquire user information based on the image;

a target sound parameter determining unit, configured to determine a target sound parameter of a sound system of the television according to the user information;

and an adjusting unit, configured to adjust the corresponding sound parameters of the sound system according to the target sound parameters.

An embodiment of the present invention further provides a television, which includes:

the image shooting module captures an image of a user watching a television;

a processor configured to:

acquiring user information based on the image;

determining a left channel gain and a right channel gain of the sound system according to the target channel balance value; wherein, the calculation formula of the target sound channel balance value S is as follows:

wherein:

In the above embodiment, after the user information is obtained through the image of the user in front of the television, the target sound parameter corresponding to the user currently watching the television is determined according to the user information, and the sound system of the television is automatically adjusted according to the target sound parameter, so that the adjusted sound parameter can better meet the requirements of the user currently watching the television or the requirements of most users currently watching the television. Because the embodiment continuously acquires the user information, the sound parameters can be automatically adjusted in real time according to the change of the number, the position and the like of the users watching the television at present without manual adjustment of the users, so that the operation of the users can be simplified, and the use experience of the users is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.

Fig. 1 is a flowchart illustrating a television sound adjusting method based on image recognition according to a first embodiment of the present invention.

Fig. 2 is a schematic diagram of an image captured by a camera according to an embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a television sound adjusting apparatus based on image recognition according to a second embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.

It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The invention is described in further detail below with reference to the following detailed description and accompanying drawings:

Referring to fig. 1, a first embodiment of the present invention provides a method for adjusting television sound based on image recognition, which includes the following steps:

S101, obtaining an image of a user watching the television at present, and obtaining user information based on the image.

In this embodiment, an image of a user currently watching a television may be captured by a camera module (e.g., a camera), where the camera module may be a camera module built in the television or an external camera module, and this embodiment is not limited in particular.

In this embodiment, the camera module detects and captures the scene within its field of view in real time and generates corresponding images, from which the information of the users currently watching the television is obtained.

The image may be processed to obtain the user information either by the television or by the camera module.

When the television does the processing, the images captured by the camera may be transmitted to the television via the UVC protocol, and the television then obtains the user information by executing a corresponding image recognition algorithm.

UVC stands for USB Video Class, a protocol standard for USB video capture devices that was defined by Microsoft together with other equipment manufacturers and has become one of the USB.org standards.

When the camera module does the processing, it runs its built-in algorithm on each captured image to obtain the user information, and then transmits the user information to the television for further processing.

Of course, part of the data may be processed by the television and part by the camera module; the present invention is not particularly limited in this respect. In addition, the image processing may be performed on every frame or once every predetermined time period; both schemes are within the scope of the present invention.
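The choice between processing every frame and processing once per predetermined period can be sketched as a simple frame selector. This is a minimal illustration only: the function name `frames_to_process`, the use of frame timestamps in seconds, and the `period` parameter are assumptions introduced here and do not come from the embodiment.

```python
from typing import Iterable, Iterator


def frames_to_process(timestamps: Iterable[float], period: float) -> Iterator[int]:
    """Yield the indices of frames that should be analysed.

    A frame is selected when at least `period` seconds have passed since the
    previously selected frame. With period == 0 every frame is selected.
    """
    last_selected = None
    for index, timestamp in enumerate(timestamps):
        if last_selected is None or timestamp - last_selected >= period:
            last_selected = timestamp
            yield index


if __name__ == "__main__":
    # Hypothetical frames arriving every 0.5 s, analysed once per second.
    print(list(frames_to_process([0.0, 0.5, 1.0, 1.5, 2.0], period=1.0)))
```

A longer period reduces the recognition workload at the cost of a slower reaction when viewers move or change.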

In this embodiment, the user information may include the total number of users and each user's location, gender, age, and the like. This information can be obtained with existing image processing algorithms, which are not described in detail here.

S102, determining a target sound parameter of a sound system of the television according to the user information.

S103, adjusting the corresponding sound parameters of the sound system according to the target sound parameters.

In this embodiment, the sound system may be a sound system embedded in the television, or may be a sound system externally connected to the television, and the present invention is not limited specifically.

In this embodiment, the sound parameters include, for example, volume, left channel gain, right channel gain, treble gain, middle gain, bass gain, etc., which are determined by the actual situation, and the present invention is not limited in particular.

In this embodiment, after obtaining the user information, the television determines a target sound parameter corresponding to a user currently watching the television according to the user information, and automatically adjusts a sound system of the television according to the target sound parameter, so that the adjusted sound parameter can better meet the requirements of the user currently watching the television or the requirements of most users currently watching the television. Because the embodiment continuously acquires the user information, the sound parameters can be automatically adjusted in real time according to the change of the number, the position and the like of the users watching the television at present without manual adjustment of the users, so that the operation of the users can be simplified, and the use experience of the users is improved.

In order to facilitate an understanding of the invention, some preferred embodiments of the invention are described further below.

In a preferred embodiment, the user information includes a total number of users and location information of each user; step S102 specifically includes:

S1021, determining a target sound channel balance value of the television according to the position information of each user and the adjusting range of the sound channel balance value of the television;

and S1022, determining the left channel gain and the right channel gain of the television according to the target channel balance value.

In this embodiment, the total number of users may be obtained by a face recognition algorithm. To obtain the position information of each user, as shown in fig. 2, a coordinate system is established in the image with the central point of the image as the origin; the horizontal axis of the coordinate system runs along the width direction of the image (positive towards the right and negative towards the left), and the vertical axis runs along the height direction of the image (positive towards the top and negative towards the bottom).

As shown in fig. 2, assume that the resolution of the image is 1080 × 720 (1080 pixels in the width direction and 720 pixels in the height direction). The image contains three users, A, B and C, whose position information is (-500, 180), (-280, 180) and (280, -60), respectively. The position of a user may be represented by a specified reference point on the head, for example the midpoint of the line connecting the two eyes, depending on the actual situation.
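Image recognition libraries commonly report positions in pixel coordinates with the origin at the top-left corner and the vertical axis pointing down. Assuming that convention (it is not stated in the embodiment), the conversion to the image-centered coordinate system described above might look as follows; the function name `to_centered` is hypothetical.

```python
from typing import Tuple


def to_centered(px: float, py: float, width: int, height: int) -> Tuple[float, float]:
    """Convert top-left-origin pixel coordinates to image-centered coordinates.

    The returned x is positive to the right of the image center and the
    returned y is positive above it.
    """
    x = px - width / 2
    y = height / 2 - py
    return x, y


if __name__ == "__main__":
    # Under the assumed pixel convention, user A of fig. 2 would sit at pixel (40, 180).
    print(to_centered(40, 180, 1080, 720))
```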

As can be seen from fig. 2, two of the three people are on the left of the television and one is on the right. To give most users a better sound experience, the direction of the sound can therefore be shifted by adjusting the channel balance.

Specifically, the channel balance S represents the difference between the gains of the left and right channels in a stereo playback system; if the imbalance is too large, the sound image localization of the played stereo sound shifts. The stereo imbalance of a typical high-quality sound system should be less than 1 dB. The present embodiment deliberately uses the shift produced by this difference to move the sound towards the position where more users are located.

In this embodiment, the adjustment range (-s, s) of the channel balance value is related to the actual capability of the sound system; if the range is too large, problems such as sound breakup may occur. For example, s may be 50.

The channel balance S can be calculated by the following formula:

$$S = s \cdot \frac{x_1 + x_2 + \cdots + x_N}{N \cdot W}$$

wherein: N is the total number of users; W is half of the resolution width of the image; $x_1, x_2, \ldots, x_N$ are the abscissa values of the users' coordinates; and s is the bound of the adjustment range (-s, s). Each target channel balance value corresponds to a set of left channel gain and right channel gain: when the target channel balance value is greater than 0, the right channel gain is greater than the left channel gain, and when the target channel balance value is less than 0, the right channel gain is less than the left channel gain.

From the values in fig. 2, the current channel balance S is calculated to be -15.43. A negative S means that the users on the left side dominate, so the sound can be shifted to the left accordingly. For example, the left channel gain can be set to 15 dB and the right channel gain to 0 dB, so that after the final sound superposition the sound is shifted to the left and users perceive it as coming from the left. The specific correspondence between the channel balance S and the gain values of the left and right channels depends on the power amplification capability of the television and the speaker material, and is not described here.
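The figure of -15.43 can be checked numerically. The sketch below assumes that the balance value is the mean abscissa of the users scaled by s/W, a form that reproduces the quoted result for the fig. 2 positions with s = 50 and W = 540. The function name `channel_balance` and the clamping to the range [-s, s] are illustrative assumptions.

```python
from typing import Sequence


def channel_balance(xs: Sequence[float], half_width: float, s: float) -> float:
    """Return the channel balance for users at abscissas `xs`.

    Assumed form: S = s * mean(xs) / half_width, clamped to [-s, s].
    An empty scene is treated as balanced (S = 0).
    """
    if not xs:
        return 0.0
    mean_x = sum(xs) / len(xs)
    balance = s * mean_x / half_width
    return max(-s, min(s, balance))


if __name__ == "__main__":
    # Users A, B and C from fig. 2; image width 1080, so half_width is 540.
    print(round(channel_balance([-500, -280, 280], half_width=540, s=50), 2))
```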

In summary, in this embodiment, the channel balance degree is determined according to the position information and the total number of users, and then the gains of the left and right channels are determined according to the channel balance degree to adjust the propagation direction of the sound system, so that the sound is shifted to positions where more users are located, and the user experience degree is improved.
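How a balance value becomes a pair of channel gains is left to the hardware in this embodiment. Purely as an illustration, the sketch below boosts the channel on the dominant side in proportion to |S| and leaves the other channel at 0 dB. The function name `gains_from_balance`, the linear mapping, and the `db_per_unit` scale factor are assumptions, not part of the described method.

```python
from typing import Tuple


def gains_from_balance(balance: float, db_per_unit: float = 1.0) -> Tuple[float, float]:
    """Return (left_gain_db, right_gain_db) for a channel balance value.

    Positive balance boosts the right channel, negative balance boosts the
    left channel, and zero leaves both channels at 0 dB.
    """
    boost = abs(balance) * db_per_unit
    if balance > 0:
        return 0.0, boost
    if balance < 0:
        return boost, 0.0
    return 0.0, 0.0


if __name__ == "__main__":
    left, right = gains_from_balance(-15.43)
    print(round(left), round(right))
```

With the default scale factor, the fig. 2 balance value yields a left boost that rounds to the 15 dB used in the example above; a real product would more likely use a preset table tuned to its amplifier and speakers.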

Preferably, the user information further includes an age group of each user;

step S102 further comprises:

S1023, acquiring the proportion of each age group according to the age group of each user;

and S1024, determining the gains of the television in high pitch, middle pitch and low pitch according to the proportion of each age group.

In this embodiment, the treble, mid-range and bass gains of a high-quality sound system are relatively balanced, and little adjustment is needed for young and middle-aged users. Subjective perception of this balance varies with age, however, especially for the elderly and for children. The elderly are less sensitive to treble, and some cannot hear sounds in the 12 kHz-20 kHz range at all; for children the opposite holds. It is therefore necessary to adjust the sound mode according to the user information.

In this embodiment, for example, the users can be divided into three age groups: children, the young and middle-aged, and the elderly. Step S1024 specifically includes:

firstly, according to the proportion of each age group, the current dominant age group is determined.

TABLE 1

z	Children	Young and middle-aged	Elderly
N(z)/N	h	j	k

As shown in Table 1, z represents the age group of an identified user, and N(z)/N represents the proportion of users in each age group.

Secondly, the adjustment value corresponding to the dominant age group is acquired.

TABLE 2

Dominant age group	Children	Young and middle-aged	Elderly
G	-10	0	+10

As shown in table 2, according to the proportion of each age group, the present embodiment may determine the age group in the dominant position, and acquire the adjustment value (i.e., G value) corresponding to the age group in the dominant position.

The dominant position here may be only the age group with the highest proportion, or may require that the proportion is greater than the sum of the proportions of other age groups, which is determined by the actual situation, and the present invention is not particularly limited.
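Both readings of "dominant" described above can be sketched in a few lines. The function names `age_proportions` and `dominant_group`, the string labels for the age groups, and the `require_majority` flag are hypothetical names introduced for illustration.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def age_proportions(groups: Sequence[str]) -> Dict[str, float]:
    """Return the proportion N(z)/N of each age group z."""
    counts = Counter(groups)
    total = len(groups)
    return {group: count / total for group, count in counts.items()}


def dominant_group(groups: Sequence[str], require_majority: bool = False) -> Optional[str]:
    """Return the dominant age group, or None if there is none.

    By default the group with the highest proportion wins (ties go to the
    group seen first). With require_majority=True the winner must exceed the
    combined proportion of all other groups, i.e. be more than one half.
    """
    if not groups:
        return None
    proportions = age_proportions(groups)
    best = max(proportions, key=proportions.get)
    if require_majority and proportions[best] <= 0.5:
        return None
    return best


if __name__ == "__main__":
    print(dominant_group(["elderly", "elderly", "children"]))
```

Returning None when no group qualifies lets the caller fall back to a balanced setting; that fallback is a design choice of this sketch.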

It is also understood that the adjustment value can be set according to actual needs, and the above-mentioned-10, 0, 10 is only an example and should not be construed as limiting the present invention.

Finally, the gains of the television at high pitch, middle pitch and low pitch are determined according to the adjusting value.

In the present embodiment, each adjustment value G corresponds to a set of treble, mid-range and bass gains (fa(G), fb(G), fc(G)). For example, when G is +10, the treble, mid-range and bass gains can be set to +3 dB, 1 dB and -2 dB; after the final sound superposition the mid and high frequencies are relatively boosted and the sound is brighter, which suits elderly viewers. When G is -10, the gains can be set to -2 dB, 1 dB and +3 dB, so that the bass is relatively boosted after the final sound superposition, which suits child viewers. When G is 0, the treble, mid-range and bass gains can be set to the same value (for example, all 0), which gives a relatively balanced sound.

It should be noted that the preset adjustment value G is not limited to a fixed value; it may also be a function of N(z)/N, which is not described here.

It should be noted that the relationship between the adjustment value G and the high, medium, and low sound gains may be a preset mapping relationship, or may be a functional relationship, and the preset function is not limited to a linear function, or may be other functions, which is not described herein in detail.
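A preset mapping of the kind mentioned above can be as simple as two lookup tables. The gain triples below are the example values given in this embodiment; assigning G = 0 to the young and middle-aged group, the string labels, and the names `ADJUSTMENT_BY_GROUP`, `GAINS_BY_ADJUSTMENT` and `tone_gains` are assumptions made for this sketch.

```python
from typing import Optional, Tuple

# Adjustment value G per dominant age group (labels are hypothetical).
ADJUSTMENT_BY_GROUP = {"children": -10, "middle": 0, "elderly": 10}

# (treble, mid-range, bass) gains in dB for each adjustment value G.
GAINS_BY_ADJUSTMENT = {
    10: (3.0, 1.0, -2.0),   # brighter sound for elderly viewers
    0: (0.0, 0.0, 0.0),     # balanced sound
    -10: (-2.0, 1.0, 3.0),  # more bass for child viewers
}


def tone_gains(dominant: Optional[str]) -> Tuple[float, float, float]:
    """Return (treble, mid-range, bass) gains in dB for the dominant group.

    Unknown or missing groups fall back to the balanced setting (G = 0).
    """
    adjustment = ADJUSTMENT_BY_GROUP.get(dominant, 0)
    return GAINS_BY_ADJUSTMENT[adjustment]


if __name__ == "__main__":
    print(tone_gains("elderly"))
```

A functional relationship could replace the second table by computing the triple from G directly.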

In order to facilitate an understanding of the invention, the following description will be given by way of a practical example of an embodiment of the invention.

Suppose three people are watching from positions level with the horizontal centerline of the television: two elderly people and one child. The two elderly people are 2 m to the right of the screen's central axis, and the child is 1 m to the left. The camera captures the three users' information and packs it into data whose key format is: (3, n₁(2, 0, elderly), n₂(2, 0, elderly), n₃(-1, 0, children)), where 3 is the total number of users and n₁(2, 0, elderly) means that the first user is at coordinates (2, 0) and belongs to the elderly age group.

Applying the channel balance formula, S is calculated to be 1W, where W is half of the image resolution width. S is positive, so most users are on the right side of the television; the sound mode adjusting module therefore increases the right channel gain and decreases the left channel gain so that the louder sound is concentrated on the right, where more people are. The amount by which the right channel exceeds the left channel can be preset. From the users' age group information, h = 1/3, j = 0 and k = 2/3, so k > h + j and the elderly group is dominant. G therefore takes the value +10, and the treble, mid-range and bass gains fa(G), fb(G) and fc(G) are set to +3 dB, 1 dB and -2 dB, which boosts the mid and high frequencies and compensates for the reduced sensitivity of elderly ears to them.
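The two decisions in this example can be checked by hand. The working below uses only the positions and age groups stated above; the symbol $\bar{x}$ for the mean abscissa is notation introduced here, and the result is in the example's own units.

$$\bar{x} = \frac{2 + 2 + (-1)}{3} = 1 > 0$$

$$\frac{N(\text{elderly})}{N} = \frac{2}{3}, \qquad \frac{N(\text{children})}{N} = \frac{1}{3}, \qquad \frac{2}{3} > \frac{1}{3} + 0$$

A positive mean abscissa places the audience to the right of the screen axis, and the elderly proportion exceeds the combined proportion of the other groups, so the elderly group is dominant under either reading of dominance.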

Referring to fig. 3, a second embodiment of the present invention further provides a television sound adjusting apparatus based on image recognition, which includes:

a user information obtaining unit 210, configured to obtain, from an image captured by a camera, user information of the users currently watching the television;

a target sound parameter determining unit 220, configured to determine a target sound parameter of the television according to the user information;

the adjusting unit 230 is configured to adjust the corresponding sound parameter of the television according to the target sound parameter, so that the adjusted sound parameter can meet the hearing requirements of most users watching the television currently.

The third embodiment of the present invention also provides a television set, which includes:

the camera module captures an image of a user watching a television;

a processor coupled to the camera module and configured to:

acquiring user information based on the image;

wherein:

The fourth embodiment of the present invention further provides a computer-readable storage medium, which stores a computer program, where the computer program can be executed by a processor of a device in which the computer program is stored, so as to implement the television sound adjustment method based on image recognition as described above.

In the embodiments provided in the present invention, it should be understood that the disclosed method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A television sound adjusting method based on image recognition is characterized by comprising the following steps:

2. The method of claim 1, wherein the user information comprises a total number of users and location information of each user; the position information is expressed by a coordinate, the origin of a coordinate system where the coordinate is located is the central point of the image, the transverse axis of the coordinate system is along the width direction of the image, and the longitudinal axis of the coordinate system is along the height direction of the image;

3. The method of claim 2, wherein the target channel balance value S is calculated by the following formula:

wherein:

4. The method of claim 3, wherein each target channel balance value corresponds to a set of left channel gain and right channel gain, and wherein the right channel gain is greater than the left channel gain when the target channel balance value is greater than 0, and the right channel gain is less than the left channel gain when the target channel balance value is less than 0.

5. The method of claim 2, wherein the user information further comprises an age group of each user;

6. The method of claim 5, wherein the age groups include an elderly age group, a young and middle age group, and a children age group;

7. The image recognition-based television sound adjustment method of claim 6,

when the dominant age group is the elderly group, the treble gain is greater than the middle gain and greater than the bass gain;

8. An apparatus for adjusting sound of a television based on image recognition, comprising:

9. A television set, comprising:

the camera module captures an image of a user watching a television;

a processor coupled to the camera module and configured to:

acquiring user information based on the image;

10. The television set according to claim 9, wherein the user information includes a total number of users and location information of each user; the position information is expressed by a coordinate, the origin of a coordinate system where the coordinate is located is the central point of the image, the transverse axis of the coordinate system is along the width direction of the image, and the longitudinal axis of the coordinate system is along the height direction of the image;

determining a left channel gain and a right channel gain of the sound system according to the target channel balance value;

wherein, the calculation formula of the target sound channel balance value S is as follows:

wherein: