Disclosure of Invention
The invention mainly solves the technical problem of how to automatically adjust the volume and reduce noise pollution in public places.
According to a first aspect, the present invention provides a volume adjustment method based on artificial intelligence, including: acquiring a face image of a user, wherein the face image of the user comprises two ears of the user, and the face image of the user is obtained based on shooting of a front camera of a mobile phone; determining the distance from the mobile phone to the ears of the user by using a distance determination model based on the face image of the user and photographing parameters of the mobile phone when the front camera shoots; determining the minimum volume of the mobile phone by using a minimum volume determining model based on the distance from the mobile phone to the ears of the user and the environmental noise data; acquiring a surrounding image of a user, wherein the surrounding image of the user comprises a plurality of passers-by; determining a plurality of passer-by information using a passer-by information determination model based on the surrounding image of the user; the plurality of passer-by information comprises physiological information of a plurality of passers-by and distances from the plurality of passers-by to the mobile phone; determining the volume bearing degree of the passers-by using a bearing degree determining model based on the physiological information of the passers-by; determining the maximum volume of the mobile phone by using a maximum volume determination model based on the volume bearing degree of the plurality of passers-by, the distance between the plurality of passers-by and the mobile phone and the environmental noise data; and adjusting the mobile phone volume based on the minimum volume of the mobile phone and the maximum volume of the mobile phone.
Still further, the physiological information of the passers-by includes each passer-by's age, sex, height, weight, and whether the passer-by is resting.
Still further, adjusting the mobile phone volume based on the minimum volume of the mobile phone and the maximum volume of the mobile phone includes: setting the volume of the mobile phone between the minimum volume of the mobile phone and the maximum volume of the mobile phone.
Still further, setting the volume of the mobile phone between the minimum volume of the mobile phone and the maximum volume of the mobile phone includes: adding the minimum volume of the mobile phone and the maximum volume of the mobile phone to obtain an added volume, dividing the added volume by 2 to obtain a target volume, and setting the mobile phone volume to the target volume.
According to a second aspect, the present invention provides an artificial intelligence based volume adjustment system comprising: a first acquisition module, used for acquiring a face image of a user, wherein the face image of the user comprises two ears of the user, and the face image of the user is obtained based on the shooting of a front camera of a mobile phone; a distance determining module, used for determining the distance from the mobile phone to the ears of the user by using a distance determining model based on the face image of the user and photographing parameters of the front camera of the mobile phone when photographing; a minimum volume determining module, used for determining the minimum volume of the mobile phone by using a minimum volume determining model based on the distance between the mobile phone and the ears of the user and the environmental noise data; a second acquisition module, used for acquiring a surrounding image of the user, wherein the surrounding image of the user comprises a plurality of passers-by; a passer-by information determining module, used for determining a plurality of passer-by information by using a passer-by information determining model based on the surrounding image of the user, wherein the plurality of passer-by information comprises physiological information of the plurality of passers-by and distances from the plurality of passers-by to the mobile phone; a bearing degree determining module, used for determining the volume bearing degree of the plurality of passers-by by using a bearing degree determining model based on the physiological information of the plurality of passers-by; a maximum volume determining module, used for determining the maximum volume of the mobile phone by using a maximum volume determining model based on the volume bearing degree of the plurality of passers-by, the distances between the plurality of passers-by and the mobile phone, and the environmental noise data; and an adjusting module, used for adjusting the volume of the mobile phone based on the minimum volume of the mobile phone and the maximum volume of the mobile phone.
Still further, the physiological information of the passers-by includes each passer-by's age, sex, height, weight, and whether the passer-by is resting.
Still further, the adjusting module is further configured to: set the volume of the mobile phone between the minimum volume of the mobile phone and the maximum volume of the mobile phone.
Still further, the adjusting module is further configured to: add the minimum volume of the mobile phone and the maximum volume of the mobile phone to obtain an added volume, divide the added volume by 2 to obtain a target volume, and set the mobile phone volume to the target volume.
According to a third aspect, the present invention provides an electronic device comprising: a memory; a processor; a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method described above.
According to a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a program executable by a processor to implement a method as in any of the above aspects.
The invention provides a volume adjustment method, a system, equipment and a medium based on artificial intelligence, wherein the method comprises the steps of determining the distance from a mobile phone to ears of a user by using a distance determination model based on face images of the user and photographing parameters of a front camera of the mobile phone during photographing; determining a minimum volume of the mobile phone by using a minimum volume determination model based on the distance from the mobile phone to the ears of the user and the environmental noise data; determining a plurality of passer-by information using a passer-by information determination model based on a surrounding image of the user; determining the volume bearing degree of the plurality of passers-by using the bearing degree determining model based on the physiological information of the plurality of passers-by; determining the maximum volume of the mobile phone by using a maximum volume determining model based on the volume bearing degree of the plurality of passers-by, the distance between the plurality of passers-by and the mobile phone and the environmental noise data; and adjusting the volume of the mobile phone based on the minimum volume of the mobile phone and the maximum volume of the mobile phone. The method can automatically adjust the volume and reduce noise pollution in public places.
Detailed Description
The invention will be described in further detail below with reference to the drawings by means of specific embodiments.
In an embodiment of the present invention, there is provided a volume adjustment method based on artificial intelligence as shown in fig. 1, where the volume adjustment method based on artificial intelligence includes steps S1 to S8:
Step S1, acquiring a face image of a user, wherein the face image of the user comprises both ears of the user, and the face image of the user is obtained by shooting with a front camera of a mobile phone.
The face image of the user is captured by the front camera of the mobile phone. The face image of the user includes the ears of the user, and the position, outline, shape and the like of the ears can be clearly seen in the face image.
Step S2, determining the distance from the mobile phone to the ears of the user by using a distance determination model based on the face image of the user and photographing parameters of the front camera of the mobile phone during photographing.
The photographing parameters of the front camera of the mobile phone during photographing include the lens magnification, lens angle, resolution and the like of the front camera.
Because the front camera of the mobile phone produces different face images when it photographs the user at different distances, the distance from the mobile phone to the ears of the user at the time of shooting can be determined by processing the face image of the user together with the photographing parameters of the front camera. In some embodiments, the distance from the mobile phone to the ears of the user may be determined by a distance determination model.
The distance from the mobile phone to the ears of the user may be the average of the distances from the mobile phone to the two ears. For example, if the distance from the mobile phone to the left ear is 1 meter and the distance from the mobile phone to the right ear is 0.9 meter, then the average distance from the mobile phone to the ears is 0.95 meter.
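The averaging of the two ear distances can be sketched in a few lines. The function name and the choice of meters as the unit are assumptions made for illustration only; the specification does not prescribe an implementation.

```python
def average_ear_distance(left_m: float, right_m: float) -> float:
    """Mean of the phone-to-left-ear and phone-to-right-ear distances, in meters."""
    if left_m < 0 or right_m < 0:
        raise ValueError("distances must be non-negative")
    return (left_m + right_m) / 2


# Left ear at 1 m, right ear at 0.9 m, as in the example in the text.
avg = average_ear_distance(1.0, 0.9)
print(f"average distance: {avg:.2f} m")
```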
The distance determination model is a convolutional neural network model, which includes a convolutional neural network (CNN). Convolutional neural network models are one implementation of artificial intelligence. The convolutional neural network may be a multi-layer neural network (e.g., comprising at least two layers). The at least two layers may include at least one of a convolutional layer (CONV), a rectified linear unit (ReLU) layer, a pooling layer (POOL), or a fully-connected layer (FC). The layers of the convolutional neural network may correspond to neurons arranged in three dimensions: width, height, and depth. In some embodiments, the convolutional neural network may have the architecture [input layer - convolutional layer - rectified linear unit layer - pooling layer - fully-connected layer]. The convolutional layer may calculate the output of neurons connected to a local region of the input by computing the dot product between the weights of each neuron and the small region of the input volume to which it is connected. The input of the distance determination model is the face image of the user and the photographing parameters of the front camera of the mobile phone during photographing, and the output of the distance determination model is the distance from the mobile phone to the ears of the user.
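The layer sequence described above (convolution, ReLU, pooling, fully-connected) can be illustrated with a toy forward pass in plain Python. This is only a sketch of the layer types on a single-channel input: the kernel, weights and image are made up, and it does not show how the photographing parameters would be fed into the actual model, which the text does not specify.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Each output value is the dot product of the kernel weights with one
    local region of the input (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Rectified linear unit: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2x2(x: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]


def fully_connected(x: Matrix, weights: List[float], bias: float) -> float:
    """Flatten the feature map and take one weighted sum (a single output neuron)."""
    flat = [v for row in x for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


def forward(image: Matrix, kernel: Matrix, fc_weights: List[float], fc_bias: float) -> float:
    """input -> convolution -> ReLU -> pooling -> fully-connected."""
    return fully_connected(max_pool2x2(relu(conv2d(image, kernel))), fc_weights, fc_bias)


# Toy 5x5 "image" and a 2x2 kernel; all values are illustrative.
image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]
print(forward(image, kernel, [0.25, 0.25, 0.25, 0.25], 0.0))
```

In a trained model the kernel and fully-connected weights would be learned rather than fixed by hand.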
The distance determination model can be obtained by training on training samples. Each training sample comprises sample input data and a label corresponding to the sample input data. The sample input comprises a face image of a sample user and the sample photographing parameters of the front camera of the mobile phone during photographing, and the sample output label is the sample distance from the mobile phone to the ears of the sample user. The sample output labels can be obtained through manual labeling by a worker; for example, for a face image of a sample user and the corresponding sample photographing parameters, the worker labels the sample distance from the mobile phone to the ears of the user. Finally, an initial distance determination model is trained on a plurality of training samples to obtain the distance determination model. In some embodiments, the initial distance determination model may be trained by a gradient descent method to obtain the trained distance determination model.
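As a rough illustration of training by gradient descent on labelled samples, the sketch below fits a one-feature linear model to (input, label) pairs by minimizing mean squared error. The linear model is a deliberately simple stand-in for the convolutional network, and the sample values, learning rate and epoch count are arbitrary assumptions.

```python
from typing import List, Tuple


def train_linear(samples: List[Tuple[float, float]],
                 lr: float = 0.05,
                 epochs: int = 2000) -> Tuple[float, float]:
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # the "initial model"
    n = len(samples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical labelled samples (input, label) generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear(samples)
print(f"w = {w:.3f}, b = {b:.3f}")
```

The same loop structure (predict, measure the error against the label, update the parameters against the gradient) applies to a neural network, where the gradients are obtained by backpropagation.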
Step S3, determining the minimum volume of the mobile phone by using a minimum volume determination model based on the distance from the mobile phone to the ears of the user and the environmental noise data.
The environmental noise data includes the volume level, the frequency and the like of the environmental noise.
The minimum volume of the mobile phone represents the minimum volume at which the user is guaranteed to hear the sound. If the volume of the mobile phone is set below the minimum volume, the user cannot hear the sound of the mobile phone. For example, if the minimum volume of the mobile phone is 40 dB and the volume of the mobile phone is set below 40 dB, the user cannot hear the sound of the mobile phone.
The minimum volume determination model is a deep neural network model, which includes a deep neural network (DNN). The deep neural network model is one implementation of artificial intelligence. The deep neural network may include a plurality of processing layers, each processing layer being composed of a plurality of neurons, and each neuron applying a matrix operation to its input data. The parameters of the matrix may be obtained by training. The deep neural network may include a recurrent neural network (RNN), a convolutional neural network (CNN), a generative adversarial network (GAN), and so on. The input of the minimum volume determination model is the distance from the mobile phone to the ears of the user and the environmental noise data, and the output of the minimum volume determination model is the minimum volume of the mobile phone.
In some embodiments, the minimum volume determination model may include a sound propagation loss determination model and a fusion model. The sound propagation loss determination model and the fusion model are both deep neural network models.
The input of the sound propagation loss determination model is air information, and the output of the sound propagation loss determination model is the sound propagation loss degree. The air information includes the composition ratios of the gases in the air, the air density, the temperature, and the humidity. The sound propagation loss degree indicates how much volume is lost as sound propagates through the air: the greater the sound propagation loss degree, the greater the volume loss, and vice versa. Sound propagation loss is mainly affected by the air itself, so the sound propagation loss degree can be determined by processing the air information with the sound propagation loss determination model.
The input of the fusion model comprises the sound propagation loss degree, the distance from the mobile phone to the ears of the user, and the environmental noise data, and the output of the fusion model is the minimum volume of the mobile phone. The fusion model comprehensively considers these three inputs to output the minimum volume of the mobile phone.
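The two-stage structure described above, in which a sound propagation loss model feeds a fusion model, can be sketched as a simple composition of two callables. All type names, field names and units are assumptions, and the two stub models are placeholders with no acoustic meaning; in the specification both are trained deep neural networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AirInfo:
    gas_ratios: Dict[str, float]  # composition ratios of the gases in the air
    density_kg_m3: float
    temperature_c: float
    humidity_pct: float


@dataclass
class NoiseData:
    level_db: float
    frequency_hz: float


# Stage 1: air information -> sound propagation loss degree.
LossModel = Callable[[AirInfo], float]
# Stage 2: (loss degree, ear distance, noise data) -> minimum volume.
FusionModel = Callable[[float, float, NoiseData], float]


def minimum_volume(air: AirInfo, ear_distance_m: float, noise: NoiseData,
                   loss_model: LossModel, fusion_model: FusionModel) -> float:
    """Chain the two sub-models: the loss degree is an input to the fusion model."""
    loss_degree = loss_model(air)
    return fusion_model(loss_degree, ear_distance_m, noise)


# Placeholder models standing in for the trained networks (NOT real acoustics).
def stub_loss_model(air: AirInfo) -> float:
    return 0.5


def stub_fusion_model(loss_degree: float, ear_distance_m: float, noise: NoiseData) -> float:
    return noise.level_db + loss_degree * ear_distance_m


air = AirInfo({"N2": 0.78, "O2": 0.21}, 1.2, 20.0, 50.0)
noise = NoiseData(level_db=35.0, frequency_hz=1000.0)
print(minimum_volume(air, 1.0, noise, stub_loss_model, stub_fusion_model))
```

Passing the models in as parameters keeps the composition independent of how each sub-model is implemented or trained.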
Step S4, acquiring a surrounding image of the user, wherein the surrounding image of the user comprises a plurality of passers-by.
The surrounding image of the user shows the environment around the user. In some embodiments, the surrounding image of the user may be captured by the rear camera of the user's mobile phone. In some embodiments, the surrounding image of the user may be captured by a camera installed in the environment and then sent to the mobile phone of the user.
Step S5, determining a plurality of pieces of passer information by using a passer information determination model based on the surrounding environment image of the user; the plurality of passer-by information comprises physiological information of a plurality of passers-by and distances between the plurality of passers-by and the mobile phone.
The physiological information of a passer-by indicates the passer-by's age, sex, height, weight, whether the passer-by is resting, and the like.
The distance from a passer-by to the mobile phone may be the straight-line distance from the passer-by to the user's mobile phone.
The surrounding image of the user shows the plurality of passers-by, so the surrounding image of the user can be processed to obtain the plurality of passer-by information.
The passer-by information determining model is a convolutional neural network model, which includes a convolutional neural network (CNN). Convolutional neural network models are one implementation of artificial intelligence. The convolutional neural network may be a multi-layer neural network (e.g., comprising at least two layers). The at least two layers may include at least one of a convolutional layer (CONV), a rectified linear unit (ReLU) layer, a pooling layer (POOL), or a fully-connected layer (FC). The layers of the convolutional neural network may correspond to neurons arranged in three dimensions: width, height, and depth. In some embodiments, the convolutional neural network may have the architecture [input layer - convolutional layer - rectified linear unit layer - pooling layer - fully-connected layer]. The convolutional layer may calculate the output of neurons connected to a local region of the input by computing the dot product between the weights of each neuron and the small region of the input volume to which it is connected. The input of the passer-by information determining model is the surrounding image of the user, and the output of the passer-by information determining model is the plurality of passer-by information.
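One possible way to represent the model's output is a list of records, one per passer-by, holding the physiological information and the distance to the mobile phone. The class name, field names, units and validation rules below are assumptions for illustration; the specification only names the kinds of information involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PasserbyInfo:
    # Physiological information
    age_years: float
    sex: str
    height_cm: float
    weight_kg: float
    resting: bool
    # Straight-line distance from the passer-by to the user's phone
    distance_m: float

    def __post_init__(self) -> None:
        if self.age_years < 0 or self.height_cm <= 0 or self.weight_kg <= 0:
            raise ValueError("age, height and weight must be physically plausible")
        if self.distance_m < 0:
            raise ValueError("distance must be non-negative")


def nearest(passersby: List[PasserbyInfo]) -> PasserbyInfo:
    """Return the passer-by closest to the phone."""
    return min(passersby, key=lambda p: p.distance_m)


# Hypothetical model output for an image containing two passers-by.
detected = [
    PasserbyInfo(age_years=1, sex="male", height_cm=75, weight_kg=10,
                 resting=True, distance_m=3.0),
    PasserbyInfo(age_years=70, sex="female", height_cm=160, weight_kg=55,
                 resting=False, distance_m=1.5),
]
print(nearest(detected))
```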
Step S6, determining the volume bearing degree of the plurality of passers-by by using a bearing degree determining model based on the physiological information of the plurality of passers-by.
The volume bearing degree of a passer-by indicates the degree to which the passer-by can tolerate the volume of the mobile phone when hearing its sound. The higher the volume bearing degree of the passers-by, the louder the volume they can tolerate, and the louder the user's mobile phone may be. The lower the volume bearing degree of the passers-by, the less able they are to tolerate a loud mobile phone, and the quieter the user's mobile phone needs to be.
The volume bearing degree of a passer-by may be a value between 0 and 1; the larger the value, the higher the volume bearing degree, and the louder the mobile phone sound the passer-by can tolerate. The volume bearing degree is determined by the physiological information of the passer-by. For example, when the passer-by is a young infant, the volume bearing degree is low, and the infant may cry on hearing the mobile phone. As another example, when the passer-by is an elderly person whose hearing has declined, the reaction to the sound of the mobile phone is small, so the volume bearing degree is high. As a further example, a passer-by who is resting may become irritated on hearing the sound of the mobile phone, so the volume bearing degree of a resting passer-by is low, and vice versa.
The bearing degree determining model is a deep neural network model, which includes a deep neural network (DNN). The deep neural network model is one implementation of artificial intelligence. The deep neural network may include a plurality of processing layers, each processing layer being composed of a plurality of neurons, and each neuron applying a matrix operation to its input data. The parameters of the matrix may be obtained by training. The deep neural network may include a recurrent neural network (RNN), a convolutional neural network (CNN), a generative adversarial network (GAN), and so on. The input of the bearing degree determining model is the physiological information of the plurality of passers-by, and the output of the bearing degree determining model is the volume bearing degree of the plurality of passers-by.
The bearing degree determining model can be obtained by training on training samples. Each training sample comprises sample input data and a label corresponding to the sample input data. The sample input comprises the information of a plurality of sample passers-by, and the sample output label is the volume bearing degree of the plurality of sample passers-by. The sample output labels can be obtained through manual labeling.
Step S7, determining the maximum volume of the mobile phone by using a maximum volume determination model based on the volume bearing degree of the plurality of passers-by, the distances between the plurality of passers-by and the mobile phone, and the environmental noise data.
The maximum volume determination model is a deep neural network model, which includes a deep neural network (DNN). The deep neural network model is one implementation of artificial intelligence. The deep neural network may include a plurality of processing layers, each processing layer being composed of a plurality of neurons, and each neuron applying a matrix operation to its input data. The parameters of the matrix may be obtained by training. The deep neural network may include a recurrent neural network (RNN), a convolutional neural network (CNN), a generative adversarial network (GAN), and so on. The input of the maximum volume determination model is the volume bearing degree of the plurality of passers-by, the distances between the plurality of passers-by and the mobile phone, and the environmental noise data, and the output of the maximum volume determination model is the maximum volume of the mobile phone.
The maximum volume determining model can comprehensively consider the volume bearing degree of the passers-by, the distance between the passers-by and the mobile phone and the environmental noise data to determine the maximum volume of the mobile phone.
The maximum volume of the mobile phone represents the maximum volume which has the smallest influence on a plurality of surrounding passers-by and is determined by comprehensively considering the volume bearing degree of a plurality of passers-by and the distance between the plurality of passers-by and the mobile phone. If the volume set by the mobile phone exceeds the maximum volume of the mobile phone, discomfort may be caused to a plurality of passers-by around the user.
Step S8, adjusting the volume of the mobile phone based on the minimum volume of the mobile phone and the maximum volume of the mobile phone.
In some embodiments, the volume of the mobile phone can be set between the minimum volume and the maximum volume of the mobile phone, so that the user can hear the sound of the mobile phone while the influence on the surrounding passers-by is reduced.
In some embodiments, the volume of the mobile phone may be randomly selected between the minimum volume of the mobile phone and the maximum volume of the mobile phone.
In some embodiments, the minimum volume of the mobile phone and the maximum volume of the mobile phone may be added to obtain an added volume, the added volume is divided by 2 to obtain a target volume, and the mobile phone volume is set to the target volume. As an example, if the minimum volume of the mobile phone is 40 dB and the maximum volume of the mobile phone is 100 dB, then (40 + 100) / 2 = 70, and the mobile phone volume may be set to 70 dB.
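The two selection strategies described for step S8, random selection within the range and the midpoint of the range, can be sketched as follows. The function names are illustrative, and raising an error when the minimum exceeds the maximum is an assumption: the text does not say how that case should be handled.

```python
import random
from typing import Optional


def _check_range(min_db: float, max_db: float) -> None:
    # Assumption: an empty range is treated as an error; the text is silent on this case.
    if min_db > max_db:
        raise ValueError("minimum volume exceeds maximum volume")


def midpoint_volume(min_db: float, max_db: float) -> float:
    """Add the minimum and maximum volumes and divide by 2."""
    _check_range(min_db, max_db)
    return (min_db + max_db) / 2


def random_volume(min_db: float, max_db: float,
                  rng: Optional[random.Random] = None) -> float:
    """Pick a volume uniformly at random between the minimum and the maximum."""
    _check_range(min_db, max_db)
    rng = rng or random.Random()
    return rng.uniform(min_db, max_db)


print(midpoint_volume(40, 100))  # 70.0, matching the example in the text
print(random_volume(40, 100))    # some value between 40 and 100
```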
Based on the same inventive concept, fig. 2 is a schematic diagram of an artificial intelligence-based volume adjustment system according to an embodiment of the present invention, where the artificial intelligence-based volume adjustment system includes:
a first obtaining module 21, configured to obtain a face image of a user, where the face image of the user includes two ears of the user, and the face image of the user is obtained by shooting with a front camera of a mobile phone;
a distance determining module 22, configured to determine a distance from the mobile phone to the ears of the user using a distance determining model based on the face image of the user and photographing parameters of the front camera of the mobile phone when photographing;
a minimum volume determining module 23, configured to determine a minimum volume of the mobile phone using a minimum volume determining model based on the distance between the mobile phone and the ears of the user and the environmental noise data;
a second obtaining module 24, configured to obtain an image of a surrounding environment of a user, where the image of the surrounding environment of the user includes a plurality of passers-by;
a passer information determination module 25 for determining a plurality of passer information using a passer information determination model based on an image of the surrounding environment of the user; the plurality of passer-by information comprises physiological information of a plurality of passers-by and distances from the plurality of passers-by to the mobile phone;
a bearing degree determining module 26 for determining the volume bearing degree of the plurality of passers-by using a bearing degree determining model based on the physiological information of the plurality of passers-by;
a maximum volume determining module 27, configured to determine a maximum volume of the mobile phone using a maximum volume determining model based on the volume bearing degrees of the plurality of passers-by, the distances between the plurality of passers-by and the mobile phone, and the environmental noise data;
and an adjusting module 28, configured to adjust the volume of the mobile phone based on the minimum volume of the mobile phone and the maximum volume of the mobile phone.
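The way modules 21 to 28 pass data to one another can be sketched as a small pipeline in which each module is a pluggable callable. The class name, attribute names, the dictionary keys used for the passer-by information, and the stub values are all assumptions made for illustration; the stubs stand in for the cameras and trained models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class VolumeAdjustmentSystem:
    acquire_face_image: Callable[[], Any]                                   # module 21
    determine_distance: Callable[[Any, Any], float]                         # module 22
    determine_min_volume: Callable[[float, Any], float]                     # module 23
    acquire_surroundings: Callable[[], Any]                                 # module 24
    determine_passerby_info: Callable[[Any], List[Dict[str, Any]]]          # module 25
    determine_tolerance: Callable[[List[Any]], List[float]]                 # module 26
    determine_max_volume: Callable[[List[float], List[float], Any], float]  # module 27
    adjust: Callable[[float, float], float]                                 # module 28

    def run(self, camera_params: Any, noise: Any) -> float:
        face = self.acquire_face_image()
        ear_distance = self.determine_distance(face, camera_params)
        min_volume = self.determine_min_volume(ear_distance, noise)
        surroundings = self.acquire_surroundings()
        infos = self.determine_passerby_info(surroundings)
        # Each record carries physiological information and a distance to the phone.
        tolerances = self.determine_tolerance([i["physiological"] for i in infos])
        distances = [i["distance_m"] for i in infos]
        max_volume = self.determine_max_volume(tolerances, distances, noise)
        return self.adjust(min_volume, max_volume)


# Stub modules with fixed outputs, standing in for cameras and trained models.
system = VolumeAdjustmentSystem(
    acquire_face_image=lambda: "face-image",
    determine_distance=lambda face, params: 0.5,
    determine_min_volume=lambda distance, noise: 40.0,
    acquire_surroundings=lambda: "surrounding-image",
    determine_passerby_info=lambda image: [
        {"physiological": {"age": 30, "resting": False}, "distance_m": 2.0},
    ],
    determine_tolerance=lambda physiological: [0.8 for _ in physiological],
    determine_max_volume=lambda tolerances, distances, noise: 100.0,
    adjust=lambda lo, hi: (lo + hi) / 2,
)
print(system.run(camera_params={"resolution": "1080p"}, noise={"level_db": 35.0}))
```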
Based on the same inventive concept, an embodiment of the present invention provides an electronic device, as shown in fig. 3, including:
a processor 31; a memory 32; a computer program; wherein the computer program is stored in the memory 32 and configured to be executed by the processor 31 to implement the artificial intelligence based volume adjustment method, the method comprising: acquiring a face image of a user, wherein the face image of the user comprises two ears of the user, and the face image of the user is obtained based on shooting of a front camera of a mobile phone; determining the distance from the mobile phone to the ears of the user by using a distance determination model based on the face image of the user and photographing parameters of the mobile phone when the front camera shoots; determining the minimum volume of the mobile phone by using a minimum volume determining model based on the distance from the mobile phone to the ears of the user and the environmental noise data; acquiring a surrounding image of a user, wherein the surrounding image of the user comprises a plurality of passers-by; determining a plurality of passer-by information using a passer-by information determination model based on the surrounding image of the user; the plurality of passer-by information comprises physiological information of a plurality of passers-by and distances from the plurality of passers-by to the mobile phone; determining the volume bearing degree of the passers-by using a bearing degree determining model based on the physiological information of the passers-by; determining the maximum volume of the mobile phone by using a maximum volume determination model based on the volume bearing degree of the plurality of passers-by, the distance between the plurality of passers-by and the mobile phone and the environmental noise data; and adjusting the mobile phone volume based on the minimum volume of the mobile phone and the maximum volume of the mobile phone.
Based on the same inventive concept, the present embodiment provides a computer readable storage medium having a program stored thereon, the program being executable by a processor 31 to implement the artificial intelligence based volume adjustment method provided as described above, the method comprising acquiring a face image of a user, the face image of the user including both ears of the user, the face image of the user being captured based on a front camera of a mobile phone; determining the distance from the mobile phone to the ears of the user by using a distance determination model based on the face image of the user and photographing parameters of the mobile phone when the front camera shoots; determining the minimum volume of the mobile phone by using a minimum volume determining model based on the distance from the mobile phone to the ears of the user and the environmental noise data; acquiring a surrounding image of a user, wherein the surrounding image of the user comprises a plurality of passers-by; determining a plurality of passer-by information using a passer-by information determination model based on the surrounding image of the user; the plurality of passer-by information comprises physiological information of a plurality of passers-by and distances from the plurality of passers-by to the mobile phone; determining the volume bearing degree of the passers-by using a bearing degree determining model based on the physiological information of the passers-by; determining the maximum volume of the mobile phone by using a maximum volume determination model based on the volume bearing degree of the plurality of passers-by, the distance between the plurality of passers-by and the mobile phone and the environmental noise data; and adjusting the mobile phone volume based on the minimum volume of the mobile phone and the maximum volume of the mobile phone.
Finally, it should be understood that the embodiments described in this specification are merely illustrative of the principles of the embodiments of this specification. Other variations are possible within the scope of this description. Thus, by way of example, and not limitation, alternative configurations of embodiments of the present specification may be considered as consistent with the teachings of the present specification. Accordingly, the embodiments of the present specification are not limited to only the embodiments explicitly described and depicted in the present specification.