CN113411513B - Intelligent light adjusting method and device based on display terminal and storage medium - Google Patents

Intelligent light adjusting method and device based on display terminal and storage medium

Info

Publication number
CN113411513B
CN113411513B (application CN202110957607.9A; also published as CN113411513A)
Authority
CN
China
Prior art keywords
image
lamp group
lamp
neural network
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110957607.9A
Other languages
Chinese (zh)
Other versions
CN113411513A (en)
Inventor
柴剑平
赵薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN202110957607.9A priority Critical patent/CN113411513B/en
Publication of CN113411513A publication Critical patent/CN113411513A/en
Application granted granted Critical
Publication of CN113411513B publication Critical patent/CN113411513B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/74Circuitry for compensating brightness variation in the scene by influencing the scene brightness using illuminating means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06T3/04
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T5/80
    • G06T5/94
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/71Circuitry for evaluating the brightness variation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Abstract

The invention discloses an intelligent light adjusting method, device and storage medium based on a display terminal. The method comprises the following steps: shooting face images of a person through a main camera and auxiliary cameras of the display terminal; repeatedly shooting the face images, fusing them into a fused image and inputting it into a first neural network model, which compares the fused image with a standard image until the total loss of a loss function is less than or equal to a loss threshold and then outputs a first lamp group control instruction to the lamp groups at the edge of the display terminal, wherein the standard image corresponds to brightness and chromaticity values of the lamp beads of each lamp group; acquiring background music and inputting it into a second neural network model, which classifies the music beat to obtain music style emotion information and outputs a second lamp group control instruction to each lamp group according to the different music style emotion information; and after the background music stops, switching from the second lamp group control instruction back to the first lamp group control instruction.

Description

Intelligent light adjusting method and device based on display terminal and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an intelligent light adjusting method and device based on a display terminal and a storage medium.
Background
With the rapid development of internet technology, the online education platform industry is growing rapidly. Whether for recording MOOCs or for live-streamed teaching, teachers generally need to appear on camera and lecture in front of a computer, whose camera records the teacher's appearance, especially the face, at close range. The teaching environment is usually simple: for daily live teaching, the teacher's image is captured by a notebook computer in an office or at home.
The greatest advantage of a notebook computer is convenience: no driver needs to be installed and no hardware plugged in, and the camera is ready for use as soon as the computer starts. However, under unfavorable conditions such as low illumination or a very short distance between the person and the camera, the captured video is of poor quality: the image is blurred and noisy, colors are dim, the geometry is distorted and the stereoscopic effect is weak. On the one hand, teachers are embarrassed by dark skin tones and deformed faces on screen, and the resulting lack of confidence in front of the lens hinders their normal teaching performance; on the other hand, students, the main audience of teaching, cannot clearly see the teacher's body movements and expressions, which hampers the transfer of knowledge and thus the learning effect.
To give teachers and students a good sensory experience, the arrangement of lighting and cameras at the teaching and broadcast site plays a very important role in the teaching effect. Although existing notebook computers have improved greatly in internal hardware and software performance and in appearance, including the main board, graphics card, CPU and hard disk, the camera and lighting have improved little: a single camera is still placed at the center of the top edge of the screen. Some notebook computers do place an LED fill light near the camera, but its brightness and chromaticity cannot be adjusted, so it merely illuminates the area in front of the screen; the experience of both the speaker and the audience still needs improvement.
Disclosure of Invention
In order to solve the problems, the invention discloses an intelligent light adjusting method based on a display terminal, which comprises the following steps:
shooting face images of people by a main camera and an auxiliary camera which are arranged on the periphery of a display terminal, and fusing the face images of all the people to form a fused image, wherein a plurality of lamp groups are arranged on the periphery of the display terminal at intervals, and each lamp group comprises at least one lamp bead;
repeatedly shooting the face images of the person to obtain a fused image and inputting it into a first neural network model, wherein the first neural network model comprises a first input layer, a first intermediate layer and a first output layer which are connected in sequence; the first neural network model compares the fused image with a standard image until the total loss of a loss function is less than or equal to a loss threshold, whereupon the first output layer outputs a first lamp group control instruction to the lamp groups, wherein the standard image corresponds to values of brightness and chromaticity of the lamp beads of each lamp group;
acquiring background music, and inputting the background music into a second neural network model, wherein the second neural network model comprises a second input layer, a second intermediate layer and a second output layer which are sequentially connected, the second intermediate layer extracts music beats, the second output layer obtains music style emotion information according to the music beats in a classification manner, and outputs a second lamp group control instruction to each lamp group according to different music style emotion information;
and after the background music stops, switching the second lamp group control instruction into the first lamp group control instruction.
Optionally, the auxiliary cameras include a first auxiliary camera and a second auxiliary camera on two sides of the main camera, the main camera takes RGB images of the face of the person from the front, the first auxiliary camera is a large-aperture black-and-white camera, and the brightness intensity of the image of the face of the person obtained from the black-and-white image taken by the first auxiliary camera is compensated to the RGB image of the face of the person taken by the main camera to obtain an enhanced image;
the second auxiliary camera is a color camera, the distance Z between the shot person and the display terminal is calculated by utilizing the triangulation principle, and background blurring and three-dimensional reconstruction are carried out on the shot person and the enhanced image to obtain a fused image.
Optionally, the first intermediate layer decomposes the fused image into a fused image reflection map and a fused image illumination map based on a Retinex theory, the first output layer compares the fused image reflection map with a standard image reflection map of a standard image, compares the fused image illumination map with the standard image illumination map of the standard image, and obtains a total loss of the loss function by using a weighted summation manner for a loss compared with the standard image reflection map and the standard image illumination map.
Optionally, before inputting the fused image to the first neural network model, geometric distortion correction is performed.
Optionally, the second intermediate layer extracts music beats through a hidden markov model, and the second output layer classifies the music beats by using a support vector machine to obtain music style emotion information.
Optionally, the number of the standard images is multiple, each standard image is provided with a group of values of brightness and chromaticity of each lamp bead, different standard images are switched to the first neural network model in advance, and therefore different first lamp group control instructions are output according to different standard images.
Optionally, the compensating the brightness intensity of the face image of the person obtained from the black and white image captured by the first auxiliary camera to the RGB image of the face of the person captured by the main camera to obtain the enhanced image includes: and obtaining the enhanced image by adopting a brightness channel direct replacement method or a layered fusion method.
Optionally, a second light group control instruction is set corresponding to each music style emotion information.
The invention also provides an intelligent light adjusting device based on the display terminal, which comprises a fused image generation module for shooting the face images of people through the main camera and the auxiliary cameras arranged on the periphery of the display terminal and fusing the face images to form a fused image;
the first lamp group control instruction generation module is used for obtaining a fused image input first neural network model by repeatedly shooting a person face image, wherein the first neural network model comprises a first input layer, a first middle layer and a first output layer which are sequentially connected, the first neural network model compares the fused image with a standard image until the total loss of a loss function is less than or equal to a loss threshold value, and the first output layer outputs a first lamp group control instruction to a lamp group, wherein the standard image corresponds to the values of the brightness and the chromaticity of lamp beads of each lamp group;
the second lamp group control instruction generation module is used for acquiring background music and inputting the background music into a second neural network model, the second neural network model comprises a second input layer, a second middle layer and a second output layer which are sequentially connected, the second middle layer extracts music beats, the second output layer obtains music style emotion information according to the music beat classification, and outputs a second lamp group control instruction to each lamp group according to the combination of different music style emotion information;
and the instruction switching module is used for switching the second lamp group control instruction into the first lamp group control instruction after the background music stops.
The present invention also provides a computer readable storage medium having a computer program stored thereon, the computer program comprising program instructions which, when executed by a processor, implement the steps of:
shooting face images of people by a main camera and an auxiliary camera which are arranged on the periphery of a display terminal, and fusing the face images of all the people to form a fused image, wherein a plurality of lamp groups are arranged on the periphery of the display terminal at intervals, and each lamp group comprises at least one lamp bead;
repeatedly shooting the face images of the person to obtain a fused image and inputting it into a first neural network model, wherein the first neural network model comprises a first input layer, a first intermediate layer and a first output layer which are connected in sequence; the first neural network model compares the fused image with a standard image until the total loss of a loss function is less than or equal to a loss threshold, whereupon the first output layer outputs a first lamp group control instruction to the lamp groups, wherein the standard image corresponds to values of brightness and chromaticity of the lamp beads of each lamp group;
acquiring background music, and inputting the background music into a second neural network model, wherein the second neural network model comprises a second input layer, a second intermediate layer and a second output layer which are sequentially connected, the second intermediate layer extracts music beats, the second output layer obtains music style emotion information according to the music beats in a classification manner, and outputs a second lamp group control instruction to each lamp group according to different music style emotion information;
and after the background music stops, switching the second lamp group control instruction into the first lamp group control instruction.
The intelligent light adjusting method, device and storage medium based on a display terminal are suitable not only for notebook computers but for all display terminals, such as desktop computer displays, mobile phones and large screens, and can be applied to live streaming, online teaching, video conferencing and the like.
The invention has the following beneficial effects:
(1) Cameras and a plurality of lamp beads are arranged at the edge of the display terminal; a neural network model compares the image shot by the cameras with a standard image and outputs instructions to adjust the light intelligently and adaptively, thereby improving the skin color and definition of the face;
(2) the two auxiliary cameras cooperate effectively with the central main camera, improving the stereoscopic effect of the picture, producing more natural background blurring and making the person stand out;
(3) music style emotion information can be determined from the beat of the background music, so that the brightness and chromaticity of the display terminal's light are intelligently controlled to suit it;
(4) background pictures can be superposed, chosen according to the music style emotion information or manually by the speaker, to support the atmosphere of the teaching site.
Drawings
The above features and technical advantages of the present invention will become more apparent and readily appreciated from the following description of the embodiments thereof taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic diagram of a display terminal showing an embodiment of the present invention;
FIG. 2 is a flow chart showing a first neural network model controlling a lamp set in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram showing image contrast of a first neural network model of an embodiment of the present invention;
FIG. 4 is a flow chart showing a second neural network model controlling a lamp set in accordance with an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described below with reference to the accompanying drawings. Those of ordinary skill in the art will recognize that the described embodiments can be modified in various different ways, or combinations thereof, without departing from the spirit and scope of the present invention. Accordingly, the drawings and description are illustrative in nature and not intended to limit the scope of the claims. Furthermore, in the present description, the drawings are not to scale and like reference numerals refer to like parts.
The intelligent light adjusting method based on the display terminal comprises the following steps:
Step S2: the main camera 10 arranged at the edge of the periphery of the display terminal and the auxiliary cameras arranged on its two sides each shoot a face image, and the face images are fused to form a fused image. A plurality of lamp groups are arranged at intervals along the periphery of the display terminal; each lamp group may comprise a plurality of lamp beads, and the lamp beads of a lamp group may be controlled individually or together. The lamp groups may be arranged along one edge of the periphery or along several edges. The captured region is not limited to the face: it may also be the upper half of the body, and the cameras can be adjusted automatically as needed.
The display terminal may be, for example, a notebook computer, an iPad, a mobile phone, a desktop computer display or a large screen. Hereinafter a notebook computer is taken as the example: the main camera 10 may be disposed at the middle of the upper edge of the notebook screen, with the first auxiliary camera 20 and the second auxiliary camera 30 disposed on either side of it.
The main camera 10 shoots a high-definition color (RGB) image of the person's face from the front. The first auxiliary camera is a large-aperture black-and-white camera (Mono); its large aperture guarantees sufficient light intake. Taking the color image shot by the main camera as the base, the luminance of the face image extracted from the black-and-white image is compensated into the color image, and the two are fused into an enhanced image whose scene details are clearer. The steps for fusing the main camera's color image with the black-and-white image are as follows:
(1) respectively converting the RGB image and the Mono image into Lab images;
(2) replacing the L channel of the Lab image corresponding to the RGB image with the L channel of the Lab image corresponding to the Mono image;
(3) and converting the Lab image corresponding to the RGB image from the Lab space to the RGB color space.
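For illustration, the three steps above can be written as a minimal sketch, assuming OpenCV, an 8-bit BGR frame from the main camera and an 8-bit grayscale frame from the mono camera; the function name and the direct use of the mono frame as the L channel are simplifying assumptions, not part of the original method:

```python
import cv2

def fuse_rgb_mono(rgb_bgr, mono_gray):
    """Luminance-replacement fusion: put the mono camera's light intake
    into the color image's L channel (a sketch of steps (1)-(3) above)."""
    # (1) Convert the color image (BGR in OpenCV) to Lab.
    lab = cv2.cvtColor(rgb_bgr, cv2.COLOR_BGR2Lab)
    # Match resolutions in case the two sensors differ.
    mono = cv2.resize(mono_gray, (rgb_bgr.shape[1], rgb_bgr.shape[0]))
    # (2) Replace the L channel with the black-and-white camera's luminance
    # (using the gray values directly as L is an approximation).
    lab[:, :, 0] = mono
    # (3) Convert back from Lab to the RGB (BGR) color space.
    return cv2.cvtColor(lab, cv2.COLOR_Lab2BGR)
```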
Alternatively, a hierarchical (layered) fusion method may be used; see Li S, Kang X, Hu J. Image fusion with guided filtering [J]. IEEE Transactions on Image Processing, 2013, 22(7): 2864-2875.
The second auxiliary camera is a color (RGB) camera that works with the main camera; the distance Z between the photographed person and the screen is calculated using the triangulation principle, enabling background blurring and stereoscopic reconstruction:
Z = f × T / (x_R − x_T)

wherein T is the distance between the second auxiliary camera and the main camera; f is the distance from the photosensitive sensor to the focal plane; x_R is the intersection point of the line connecting the photographed person and the main camera with the focal plane; and x_T is the intersection point of the line connecting the photographed person and the second auxiliary camera with the focal plane.
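As a worked sketch of the formula, with all variable names chosen here for illustration and consistent units assumed:

```python
def subject_distance(T, f, x_r, x_t):
    """Distance Z between the photographed person and the display terminal.

    T   -- baseline between the second auxiliary camera and the main camera
    f   -- distance from the photosensitive sensor to the focal plane
    x_r -- intersection of the person/main-camera line with the focal plane
    x_t -- intersection of the person/second-camera line with the focal plane
    """
    disparity = x_r - x_t
    if disparity == 0:
        raise ValueError("zero disparity: the subject is effectively at infinity")
    return f * T / disparity
```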
Step S4: the fused image is input into a first neural network model, which comprises a first input layer, a first intermediate layer and a first output layer connected in sequence. The first input layer receives the fused image; the first intermediate layer comprises five convolution layers connected in sequence and an activation function layer, which may use a sigmoid function. As shown in fig. 3, the first intermediate layer decomposes the fused image, based on Retinex theory (retinal cortex theory), into a fused image reflection map and a fused image illumination map: the reflection map expresses the intrinsic reflection characteristics of objects, including material and color, independent of the illumination environment, while the illumination map contains the brightness information produced by the illumination environment and the geometric structure of the objects. The first output layer compares the fused image reflection map and illumination map with the reflection map and illumination map of the standard image, and the total loss of the loss function is obtained as a weighted sum of the two comparison losses. The total loss is compared with a loss threshold: if it is less than or equal to the threshold, the illumination condition of the fused image is as good as or better than that of the standard image; if it is greater, the brightness and chromaticity of each lamp group are adjusted, the face image is shot again under the adjusted light and input into the first neural network model, and this repeats until the total loss is less than or equal to the threshold, whereupon lamp group adjustment is complete and video is output under that lighting condition.
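A sketch of such a decomposition network is given below in PyTorch; the text fixes only the five convolution layers and the sigmoid activation layer, so the channel widths, the interleaved ReLUs and the 3+1 output split are assumptions:

```python
import torch
import torch.nn as nn

class DecompositionNet(nn.Module):
    """Five convolutions plus a sigmoid, splitting the fused image into a
    3-channel reflection map and a 1-channel illumination map (Retinex)."""

    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 4, 3, padding=1),  # 3 reflection + 1 illumination
        )

    def forward(self, fused):                  # fused: (N, 3, H, W) in [0, 1]
        out = torch.sigmoid(self.body(fused))  # the activation function layer
        return out[:, :3], out[:, 3:]          # reflection map, illumination map
```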
The standard image corresponds to a group of brightness and chromaticity values for each lamp bead. Of course, a plurality of standard images may be set, each provided with its own group of brightness and chromaticity values for each lamp bead. Comparing the illumination maps compares illumination intensity, while comparing the reflection maps captures the difference in reflection characteristics between the standard image and the fused image: the larger the difference in reflection characteristics, the larger the reflection map loss, and the smaller the difference, the smaller the loss. The smaller the reflection map loss, the closer the reflection characteristics of the fused image are to those of the standard image, i.e. the closer the scene comes to the reflected-light conditions of the person and environment in the standard image; combining the reflection map loss with the illumination map loss therefore reproduces the light effect of the standard image more accurately.
The reflection characteristics of an object are determined mainly by its surface condition, such as material and color. The material of an object in an image can be identified, for example, with the method in 'Research on material recognition and segmentation based on convolutional neural networks and ensemble learning', Beijing Jiaotong University, Li Wan. Color can be identified by comparing the three channels of the RGB image, as sketched below. For example, with identical clothing material, if the person's clothes are yellow in the standard image but white in the fused image, the shooting effect under the same light will differ because white has a higher reflectance; by taking this reflectance difference into account, a light instruction can be chosen that brings the scene's reflectance as close as possible to that of the standard image.
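A toy illustration of that channel comparison, assuming two aligned clothing regions cut from the standard and fused images (the mean statistic is an assumption; any channel-wise contrast would serve):

```python
import numpy as np

def channel_difference(region_std, region_fused):
    """Mean per-channel (R, G, B) difference between two uint8 image regions;
    large values hint at a reflectance gap between the two garments."""
    means_std = region_std.reshape(-1, 3).mean(axis=0)
    means_fused = region_fused.reshape(-1, 3).mean(axis=0)
    return np.abs(means_std - means_fused)
```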
The total loss of the loss function is set as

λ = λ1 × reflection map loss + λ2 × illumination map loss

wherein the reflection map loss is the absolute value of the loss obtained by comparing the standard image reflection map with the fused image reflection map; the illumination map loss is the absolute value of the loss obtained by comparing the standard image illumination map with the fused image illumination map; and λ1 and λ2 are the weights of the reflection map loss and the illumination map loss, respectively. The illumination map loss may be treated as primary and the reflection map loss as auxiliary, with the weight of the illumination map loss set above 0.5.
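Written out, the total loss is a two-term weighted sum; in this sketch the maps are arrays of equal shape, and the default weights merely follow the text's advice that the illumination weight exceed 0.5:

```python
import numpy as np

def total_loss(refl_fused, refl_std, illum_fused, illum_std,
               lambda1=0.4, lambda2=0.6):
    """λ = λ1 × reflection map loss + λ2 × illumination map loss."""
    reflection_loss = np.abs(refl_fused - refl_std).mean()
    illumination_loss = np.abs(illum_fused - illum_std).mean()
    return lambda1 * reflection_loss + lambda2 * illumination_loss
```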
As shown in fig. 2, after the standard image is selected, the main camera, the first auxiliary camera and the second auxiliary camera shoot images under the current brightness of the LED lamp beads and form a fused image, which is input to the first neural network model; the model outputs a first lamp group control instruction to each lamp bead according to parameters such as the chromaticity range, brightness range and coordinates of each lamp bead in the lamp group. After the lamp beads change brightness and chromaticity according to the instruction, images are shot again, fused and input to the first neural network model; this is repeated until the total loss of the loss function is less than or equal to the loss threshold, at which point lamp group adjustment is complete.
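The closed loop of fig. 2 can be summarized as the following sketch, in which capture_fused_image, model and send_group_command are placeholders for the hardware and model calls that the text leaves unspecified:

```python
def adjust_lights(model, capture_fused_image, send_group_command,
                  loss_threshold=0.05, max_rounds=20):
    """Iterate shoot -> evaluate -> adjust until the loss threshold is met.
    The threshold and round limit are illustrative values."""
    for _ in range(max_rounds):
        fused = capture_fused_image()    # shoot and fuse under the current light
        command, loss = model(fused)     # first lamp group control instruction
        if loss <= loss_threshold:       # lighting now matches the standard image
            return True
        send_group_command(command)      # adjust each bead's brightness/chromaticity
    return False                         # give up after max_rounds adjustments
```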
Further, geometric distortion correction is performed before the fused image is input to the first neural network model. Geometric distortion arises from the displacement of pixel points in the image. Pixel positions are remapped via the central main camera and the second auxiliary camera, and the correction can be implemented with OpenCV's undistort() method. Let uov be the coordinate system of the normal image and u′o′v′ the coordinate system of the distorted image, let k1, k2 and k3 be the radial distortion coefficients, and let p1 and p2 be the tangential distortion coefficients. The underlying mathematical models are as follows.

Radial distortion model:

x_distorted = x · (1 + k1·r² + k2·r⁴ + k3·r⁶)
y_distorted = y · (1 + k1·r² + k2·r⁴ + k3·r⁶)

wherein r² = x² + y² and (x, y) is a coordinate point of the normal image.

Tangential distortion model:

x_distorted = x + 2·p1·x·y + p2·(r² + 2·x²)
y_distorted = y + p1·(r² + 2·y²) + 2·p2·x·y
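In practice the correction reduces to a single OpenCV call once the parameters are known. A minimal sketch, assuming the intrinsics (fx, fy, cx, cy) and the five distortion coefficients have already been obtained by calibration (e.g. with cv2.calibrateCamera):

```python
import cv2
import numpy as np

def correct_distortion(image, fx, fy, cx, cy, k1, k2, p1, p2, k3):
    """Undo radial and tangential distortion with OpenCV's undistort()."""
    camera_matrix = np.array([[fx, 0, cx],
                              [0, fy, cy],
                              [0,  0,  1]], dtype=np.float64)
    # OpenCV expects the coefficients in the order (k1, k2, p1, p2, k3).
    dist_coeffs = np.array([k1, k2, p1, p2, k3], dtype=np.float64)
    return cv2.undistort(image, camera_matrix, dist_coeffs)
```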
Step S6: as shown in fig. 4, background music is acquired and input into a second neural network model comprising a second input layer, a second intermediate layer and a second output layer. The second intermediate layer extracts the music beat through a hidden Markov model (HMM) (Ryynänen M P, Klapuri A P. Automatic transcription of melody, bass line, and chords in polyphonic music [J]. Computer Music Journal, 2008, 32(3): 72-86.), and the second output layer classifies the music style emotion information according to the beat. The second output layer outputs a second lamp group control instruction to each lamp group according to the different music style emotion information, and video is output under that lighting condition. Several second lamp group control instructions can be set for different music style emotion information, so that the light of the lamp groups matches the atmosphere created by the music: for example, passionate, up-tempo songs are given warm light such as yellow or red, while gentle, quiet songs use a cool tone such as light blue. The light can also flash or change brightness with the beat.
Music style emotion information can be identified with a classification network, such as a support vector machine, on the basis of extracted timbre features, melody and harmony features and rhythm features, as sketched below. A music style is the distinctive or unique sound produced by a combination of musical elements, including tune, rhythm, timbre, dynamics, harmony, texture and melodic form; styles include folk song, opera, jazz and the like.
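A sketch of such a pipeline is shown below; librosa's beat tracker stands in for the HMM transcription of the cited paper, and the MFCC-plus-tempo feature vector is an assumption rather than the patent's feature set:

```python
import librosa
import numpy as np
from sklearn.svm import SVC

def music_features(path):
    """Tempo (beat) plus averaged MFCC timbre features for one audio file."""
    y, sr = librosa.load(path, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)        # beat/tempo feature
    mfcc = librosa.feature.mfcc(y=y, sr=sr).mean(axis=1)  # timbre summary
    return np.hstack([tempo, mfcc])

# With labelled training clips (paths, emotion labels) one could then train:
# clf = SVC(kernel="rbf").fit([music_features(p) for p in paths], labels)
# emotion = clf.predict([music_features("background_music.wav")])[0]
```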
Musical emotion refers to the emotion that music conveys through its auditory effect, such as sadness, joy, calm or sweetness. The second lamp group control instruction includes, for example, the lamp brightness change frequency, lamp chromaticity change frequency, lamp brightness change amplitude and lamp chromaticity change amplitude. It differs from the first lamp group control instruction: the first mainly controls the brightness and chromaticity of the lamp beads to make the display terminal's picture clear, whereas the second creates a light effect matched to the different music style emotion information.
In step S8, after the background music stops, the second lamp group control instruction is switched back to the first lamp group control instruction, and video output is maintained.
Further, based on the identified music style emotion information, a background picture library is searched for a background picture matching the music emotion, and the picture is superposed as a background beneath the layer of the person's image. Background pictures matching each music emotion can be preset in the library: for sad music, for example, a parting at a station; for happy music, a celebration scene; for a Mongolian song, a vast grassland. Further, after the second neural network model identifies the music style emotion information, pictures corresponding to it are looked up in an emotion picture library in which pictures expressing various emotions are stored in advance. The similarity between the emotion expressed by each picture and the music emotion is calculated, for example as a cosine similarity, and a picture is judged to match the music emotion if the similarity reaches or exceeds a similarity threshold. Several pictures may correspond to the same emotion information; one may be chosen at random as the background, or the pictures may be ranked by cosine similarity and the most similar one chosen, as sketched below.
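The lookup can be sketched as follows, with the emotion vectors, their source and the 0.8 threshold all being assumptions for illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_background(music_emotion_vec, picture_library, threshold=0.8):
    """picture_library: list of (picture_path, emotion_vector) pairs.
    Returns the most similar picture at or above the threshold, else None."""
    scored = [(cosine_similarity(music_emotion_vec, vec), path)
              for path, vec in picture_library]
    scored = [(s, p) for s, p in scored if s >= threshold]
    return max(scored)[1] if scored else None
```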
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. An intelligent light adjusting method based on a display terminal is characterized by comprising the following steps:
the method comprises the steps of shooting face images of people through a main camera and an auxiliary camera which are arranged on the periphery of a display terminal, fusing the face images of all the people to form a fused image, wherein a plurality of lamp groups are arranged on the periphery of the display terminal at intervals, each lamp group comprises at least one lamp bead,
the auxiliary cameras comprise a first auxiliary camera and a second auxiliary camera which are arranged on two sides of the main camera, the main camera shoots RGB images of the face of a person from the front side, the first auxiliary camera is a large-aperture black-and-white camera, and the brightness intensity of the face image of the person obtained from the black-and-white image shot by the first auxiliary camera is compensated to the RGB image shot by the main camera to obtain an enhanced image;
the second auxiliary camera is a color camera, the distance Z between a shot person and the display terminal is calculated by utilizing the principle of triangulation positioning, and background blurring and three-dimensional reconstruction are carried out on the shot person and the enhanced image to obtain a fused image;
repeatedly shooting the face image of the person to obtain a fused image and inputting it into a first neural network model, wherein the first neural network model comprises a first input layer, a first intermediate layer and a first output layer which are connected in sequence; the first neural network model compares the fused image with the standard image and outputs a first lamp group control instruction to each lamp bead according to the chromaticity range, the brightness range and the coordinate parameters of each lamp bead in the lamp group; after each lamp bead changes brightness and chromaticity according to the instruction, shooting is performed again to form a fused image which is input into the first neural network model, and this is repeated until the total loss of a loss function is less than or equal to a loss threshold, whereupon the first output layer outputs the first lamp group control instruction to the lamp groups, wherein the standard image corresponds to values of brightness and chromaticity of the lamp beads of each lamp group,
the first intermediate layer decomposes a fused image into a fused image reflection map and a fused image illumination map based on Retinex theory, the first output layer compares the fused image reflection map with a standard image reflection map of a standard image, compares the fused image illumination map with the standard image illumination map of the standard image, and obtains the total loss of a loss function by adopting a weighted summation mode for the loss compared with the standard image reflection map and the standard image illumination map;
acquiring background music, and inputting the background music into a second neural network model, wherein the second neural network model comprises a second input layer, a second intermediate layer and a second output layer which are sequentially connected, the second intermediate layer extracts music beats, the second output layer obtains music style emotion information according to the music beats in a classification manner, and outputs a second lamp group control instruction to each lamp group according to different music style emotion information, and the second lamp group control instruction comprises a lamp brightness change frequency, a lamp chromaticity change frequency, a lamp brightness change amplitude and a lamp chromaticity change amplitude;
and after the background music stops, switching the second lamp group control instruction into the first lamp group control instruction.
2. The intelligent light adjusting method based on the display terminal according to claim 1,
before inputting the fused image into the first neural network model, geometric distortion correction is performed.
3. The intelligent lighting adjustment method based on the display terminal as claimed in claim 1, wherein the second middle layer extracts music beats through a hidden markov model, and the second output layer classifies the music beats by using a support vector machine to obtain music style emotion information.
4. The intelligent light adjusting method based on the display terminal according to claim 1,
the standard images are multiple, each standard image is provided with a group of values of brightness and chromaticity of each lamp bead, different standard images are switched to the first neural network model in advance, and therefore different first lamp group control instructions are output according to different standard images.
5. The intelligent lighting adjustment method based on the display terminal of claim 1, wherein the compensating the brightness intensity of the face image of the person obtained from the black and white image captured by the first auxiliary camera to the RGB image of the face of the person captured by the main camera to obtain the enhanced image comprises: and obtaining the enhanced image by adopting a brightness channel direct replacement method or a layered fusion method.
6. The intelligent light adjusting method based on the display terminal according to claim 1,
and a second lamp group control instruction is set corresponding to each music style emotion information.
7. An intelligent light adjusting device based on a display terminal is characterized in that,
the system comprises a fused image generation module, a display terminal and a control module, wherein the fused image generation module is used for shooting face images of people through a main camera and an auxiliary camera which are arranged on the periphery of the display terminal and fusing the face images of all people to form a fused image, a plurality of lamp groups are arranged at intervals on the periphery of the display terminal, each lamp group comprises at least one lamp bead, the auxiliary camera comprises a first auxiliary camera and a second auxiliary camera which are arranged on two sides of the main camera, the main camera shoots RGB images of the faces of the people from the front side, the first auxiliary camera is a large-aperture black-and-white camera, the brightness intensity of the face images of the people obtained from the black-and-white images shot by the first auxiliary camera is compensated to the RGB face images shot by the main camera, and an enhanced image is obtained;
the second auxiliary camera is a color camera, the distance Z between a shot person and the display terminal is calculated by utilizing the principle of triangulation positioning, and background blurring and three-dimensional reconstruction are carried out on the shot person and the enhanced image to obtain a fused image;
the first lamp group control instruction generation module is used for repeatedly shooting the face image of the person to obtain a fused image and inputting it into a first neural network model, wherein the first neural network model comprises a first input layer, a first intermediate layer and a first output layer which are connected in sequence; the first neural network model compares the fused image with the standard image and outputs a first lamp group control instruction to each lamp bead according to the chromaticity range, the brightness range and the coordinate parameters of each lamp bead in the lamp group; after each lamp bead changes brightness and chromaticity according to the instruction, shooting is carried out again to form a fused image which is input into the first neural network model, and this is repeated until the total loss of a loss function is less than or equal to a loss threshold, whereupon the first output layer outputs the first lamp group control instruction to the lamp groups, wherein the standard image corresponds to the brightness and chromaticity values of the lamp beads of each lamp group,
the first intermediate layer decomposes a fused image into a fused image reflection map and a fused image illumination map based on Retinex theory, the first output layer compares the fused image reflection map with a standard image reflection map of a standard image, compares the fused image illumination map with the standard image illumination map of the standard image, and obtains the total loss of a loss function by adopting a weighted summation mode for the loss compared with the standard image reflection map and the standard image illumination map;
the second lamp group control instruction generation module is used for acquiring background music and inputting the background music into a second neural network model, the second neural network model comprises a second input layer, a second middle layer and a second output layer which are sequentially connected, the second middle layer extracts music beats, the second output layer obtains music style emotion information according to the music beat classification and outputs a second lamp group control instruction to each lamp group according to the combination of different music style emotion information, and the second lamp group control instruction comprises a lamp brightness change frequency, a lamp chromaticity change frequency, a lamp brightness change amplitude and a lamp chromaticity change amplitude;
and the instruction switching module is used for switching the second lamp group control instruction into the first lamp group control instruction after the background music stops.
8. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program comprising program instructions, which when executed by a processor, implement the intelligent light adjusting method based on a display terminal according to any one of claims 1 to 6.
CN202110957607.9A 2021-08-20 2021-08-20 Intelligent light adjusting method and device based on display terminal and storage medium Active CN113411513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110957607.9A CN113411513B (en) 2021-08-20 2021-08-20 Intelligent light adjusting method and device based on display terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110957607.9A CN113411513B (en) 2021-08-20 2021-08-20 Intelligent light adjusting method and device based on display terminal and storage medium

Publications (2)

Publication Number Publication Date
CN113411513A CN113411513A (en) 2021-09-17
CN113411513B true CN113411513B (en) 2021-11-26

Family

ID=77689029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110957607.9A Active CN113411513B (en) 2021-08-20 2021-08-20 Intelligent light adjusting method and device based on display terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113411513B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114684013A (en) * 2022-03-31 2022-07-01 东风汽车集团股份有限公司 Intelligent adjusting system and control method for atmosphere lamp in vehicle

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198161A (en) * 2017-12-29 2018-06-22 深圳开立生物医疗科技股份有限公司 A kind of fusion method, device and the equipment of dual camera image

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6034740B2 (en) * 2013-04-18 2016-11-30 オリンパス株式会社 Imaging apparatus and imaging method
CN103488026A (en) * 2013-09-05 2014-01-01 广东欧珀移动通信有限公司 Method and system for regulating luminance of camera fill flash
CN111885304B (en) * 2020-07-16 2022-12-27 深圳传音控股股份有限公司 Photographing method, mobile terminal and computer storage medium
CN111835984B (en) * 2020-07-24 2023-02-07 中国平安人寿保险股份有限公司 Intelligent light supplementing method and device, electronic equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198161A (en) * 2017-12-29 2018-06-22 深圳开立生物医疗科技股份有限公司 A kind of fusion method, device and the equipment of dual camera image

Also Published As

Publication number Publication date
CN113411513A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
CN110248450B (en) Method and device for controlling light by combining people
Tang et al. Content-based photo quality assessment
CN105144233B (en) Reference picture selection for moving ghost image filtering
CN112614077B Unsupervised low-illumination image enhancement method based on a generative adversarial network
US8280179B2 (en) Image processing apparatus using the difference among scaled images as a layered image and method thereof
US10120267B2 (en) System and method for re-configuring a lighting arrangement
CN102576461A (en) Estimating aesthetic quality of digital images
CN104811684B A three-dimensional face beautification method and device for images
CN114364099B (en) Method for adjusting intelligent light equipment, robot and electronic equipment
CN107464572B (en) Multi-mode interactive music perception system and control method thereof
CN106406504B (en) The atmosphere rendering system and method for human-computer interaction interface
JP2006301779A (en) Image processing system, image processing method, and image processing program
CN108280426A (en) Half-light source expression recognition method based on transfer learning and device
CN111047543A (en) Image enhancement method, device and storage medium
CN113411513B (en) Intelligent light adjusting method and device based on display terminal and storage medium
CN111062899B Guidance-based blink video generation method using a generative adversarial network
Abeln et al. Preference for well-balanced saliency in details cropped from photographs
CN109934787B (en) Image splicing method based on high dynamic range
KR20230110787A (en) Methods and systems for forming personalized 3D head and face models
CN109064431B (en) Picture brightness adjusting method, equipment and storage medium thereof
JP2009151350A (en) Image correction method and device
KR20190114739A (en) method AND DEVICE for processing Image
KR20230060726A (en) Method for providing face synthesis service and apparatus for same
CN110675438A (en) Lightweight rapid face exchange algorithm
CN113784077B (en) Information processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant