CN116485794A - Face image analysis method for virtual vocal music teaching - Google Patents
Face image analysis method for virtual vocal music teaching
- Publication number: CN116485794A (application CN202310720312.9A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06T 7/10 — Image analysis; segmentation; edge detection
- G06T 7/269 — Image analysis; analysis of motion using gradient-based methods
- G06V 10/25 — Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
- G06T 2207/30201 — Indexing scheme for image analysis or image enhancement; subject of image: human being/person; face
- Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a facial image analysis method for virtual vocal music teaching, applied in the field of image processing. The method comprises the following steps: collecting real-time images of a user practising with the virtual vocal music teaching auxiliary system; determining a key region of the real-time image according to the motion speed and motion direction of the pixel points in the real-time image; determining the feature point of each line segment in the key region according to the number of pixel points in the line segment and their motion speeds; and weighting the real-time image with the result of its Laplacian processing according to the feature points to obtain an enhanced real-time image. Because the result of the Laplacian processing of the real-time image is weighted according to the feature points, the key regions of the real-time image are selectively enhanced, which improves the accuracy of facial image analysis for virtual vocal music teaching.
Description
Technical Field
The invention relates to the field of image processing, in particular to a facial image analysis method for virtual vocal music teaching.
Background
The virtual vocal music teaching auxiliary system is a teaching aid developed on the basis of artificial intelligence technology; it helps students learn singing skills by simulating the human voice and its associated motion. Such a system typically uses computer graphics and simulation techniques to create a virtual reality environment in which students can practise singing. The system can provide real-time feedback, analyse data such as the student's voice and throat movement, and give corresponding suggestions, helping students continuously improve their singing skills.
When the virtual vocal music teaching auxiliary system is used, it usually collects real-time images of the student practising and, by analysing the movements and expressions of the face, throat and so on, determines where the student needs to improve and where the performance is already good. In this process, the precision of facial feature extraction directly affects the judgement of the student's learning effect and the accuracy of the suggestions given by the system. Therefore, the collected images often need image enhancement processing to improve the extraction accuracy of facial features during practice.
When the Laplacian operator is used to enhance an image, the original image and the Laplacian image must be weighted and summed, and the enhancement is completed by superimposing the two. Appropriate weight factors therefore have to be obtained; because the Laplacian values of some pixel points in the Laplacian image may be negative, phenomena such as contrast reduction and noise amplification easily occur during superposition, so that the accuracy of facial image analysis for virtual vocal music teaching is low.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a facial image analysis method for virtual vocal music teaching.
The invention is realized by the following technical scheme:
the invention provides a facial image analysis method for virtual vocal music teaching, which comprises the following steps:
collecting real-time images of a user exercising by using the virtual vocal music teaching auxiliary system;
determining a key area of the real-time image according to the motion speed and the motion direction of the pixel points in the real-time image;
determining characteristic points in each line segment according to the number of pixel points in each line segment in the key region and the movement speed;
and weighting the result of the Laplacian processing of the real-time image according to the characteristic points to obtain an enhanced real-time image.
Further, the determining the key area of the real-time image according to the motion speed and the motion direction of the pixel point in the real-time image includes:
determining a ratio between a difference value of motion speeds corresponding to two pixel points in the motion direction and a motion speed corresponding to a first pixel point according to the motion speed and the motion direction respectively corresponding to the two pixel points in the real-time image, and obtaining a first motion speed difference, wherein the two pixel points comprise a first pixel point and a second pixel point;
and when the first motion speed difference is smaller than the preset motion speed difference, executing a second motion speed difference determined according to a third pixel point and the first pixel point, and so on until the nth motion speed difference is larger than or equal to the preset motion speed difference, confirming that the first pixel point, the second pixel point and the nth pixel point belong to a motion area in a face area in the real-time image, wherein N is an integer larger than 2, and the nth motion speed difference comprises determining a ratio between a difference value of the motion speeds corresponding to the nth pixel point and the first pixel point and the motion speed corresponding to the first pixel point in the motion direction according to the motion speed and the motion direction respectively corresponding to the nth pixel point and the first pixel point.
Further, after confirming that the first pixel point, the second pixel point, and the nth pixel point belong to the motion area in the face area in the real-time image, the method further includes:
carrying out average calculation on gray values corresponding to all pixel points in the motion area, and determining average gray values;
and clustering the motion areas by a clustering method according to the average gray value to obtain a first motion area.
Further, clustering the motion areas by a clustering method according to the average gray value, and after obtaining the first motion area, further includes:
subtracting the gray values corresponding to the ith pixel point and the (i+1) th pixel point in the first motion area to obtain a first gray difference value, wherein i is an integer greater than 0;
and if the first gray difference value is smaller than the preset gray difference value, executing a second gray difference value determined according to the (i+2)th pixel point and the ith pixel point, and so on, until the (i+N)th gray difference value is larger than or equal to the preset gray difference value, confirming that the (i+1)th pixel point to the (i+N)th pixel point belong to a key area in a face area in the real-time image, wherein N is an integer larger than 2, and the (i+N)th gray difference value comprises a difference value obtained by subtracting the gray values corresponding to the (i+N)th pixel point and the ith pixel point in the first motion area.
Further, the determining the feature point in each line segment according to the number of the pixel points in each line segment in the key region and the motion speed includes:
dividing a line segment into a first part and a second part according to a pixel point Z on the line segment in the key region;
determining a first function value A by using the gray value of each pixel point on the first part, the gray value of each pixel point on the second part, each movement speed of each pixel point on the first part and each movement speed of each pixel point on the second part;
determining a second function value B by using the number of first pixels, the number of second pixels and a preset positive number, wherein the number of the first pixels comprises the number of each pixel on the first part, and the number of the second pixels comprises the number of each pixel on the second part;
determining the product of the first function value A and the second function value B as a feature point optimal value YX;
and determining the maximum feature point optimal selection value YX as the feature point in the line segment according to the feature point optimal selection value YX corresponding to each pixel point Z on the line segment.
Further, the determining a first function value A according to the gray value of each pixel point on the first portion, the gray value of each pixel point on the second portion, each movement speed of each pixel point on the first portion, and each movement speed of each pixel point on the second portion includes:
subtracting the gray value of each pixel point on the first part from the gray value of each pixel point on the second part to obtain a first gray difference value;
subtracting the motion speeds of the pixel points on the first part from the motion speeds of the pixel points on the second part to obtain a first motion speed difference;
and processing the first gray level difference value and the first movement speed difference value through an exponential function to determine a first function value A.
Further, the processing the first gray difference value and the first motion speed difference value through an exponential function to determine a first function value A includes:
determining the first function value A according to an exponential-function expression of the first gray difference value and the first motion speed difference value, wherein g1_i is the gray value of the i-th pixel point in the first part of the line segment, g2_i is the gray value of its corresponding pixel point in the second part, v1_i is the motion speed of the i-th pixel point in the first part, v2_i is the motion speed of its corresponding pixel point in the second part, n1 is the number of pixel points in the first part after the pixel point z divides the horizontal line segment into two parts, and n2 is the number of pixel points in the second part.
Further, the determining the second function value B by using the number of the first pixel points, the number of the second pixel points and the preset positive number includes:
according to the second function valueDetermining a second function value B, wherein +.>Dividing a horizontal line segment into two parts for the pixel point z point, wherein the number of the pixel points of the first part is +.>For the number of pixels of the second part, < >>Is a preset positive number.
Further, the weighting the result of the Laplacian processing of the real-time image according to the feature points to obtain an enhanced real-time image includes:
determining the probability P that the point q is a noise point according to the gray value of each pixel point in the neighborhood of the q-th pixel point and the gray value of the pixel point q, wherein the q-th pixel point is the feature point;
determining a weight factor C according to the information richness Y, the Laplacian value L, the probability P and the maximum value Lmax corresponding to the q-th pixel point, wherein the probability P comprises the probability that the q-th pixel point is a noise point, the maximum value Lmax comprises the maximum of the Laplacian values obtained after Laplacian processing of the pixel values in the real-time image, and the information richness Y corresponding to the q-th pixel point is determined according to the motion speed of the q-th pixel point, the distance between the q-th pixel point and the feature point, and the number of pixel points in the first motion area where the q-th pixel point is located;
and determining the enhanced gray value of the pixel point q according to the weight factor C and the Laplacian value corresponding to the q-th pixel point.
Further, the determining the enhanced gray value according to the weight factor and the Laplacian value corresponding to each pixel point in the real-time image comprises:
determining the enhanced gray value of the pixel point q as g'(q) = g(q) + C · L(q), wherein g(q) is the pixel value of the pixel point q, C is the weight factor, and L(q) is the Laplacian value of the pixel point q.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention provides a facial image analysis method for virtual vocal music teaching, which comprises the steps of obtaining a gray image of the surface of a cable clamp plate; performing image segmentation processing on the gray level image to determine all gate areas forming the cable clamp plate; and determining the defect of each gate area according to the gray level change from the gate point of each gate area to the line segment on the edge of the gate area and the gray level difference of different line segments in the gate area. Therefore, the defects of the cable clamp plate for the coal mining machine can be accurately detected, and the defect detection effect is improved.
Drawings
Fig. 1 is a flow chart of a facial image analysis method for virtual vocal teaching according to an embodiment of the present invention.
Detailed Description
The invention will now be described in further detail with reference to specific examples, which are intended to illustrate, but not to limit, the invention.
The facial image analysis method for virtual vocal music teaching provided by the embodiment of the invention is applicable to a scenario in which a user practises with the virtual vocal music teaching auxiliary system. It can be implemented by a facial image analysis device for virtual vocal music teaching, which may be arranged on the virtual vocal music teaching auxiliary equipment or be an independently arranged device; this is not limited here.
FIG. 1 is a flow chart of a facial image analysis method for virtual vocal teaching according to an embodiment of the present invention; as shown in fig. 1, a facial image analysis method for virtual vocal music teaching provided by an embodiment of the present invention includes:
Step 101, collecting real-time images of a user practising with the virtual vocal music teaching auxiliary system;
In the embodiment of the invention, the real-time image can be acquired in real time during practice through a camera or other sensor arranged in the virtual vocal music teaching auxiliary system, and the acquired real-time image is an RGB image. The invention uses a weighted graying method to convert the RGB image into a gray image of the student's practice. Weighted graying is a well-known technique and is not described in detail here.
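For illustration, a minimal sketch of the weighted graying step is given below. The 0.299/0.587/0.114 weights are the common BT.601 luma coefficients and are an assumption, since the text only says "weighted graying" without stating which weights are used.

```python
import numpy as np

def weighted_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB frame to a gray image by weighted summation.

    The 0.299/0.587/0.114 weights are the usual BT.601 luma coefficients;
    the exact weights used by the method are not stated, so these values
    are an assumption.
    """
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(gray, 0, 255).astype(np.uint8)
```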
The specific scenario addressed by the invention is as follows: when a student practises singing with the virtual vocal music teaching auxiliary system, the system usually collects real-time images of the practice. These images help the student better understand his or her own singing posture and throat movement; enhancing the real-time practice images makes it easier for the system to give corresponding suggestions for the shortcomings in the student's practice.
Step 102, determining a key area of the real-time image according to the motion speed and the motion direction of the pixel points in the real-time image;
specifically, according to the motion speed and the motion direction respectively corresponding to two pixel points in the real-time image, determining the ratio between the difference value of the motion speeds corresponding to the two pixel points in the motion direction and the motion speed corresponding to the first pixel point to obtain a first motion speed difference, wherein the two pixel points comprise a first pixel point and a second pixel point;
and when the first motion speed difference is smaller than the preset motion speed difference, executing a second motion speed difference determined according to a third pixel point and the first pixel point, and so on until the nth motion speed difference is larger than or equal to the preset motion speed difference, confirming that the first pixel point, the second pixel point and the nth pixel point belong to a motion area in a face area in the real-time image, wherein N is an integer larger than 2, and the nth motion speed difference comprises determining a ratio between a difference value of the motion speeds corresponding to the nth pixel point and the first pixel point and the motion speed corresponding to the first pixel point in the motion direction according to the motion speed and the motion direction respectively corresponding to the nth pixel point and the first pixel point.
In this embodiment, when a student practises singing with the virtual vocal music teaching auxiliary system, the student usually practises a whole song, so the acquired practice images are a continuous video, i.e. an image sequence. In this method an optical flow method is first used to obtain the motion speed and the motion direction of each pixel point in an image Q; the optical flow method is a known technique and is not repeated here. The logic is as follows: when singing, students perform laryngeal pronunciation and facial expression actions, which are reflected as consistent motion characteristics of local pixel points in the images, and whether these actions are standard directly influences the efficiency and accuracy of the student's practice. Therefore, the invention completes the preliminary extraction of the ROI region based on the motion speed and the motion direction of the pixel points.
According to the above steps, the motion speed v_q and the motion direction of a pixel point q (taking point q as an example) can be obtained. The pixel point q is then taken as the initial growth point of a region growing process, and the growth condition is that the motion speed difference |v_q − v_a| / v_q is smaller than the threshold 0.05, where v_q is the motion speed of the growth point q and v_a is the motion speed of the point a to be grown (taking point a as an example); if the condition is satisfied the point is grown, otherwise it is not. The threshold is set according to an empirical value and can be adjusted by the implementer. The moving pixel points in the image (i.e. the pixel points whose motion speed is greater than 0) can thus be grown to obtain a plurality of motion areas.
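A minimal sketch of this motion-based region growing is given below, assuming a dense optical-flow speed field, 4-connected neighbours and the 0.05 ratio threshold from the text; the data layout and helper names are illustrative.

```python
import numpy as np
from collections import deque

def grow_motion_regions(speed: np.ndarray, ratio_thr: float = 0.05) -> np.ndarray:
    """Region-grow pixels with motion speed > 0 into motion areas.

    `speed` is an H x W array of optical-flow magnitudes. A neighbour a joins
    the region of growth point q when |v_q - v_a| / v_q < ratio_thr, following
    the growth condition described above. Returns an H x W label map
    (0 = background / no motion).
    """
    h, w = speed.shape
    labels = np.zeros((h, w), dtype=np.int32)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if speed[sy, sx] <= 0 or labels[sy, sx] != 0:
                continue
            current += 1
            labels[sy, sx] = current
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                vq = speed[y, x]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0 and speed[ny, nx] > 0:
                        if vq > 0 and abs(vq - speed[ny, nx]) / vq < ratio_thr:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels
```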
Further, on the basis of the foregoing embodiment, after confirming that the first pixel point, the second pixel point, and the nth pixel point belong to the motion area in the face area in the real-time image, the method further includes:
carrying out average calculation on gray values corresponding to all pixel points in the motion area, and determining average gray values;
and clustering the motion areas by a clustering method according to the average gray value to obtain a first motion area.
For example, since the background of the student during practice cannot be controlled, there may be interference areas among the motion areas. The average gray value of each motion area is therefore calculated, and the average gray values are clustered with a K-means clustering algorithm with K = 2, the distance measure being the difference of the average gray values of the motion areas; the clustering process is a known technique and is not described here. Two classes of motion areas are obtained, recorded as the first class and the second class of motion areas. Assuming that the variance of the average gray values is smaller for the first class, the motion areas of the first class are taken as the motion areas in the face region.
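A sketch of this step using scikit-learn's KMeans is given below; selecting the lower-variance cluster as the face-related motion areas follows the text, while the label bookkeeping is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_face_motion_labels(gray: np.ndarray, labels: np.ndarray) -> list:
    """Cluster motion areas by their mean gray value (K = 2) and keep the class
    whose mean gray values have the smaller variance, per the text."""
    region_ids = [r for r in np.unique(labels) if r != 0]
    means = np.array([[gray[labels == r].mean()] for r in region_ids])
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(means)
    variances = [means[km.labels_ == c].var() for c in (0, 1)]
    keep = int(np.argmin(variances))
    return [r for r, c in zip(region_ids, km.labels_) if c == keep]
```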
Preferably, on the basis of the foregoing embodiment, the clustering, according to the average gray value, the moving areas by a clustering method, to obtain a first moving area, further includes:
subtracting the gray values corresponding to the ith pixel point and the (i+1) th pixel point in the first motion area to obtain a first gray difference value, wherein i is an integer greater than 0;
and if the gray level difference value is smaller than the preset gray level difference value, executing a second gray level difference value determined according to the (i+2) th pixel point and the (i+N) th pixel point, and so on until the (i+N) th motion speed difference is larger than or equal to the preset motion speed difference, confirming that the (i+1) th pixel point to the (i+N) th pixel point belong to a key area in a face area in the real-time image, wherein N is an integer larger than 2, and the (i+N) th motion speed difference comprises a difference value obtained by subtracting gray level values corresponding to the (i+N) th pixel point and the (i+N) th pixel point in the first motion area.
For example, a pixel point in the first class of motion areas can be selected as the initial growth point for a second region growing, this time under a gray-level condition: when the gray difference |g_q − g_a| between the growth point and the point a to be grown is smaller than the gray growth threshold 10, the point a is grown; otherwise it is not. Repeating this growth process completes the extraction of the key region, i.e. the ROI region, and the segmentation of the face region (ROI region) is finished. The face region includes both the facial region and the laryngeal region.
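The same growing scheme can be reused with the gray-level condition; a minimal sketch, assuming the threshold 10 from the text and a mask restricting growth to the first class of motion areas:

```python
import numpy as np
from collections import deque

def grow_by_gray(gray: np.ndarray, seed: tuple, mask: np.ndarray, thr: float = 10.0) -> np.ndarray:
    """Grow a key (ROI) region from `seed` inside `mask` (the first class of
    motion areas), adding a neighbour when its gray difference to the current
    growth point is below `thr`."""
    h, w = gray.shape
    roi = np.zeros((h, w), dtype=bool)
    roi[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not roi[ny, nx]:
                if abs(float(gray[y, x]) - float(gray[ny, nx])) < thr:
                    roi[ny, nx] = True
                    queue.append((ny, nx))
    return roi
```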
Step 103, determining characteristic points in each line segment according to the number of pixel points and the movement speed in each line segment in the key region;
In this embodiment, the face region is further analysed to obtain the information richness of the pixel points in it, and the Laplacian value obtained by the Laplacian operator is optimized on the basis of the information richness, so that the adaptive enhancement of the face region is completed, the image contrast is improved, noise is reduced, and the precision of subsequent feature point extraction in the face region is improved.
The face region obtained by the above steps, i.e. the ROI region, is regarded as being composed of N horizontal line segments, which are analysed one by one. Taking any line segment R as an example, the feature point of the line segment R is obtained by calculating a feature point optimal value YX for each pixel point z on the line segment R.
Preferably, the line segment is divided into a first part and a second part according to the pixel point Z on the line segment in the key region;
determining a first function value A by using the gray value of each pixel point on the first part, the gray value of each pixel point on the second part, each movement speed of each pixel point on the first part and each movement speed of each pixel point on the second part;
determining a second function value B by using the number of first pixels, the number of second pixels and a preset positive number, wherein the number of the first pixels comprises the number of each pixel on the first part, and the number of the second pixels comprises the number of each pixel on the second part;
determining the product of the first function value A and the second function value B as a feature point optimal value YX;
and determining the maximum feature point optimal selection value YX as the feature point in the line segment according to the feature point optimal selection value YX corresponding to each pixel point Z on the line segment.
Specifically, the determining a first function value a according to the gray value of each pixel point on the first portion, the gray value of each pixel point on the second portion, each motion speed of each pixel point on the first portion, and each motion speed of each pixel point on the second portion includes:
subtracting the gray value of each pixel point on the first part from the gray value of each pixel point on the second part to obtain a first gray difference value;
subtracting the motion speeds of the pixel points on the first part from the motion speeds of the pixel points on the second part to obtain a first motion speed difference;
and processing the first gray level difference value and the first movement speed difference value through an exponential function to determine a first function value A.
For example, the first function value A is determined by an exponential function of the first gray difference value and the first motion speed difference value, wherein g1_i is the gray value of the i-th pixel point in the first part of the line segment, g2_i is the gray value of its corresponding pixel point in the second part, v1_i is the motion speed of the i-th pixel point in the first part, v2_i is the motion speed of its corresponding pixel point in the second part, n1 is the number of pixel points in the first part after the pixel point z divides the horizontal line segment into two parts, and n2 is the number of pixel points in the second part.
Preferably, in the foregoing embodiment, determining the second function value B includes:
according to the second function valueDetermining a second function value B, wherein +.>Dividing a horizontal line segment into two parts for the pixel point z point, wherein the number of the pixel points of the first part is +.>For the number of pixels of the second part, < >>Is a preset positive number.
In this example, B reflects the division of the z point. The larger the value, the more likely the pixel point is to be a point of symmetry. Min () is the minimum value selected.The middle is the gray value of the ith pixel point in the first part of the line segment,/th pixel point>The middle is the corresponding pixel point in the second part of the line segment (the pixel point with the largest correspondence is the corresponding pixel point, ">Here assume +.>The corresponding degree DY is calculated for the pixel point a in the first part and the pixel point a in the second part respectively, and the pixel point with the maximum corresponding degree DY is the corresponding pixel point)>For the movement speed of the ith pixel point in the first part,/th>The motion speed of the ith pixel point corresponding to the pixel point in the second part is obtained. The smaller the difference, the larger the preference value.
On the basis of the above embodiments, the feature point optimal value YX is the product of A and B; the larger YX is, the more likely the pixel point is the feature point of the line segment, and the pixel point with the maximum YX is selected as the feature point of the line segment. The N horizontal line segments are analysed according to the above steps, and the position information of a pixel point can then be expressed through its distance to the feature point of the corresponding line segment.
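As a hedged illustration of the feature-point selection on one line segment, the sketch below uses assumed surrogate forms for A (an exponential of the mean absolute gray/speed differences) and B (a min/max ratio with the preset positive number), since the exact expressions appear only as images in the publication; the pairing of pixels is also simplified.

```python
import numpy as np

def feature_point_index(grays: np.ndarray, speeds: np.ndarray, a: float = 1.0) -> int:
    """Pick the feature point of one horizontal ROI line segment.

    For each candidate split point z the segment is divided into a first and a
    second part. A is an assumed exponential of the mean absolute gray/speed
    differences between paired pixels of the two parts, and B an assumed
    min/max ratio with the preset positive number `a`; both are illustrative
    surrogates, not the publication's exact formulas.
    """
    grays = grays.astype(np.float64)
    speeds = speeds.astype(np.float64)
    n = len(grays)
    best_z, best_yx = 1, -np.inf
    for z in range(1, n - 1):
        g1, g2 = grays[:z], grays[z + 1:]
        v1, v2 = speeds[:z], speeds[z + 1:]
        m = min(len(g1), len(g2))
        # pair the i-th pixel of each part (a simplification of matching by the
        # correspondence degree DY described in the text)
        diff = np.abs(g1[:m] - g2[:m]) + np.abs(v1[:m] - v2[:m])
        A = np.exp(-diff.mean())
        B = min(len(g1), len(g2)) / (max(len(g1), len(g2)) + a)
        yx = A * B  # feature point optimal value YX = A * B
        if yx > best_yx:
            best_yx, best_z = yx, z
    return best_z
```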
Step 104, weighting the real-time image with the result of its Laplacian processing according to the feature points to obtain an enhanced real-time image.
In the present embodiment, the probability P that the point q is a noise point is determined according to the gray value of each pixel point in the neighborhood of the q-th pixel point and the gray value of the pixel point q, the q-th pixel point being the feature point;
a weight factor C is determined according to the information richness Y, the Laplacian value L, the probability P and the maximum value Lmax corresponding to the q-th pixel point, wherein the probability P comprises the probability that the q-th pixel point is a noise point, the maximum value Lmax is the maximum of the Laplacian values obtained after Laplacian processing of the pixel values in the real-time image, and the information richness Y corresponding to the q-th pixel point is determined according to the motion speed of the q-th pixel point, the distance between the q-th pixel point and the feature point, and the number of pixel points in the first motion area where the q-th pixel point is located;
and the enhanced gray value of the pixel point q is determined according to the weight factor C and the Laplacian value corresponding to the q-th pixel point.
In the present embodiment, the information richness Y corresponding to a pixel point q is obtained from the position information and the motion information of the pixel point in the ROI region. In the expression for Y, v_q is the motion speed of the pixel point q: the smaller the motion speed, the smaller the motion amplitude and the lower the contrast in the image, and the harder it is to spot errors at that position during teaching and coaching; such positions are therefore given a greater information richness, so that their contrast is increased more strongly during the subsequent adaptive enhancement and whether the motion there is standard can be judged more accurately. d_q denotes the distance between the pixel point q and the feature point of its line segment: the smaller the distance, the greater the information richness. The logic is as follows: according to prior knowledge, facial actions such as smiling usually drive the muscles on both sides to move outward from a central position, forming a specific expression on the surface; hence the closer a pixel is to the feature point, the more important its information and the greater its information richness. m is the number of pixel points in the motion area (obtained by the region growing method) in which the point q is located; the larger this value, the more information the pixel point contains. Note: if the motion speed v_q equals 0, Y = 0.
The larger the information richness Y is, the larger the influence of the information contained in the pixel points on teaching of students is, and the stronger the corresponding enhancement effect is.
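The exact expression for Y is not reproduced in this text; the sketch below uses an assumed monotone surrogate (inverse speed, inverse distance, times region size) purely to illustrate the dependencies described above.

```python
def information_richness(v_q: float, d_q: float, m: int) -> float:
    """Assumed surrogate for Y: larger when the motion speed v_q is small, when
    the distance d_q to the line-segment feature point is small, and when the
    motion area containing q has many pixels (m).
    Not the publication's exact formula; Y = 0 when the speed is 0."""
    if v_q == 0:
        return 0.0
    return m / (v_q * (d_q + 1.0))
```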
Furthermore, when the Laplacian operator is used to enhance the image, each pixel point of the image obtains a corresponding Laplacian value, which together form a Laplacian image. The Laplacian image is then superimposed on the original image to obtain the enhanced image; the superposition is usually performed by weighted summation.
The convolution kernel of the Laplacian operator is a 3×3 kernel covering the eight neighbours of the centre pixel. The centre of the convolution kernel is placed on a pixel point of the image, each coefficient of the kernel is multiplied by the value of the corresponding pixel among the 8 surrounding pixel points, and the products are summed to obtain the Laplacian value of the centre pixel point. By moving the kernel continuously, the Laplacian processing result of the whole image is obtained. When pixels at the edge of the image are processed, some neighbours may be missing; an interpolation algorithm can then be used to pad the missing pixels, or the Laplacian value can be calculated only from the neighbours that exist.
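A minimal sketch of this filtering step, assuming the common 8-neighbourhood Laplacian kernel (the exact kernel is shown only as an image in the publication) and using OpenCV's filter2D with replicate border padding:

```python
import cv2
import numpy as np

# Assumed 8-neighbourhood Laplacian kernel; the publication shows its kernel
# only as an image, so this specific matrix is an assumption.
LAPLACIAN_KERNEL = np.array([[1, 1, 1],
                             [1, -8, 1],
                             [1, 1, 1]], dtype=np.float64)

def laplacian_image(gray: np.ndarray) -> np.ndarray:
    """Convolve the gray image with the kernel; CV_64F keeps negative values,
    and BORDER_REPLICATE pads missing neighbours at the image edge."""
    return cv2.filter2D(gray.astype(np.float64), cv2.CV_64F, LAPLACIAN_KERNEL,
                        borderType=cv2.BORDER_REPLICATE)
```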
The Laplacian value corresponding to the pixel point q can thus be obtained. The purpose of obtaining the Laplacian value is to perform a weighted summation of the Laplacian value of a pixel point and the gray value of the corresponding pixel point in the original image, completing the enhancement of the original image. However, the Laplacian value of a pixel point may be negative, which can produce adverse effects during the superposition enhancement: the contrast may be reduced, regions of higher gray level in the original image may darken and regions of lower gray level brighten, and where noise points or interference signals exist in the Laplacian image, the noise after superposition can be strengthened. The Laplacian value is abbreviated here as the L value. For each pixel point of the original image the corresponding L value is obtained, and the weight factor of the L value is then obtained from the local features of the pixel point in the image:
In the weight-factor expression, g_i is the gray value of the i-th pixel point in the neighborhood of the point q and g_q is the gray value of the pixel point q; P reflects the probability that the point q is a noise point, and the larger P is, the smaller the corresponding Laplacian-value weight factor. Y is the information richness corresponding to the pixel point, L is the Laplacian value corresponding to the pixel point, and Lmax is the maximum of the Laplacian values. The value 0.8 is a threshold set according to an empirical value.
Therefore, the invention adaptively acquires the weight factor of the Laplacian value corresponding to each pixel point in the Laplacian image and enhances the original image on the basis of the weight factors and the Laplacian values, so that the Laplacian values of noise points are given smaller weights, the possibility of producing noise speckles after superposition is reduced, and the contrast of the superimposed image is improved.
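For illustration only, one possible weight-factor computation is sketched below; the noise probability P and the combination of Y, P, L and Lmax are assumed forms, since the publication gives these formulas only as images, and only the 0.8 threshold comes from the text.

```python
import numpy as np

def weight_factor(gray: np.ndarray, lap: np.ndarray, richness: np.ndarray,
                  noise_thr: float = 0.8) -> np.ndarray:
    """Assumed surrogate for the adaptive weight factor C.

    P estimates how much a pixel deviates from its 3x3 neighbourhood mean
    (treated as a noise probability), and C grows with the information
    richness Y and the relative Laplacian magnitude |L| / Lmax while
    shrinking with P. These exact expressions are assumptions.
    """
    g = gray.astype(np.float64)
    pad = np.pad(g, 1, mode="edge")
    # local mean over the 3x3 neighbourhood of each pixel
    local_mean = sum(pad[dy:dy + g.shape[0], dx:dx + g.shape[1]]
                     for dy in range(3) for dx in range(3)) / 9.0
    p = np.minimum(np.abs(g - local_mean) / 255.0, 1.0)   # assumed noise probability
    l_rel = np.abs(lap) / (np.abs(lap).max() + 1e-9)       # |L| / Lmax
    c = richness * l_rel * (1.0 - p)                        # assumed combination
    return np.where(p > noise_thr, 0.0, c)                  # empirical 0.8 threshold
```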
In the above embodiment, the enhanced gray value is determined according to the weight factor and the Laplacian value corresponding to each pixel point in the real-time image, comprising:
determining the enhanced gray value of the pixel point q as g'(q) = g(q) + C · L(q), wherein g(q) is the pixel value of the pixel point q, C is the weight factor, and L(q) is the Laplacian value of the pixel point q.
For example, the gray value corresponding to the superimposed pixel point q is obtained by g'(q) = g(q) + C · L(q), where g(q) is the pixel value of the pixel point q, C is the weight factor obtained in the above step, L(q) is the Laplacian value of the pixel point q, and g'(q) is the enhanced gray value of the pixel point q.
According to the above steps, the gray value of each superimposed pixel point can be obtained. Because the valid gray range is 0-255 and the superimposed value may exceed it, the superimposed gray values are limited, i.e. normalized, and then mapped back to the 0-255 range; this is prior art and is not described in detail.
According to the above steps, the final gray value of each pixel point can be obtained; the image composed of these final gray values is the enhanced image.
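Putting the superposition and range limiting together, a minimal sketch (the per-pixel weight factor C is assumed to be supplied by the preceding step, and simple clipping stands in for the normalization described above):

```python
import numpy as np

def enhance(gray: np.ndarray, lap: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Weighted superposition g'(q) = g(q) + C(q) * L(q), followed by limiting
    the result to the valid 0-255 gray range."""
    enhanced = gray.astype(np.float64) + weight * lap
    return np.clip(enhanced, 0, 255).astype(np.uint8)
```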
So far, the ROI area image is adaptively enhanced. That is, adaptive enhancement is accomplished for the ROI area image in the video image during the student exercise.
According to the steps, the self-adaptive enhancement of the ROI in the training video image in the training process of the student is completed, and when the enhanced ROI image is subjected to face and throat characteristic point extraction by using a neural network (the conventional convolutional neural network is not described in detail in the prior art), the extracted characteristic point has higher precision. And obtain the feature vector that the feature point of the picture Q in the exercise course and correspondent point in the picture of next moment make up, obtain the difference degree (namely the difference between the feature vector extracted and the feature vector of correspondent moment in the teaching video picture on the basis of the difference between the feature vector of the present exercise picture, the calculating process is the known means), and normalize the difference degree, obtain the deficiency of the student, wherein the difference degree is greater, the more needs to improve here to explain student, when the difference degree after normalization is greater than 0.05, the system proposes the corresponding improvement suggestion to student. Thus, virtual vocal music teaching assistance is completed based on artificial intelligence.
In the embodiment of the invention, firstly, real-time images of the exercise of a user by using a virtual vocal music teaching auxiliary system are collected; then, determining a key area of the real-time image according to the motion speed and the motion direction of the pixel points in the real-time image; then, determining characteristic points in each line segment according to the number of pixel points in each line segment in the key region and the movement speed; and then weighting the real-time image according to the characteristic points and the result of the Laplacian processing of the real-time image to obtain the enhanced real-time image. The result of the Laplacian processing of the real-time image is weighted according to the feature points, so that the image reinforcement of the key areas in the real-time image is realized, and the accuracy of the facial image analysis of the virtual vocal music teaching is improved.
Claims (10)
1. A facial image analysis method for virtual vocal music teaching, comprising:
collecting real-time images of a user exercising by using the virtual vocal music teaching auxiliary system;
determining a key area of the real-time image according to the motion speed and the motion direction of the pixel points in the real-time image;
determining characteristic points of each line segment in the key region according to the number of pixel points and the movement speed in each line segment in the key region;
and weighting the result of the Laplacian processing of the real-time image according to the characteristic points to obtain an enhanced real-time image.
2. The facial image analysis method for virtual vocal teaching according to claim 1, wherein the determining the key region of the real-time image according to the motion speed and the motion direction of the pixel point in the real-time image comprises:
determining a ratio between a difference value of motion speeds corresponding to two pixel points in the motion direction and a motion speed corresponding to a first pixel point according to the motion speed and the motion direction respectively corresponding to the two pixel points in the real-time image, and obtaining a first motion speed difference, wherein the two pixel points comprise a first pixel point and a second pixel point;
and when the first motion speed difference is smaller than the preset motion speed difference, executing a second motion speed difference determined according to a third pixel point and the first pixel point, and so on until the nth motion speed difference is larger than or equal to the preset motion speed difference, confirming that the first pixel point, the second pixel point and the nth pixel point belong to a motion area in a face area in the real-time image, wherein N is an integer larger than 2, and the nth motion speed difference comprises determining a ratio between a difference value of the motion speeds corresponding to the nth pixel point and the first pixel point and the motion speed corresponding to the first pixel point in the motion direction according to the motion speed and the motion direction respectively corresponding to the nth pixel point and the first pixel point.
3. The face image analysis method for virtual vocal teaching according to claim 2, wherein after confirming that the first pixel point, the second pixel point to the nth pixel point belong to a moving region in a face region in the real-time image, further comprising:
carrying out average calculation on gray values corresponding to all pixel points in the motion area, and determining average gray values;
and clustering the motion areas by a clustering method according to the average gray value to obtain a first motion area.
4. The facial image analysis method for virtual vocal teaching according to claim 3, wherein said clustering the motion areas by a clustering method according to the average gray value, after obtaining the first motion area, further comprises:
subtracting the gray values corresponding to the ith pixel point and the (i+1) th pixel point in the first motion area to obtain a first gray difference value, wherein i is an integer greater than 0;
and if the first gray difference value is smaller than the preset gray difference value, executing a second gray difference value determined according to the (i+2)th pixel point and the ith pixel point, and so on, until the (i+N)th gray difference value is larger than or equal to the preset gray difference value, confirming that the (i+1)th pixel point to the (i+N)th pixel point belong to a key area in a face area in the real-time image, wherein N is an integer larger than 2, and the (i+N)th gray difference value comprises a difference value obtained by subtracting the gray values corresponding to the (i+N)th pixel point and the ith pixel point in the first motion area.
5. The facial image analysis method for virtual vocal teaching according to claim 4, wherein the determining the feature point of each line segment in the key region according to the number of pixel points in each line segment in the key region and the movement speed comprises:
dividing a line segment into a first part and a second part according to a pixel point Z on the line segment in the key region;
determining a first function value A by using the gray value of each pixel point on the first part, the gray value of each pixel point on the second part, each movement speed of each pixel point on the first part and each movement speed of each pixel point on the second part;
determining a second function value B by using the number of first pixels, the number of second pixels and a preset positive number, wherein the number of the first pixels comprises the number of each pixel on the first part, and the number of the second pixels comprises the number of each pixel on the second part;
determining the product of the first function value A and the second function value B as a feature point optimal value YX;
and determining the maximum feature point optimal selection value YX as the feature point in the line segment according to the feature point optimal selection value YX corresponding to each pixel point Z on the line segment.
6. The facial image analysis method for virtual vocal teaching according to claim 5, wherein said determining a first function value A from the gray value of each pixel on the first portion, the gray value of each pixel on the second portion, each movement speed of each pixel on the first portion, and each movement speed of each pixel on the second portion comprises:
subtracting the gray value of each pixel point on the first part from the gray value of each pixel point on the second part to obtain a first gray difference value;
subtracting the motion speeds of the pixel points on the first part from the motion speeds of the pixel points on the second part to obtain a first motion speed difference;
and processing the first gray level difference value and the first movement speed difference value through an exponential function to determine a first function value A.
7. The facial image analysis method for virtual vocal teaching according to claim 6, wherein said processing the first gray difference value and the first movement velocity difference value by an exponential function to determine a first function value A comprises:
determining the first function value A according to an exponential-function expression of the first gray difference value and the first motion speed difference value, wherein g1_i is the gray value of the i-th pixel point in the first part of the line segment, g2_i is the gray value of its corresponding pixel point in the second part, v1_i is the motion speed of the i-th pixel point in the first part, v2_i is the motion speed of its corresponding pixel point in the second part, n1 is the number of pixel points in the first part after the pixel point z divides the horizontal line segment into two parts, and n2 is the number of pixel points in the second part.
8. The facial image analysis method for virtual vocal teaching according to claim 5, wherein the determining the second function value B by the first number of pixels, the second number of pixels, and the preset positive number comprises:
according to the second function valueDetermining a second function value B, wherein +.>Dividing a horizontal line segment into two parts for the pixel point z point, wherein the number of the pixel points of the first part is +.>For the number of pixels of the second part, < >>Is a preset positive number.
9. The face image analysis method for virtual vocal teaching according to claim 7 or 8, wherein the weighting processing is performed on the real-time image according to the feature points and the result of the Laplacian processing, to obtain an enhanced real-time image, comprising:
determining the probability P that the point q is a noise point according to the gray value of each pixel point in the neighborhood of the q-th pixel point and the gray value of the pixel point q, wherein the q-th pixel point is the feature point;
determining a weight factor C according to the information richness Y, the Laplacian value L, the probability P and the maximum value Lmax corresponding to the q-th pixel point, wherein the probability P comprises the probability that the q-th pixel point is a noise point, the maximum value Lmax comprises the maximum of the Laplacian values obtained after Laplacian processing of the pixel values in the real-time image, and the information richness Y corresponding to the q-th pixel point is determined according to the motion speed of the q-th pixel point, the distance between the q-th pixel point and the feature point, and the number of pixel points in the first motion area where the q-th pixel point is located;
and determining the enhanced gray value of the pixel point q according to the weight factor C and the Laplacian value corresponding to the q-th pixel point.
10. The facial image analysis method for virtual vocal teaching according to claim 9, wherein the enhanced gray value is determined according to the weight factor and the Laplacian value corresponding to each pixel point in the real-time image, comprising:
determining the enhanced gray value of the pixel point q as g'(q) = g(q) + C · L(q), wherein g(q) is the pixel value of the pixel point q, C is the weight factor, and L(q) is the Laplacian value of the pixel point q.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310720312.9A CN116485794B (en) | 2023-06-19 | 2023-06-19 | Face image analysis method for virtual vocal music teaching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310720312.9A CN116485794B (en) | 2023-06-19 | 2023-06-19 | Face image analysis method for virtual vocal music teaching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116485794A true CN116485794A (en) | 2023-07-25 |
CN116485794B CN116485794B (en) | 2023-09-19 |
Family
ID=87219823
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310720312.9A Active CN116485794B (en) | 2023-06-19 | 2023-06-19 | Face image analysis method for virtual vocal music teaching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116485794B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106780620A (en) * | 2016-11-28 | 2017-05-31 | 长安大学 | A kind of table tennis track identification positioning and tracking system and method |
WO2019223067A1 (en) * | 2018-05-25 | 2019-11-28 | 平安科技(深圳)有限公司 | Multiprocessing-based iris image enhancement method and apparatus, and device and medium |
CN109754377A (en) * | 2018-12-29 | 2019-05-14 | 重庆邮电大学 | A kind of more exposure image fusion methods |
CN110853006A (en) * | 2019-11-05 | 2020-02-28 | 华南理工大学 | Method for evaluating quality of digital pathological image acquired by scanner |
US20220392029A1 (en) * | 2019-11-14 | 2022-12-08 | Agfa Nv | Method and Apparatus for Contrast Enhancement |
CN113521711A (en) * | 2021-07-13 | 2021-10-22 | 济南幼儿师范高等专科学校 | Dance training auxiliary system and method |
CN113723494A (en) * | 2021-08-25 | 2021-11-30 | 武汉理工大学 | Laser visual stripe classification and weld joint feature extraction method under uncertain interference source |
CN113989143A (en) * | 2021-10-26 | 2022-01-28 | 中国海洋大学 | High-precision quick focus detection method based on push-broom type underwater hyperspectral original image |
CN115487959A (en) * | 2022-11-16 | 2022-12-20 | 山东济矿鲁能煤电股份有限公司阳城煤矿 | Intelligent spraying control method for coal mine drilling machine |
CN115829883A (en) * | 2023-02-16 | 2023-03-21 | 汶上县恒安钢结构有限公司 | Surface image denoising method for dissimilar metal structural member |
CN116092013A (en) * | 2023-03-06 | 2023-05-09 | 广东汇通信息科技股份有限公司 | Dangerous road condition identification method for intelligent monitoring |
Non-Patent Citations (2)
Title |
---|
BO LI等: "RGB-D Image Enhancement by Prior Guided Weighted Nonlocal Laplacian", 《2018 IEEE 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC)》, pages 1675 - 1679 * |
SONG YUCHEN: "Research on computer-aided detection methods for quality problems of digital pathology images", China Master's Theses Full-text Database, Basic Sciences, pages 006 - 866 *
Also Published As
Publication number | Publication date |
---|---|
CN116485794B (en) | 2023-09-19 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |