CN112087590A - Image processing method, device, system and computer storage medium - Google Patents

Image processing method, device, system and computer storage medium

Info

Publication number: CN112087590A
Application number: CN202010819781.2A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 包英泽
Current Assignee: Beijing Dami Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Beijing Dami Technology Co Ltd
Application filed by Beijing Dami Technology Co Ltd
Priority to CN202010819781.2A
Publication of CN112087590A
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Prior art keywords: image, target, target object, initial, frame

Classifications

    • H04N 7/15 — Electricity; Electric communication technique; Pictorial communication, e.g. television; Television systems; Systems for two-way working; Conference systems
    • G06V 40/161 — Physics; Computing, calculating or counting; Image or video recognition or understanding; Recognition of biometric, human-related or animal-related patterns in image or video data; Human or animal bodies; Human faces, e.g. facial parts, sketches or expressions; Detection; Localisation; Normalisation
    • G06V 40/168 — Physics; Computing, calculating or counting; Image or video recognition or understanding; Recognition of biometric, human-related or animal-related patterns in image or video data; Human or animal bodies; Human faces, e.g. facial parts, sketches or expressions; Feature extraction; Face representation
    • H04N 21/4788 — Electricity; Electric communication technique; Pictorial communication, e.g. television; Selective content distribution, e.g. interactive television or video on demand [VOD]; Client devices; End-user applications; Supplemental services communicating with other users, e.g. chatting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, apparatus, system and computer storage medium. The method includes: acquiring an initial image, where the initial image contains a target object; determining a target object frame according to the target object in the initial image, where the target object frame contains the target object and the image enclosed by the target object frame in the initial image is the target image; and deleting the non-target image from the initial image, where the initial image consists of the target image and the non-target image. According to the method and apparatus, a target object frame can be determined for the acquired initial image, the non-target image outside the target object frame is deleted, and the image enclosed by the target object frame is used as the required image, which avoids the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera. At the same time, because the area of the image enclosed by the target object frame is smaller than that of the initial image, the transmission cost is reduced and the transmission speed is increased.

Description

Image processing method, device, system and computer storage medium
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to an image processing method, an image processing apparatus, an image processing system, and a computer storage medium.
Background
Cameras are now widely used in different terminal devices to provide photographing or video recording functions, but blind spots and dead angles during shooting degrade the results. For example, during video chat or online teaching, the field of view (FOV) of an ordinary camera is only about 40 to 60 degrees, which is too narrow; this inconveniences the user and degrades the user experience. To address this, the ordinary camera can be replaced with a fisheye or wide-angle camera, or a wide-angle lens (FOV > 90 degrees) can be fitted over the existing camera of the terminal device to achieve a wide-angle effect and bring a better experience to the user.
However, because a wide-angle camera covers a large shooting angle, it captures more of the scene than necessary, which easily discloses the user's privacy and tends to make the transmission cost excessive.
Disclosure of Invention
The embodiments of the present application provide an image processing method, apparatus, system and computer storage medium. A target object is recognized in an acquired image, a target object frame is determined from the target object, the area outside the target object frame is deleted from the acquired image, and only the image enclosed by the target object frame is sent and displayed, so that safer image information is presented. At the same time, because the enclosed image is smaller in area than the acquired image, the transmission cost is reduced and the method is more widely applicable.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring an initial image; wherein the initial image comprises the target object.
Determining a target object frame according to a target object in the initial image; wherein the target object box includes a target object; and the image defined by the target object frame in the initial image is the target image.
Deleting the non-target image in the initial image; wherein the initial image comprises a target image and the non-target image.
In the embodiment of the application, a target object frame can be determined for the acquired initial image, the non-target image outside the target object frame is deleted from the initial image, and the image enclosed by the target object frame is used as the required image, which avoids the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera. At the same time, because the area of the image enclosed by the target object frame is smaller than that of the initial image, the transmission cost is reduced and the transmission speed is increased.
In an alternative of the first aspect, determining the target object frame according to the target object in the initial image specifically includes:
and identifying the target object in the initial image by adopting a face identification algorithm, and determining a target object frame.
In the embodiment of the application, the initial image is identified using a face recognition algorithm, which guarantees the accuracy of the target object frame and further avoids the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera.
In yet another alternative of the first aspect, the initial image comprises a plurality of frames of images;
identifying the target object in the initial image by adopting a face recognition algorithm, and determining the target object frame specifically comprises the following steps:
selecting a target frame image from a plurality of frame images;
inputting the gray value of the target frame image into a target neural network model to obtain initial object frame data; the target neural network model is obtained by training sample gray values of a plurality of known images and sample target object frame data of corresponding images;
and processing the initial object frame data according to a preset proportion to obtain a target object frame.
In the embodiment of the application, the single frame image that is most suitable as a reference is selected from the acquired images, the single frame image is input into a trained neural network model to obtain initial object frame data, and the initial object frame data is processed according to a preset proportion. Because the neural network model is trained on known samples, the initial object frame data output from the gray values of the selected single-frame image is reliably accurate; at the same time, the initial object frame can be adjusted according to the preset proportion to obtain a more suitable target image.
In another alternative of the first aspect, the processing the initial object frame data according to the preset proportion to obtain the target object frame specifically includes:
acquiring target distance information;
determining a proportionality coefficient according to the target distance information;
and determining a target object frame according to the scale coefficient and the initial object frame data.
In the embodiment of the application, the size of the object frame is judged from the acquired distance between the user and the camera, and a more suitable target object frame is then obtained from the coefficient corresponding to that distance and the initial object frame data, which improves the user experience and further avoids the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera.
In yet another alternative of the first aspect, the determining the scaling factor according to the target distance information specifically includes:
determining target classification according to the target distance information;
and searching a preset classification proportion mapping list, and determining a proportion coefficient corresponding to the target classification.
In the embodiment of the application, the distance between the user and the camera is acquired and classified as close range or long distance, and the corresponding proportionality coefficient is then obtained by looking up a list, so that the coefficient matches the actual distance and the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera is further avoided.
In yet another alternative of the first aspect, after deleting the non-target image in the initial image, the method further includes:
acquiring a target object face characteristic point in a target image;
and searching a preset moving track corresponding to the target object face characteristic points, and moving the target object face characteristic points according to the moving track.
In the embodiment of the application, to address the facial distortion (an apparently widened face) introduced by wide-angle cameras, which degrades the user experience, the face feature points of the target object in the target image are acquired and the corresponding feature points are adjusted along preset tracks, which improves the appearance of the face in the image and brings a better experience to the user.
In yet another alternative of the first aspect, after deleting the non-target image in the initial image, the method further includes:
identifying a target object expression in a target image;
generating prompt information according to the target object expression;
and displaying the prompt information.
In the embodiment of the application, the current state of the user is judged by specifically identifying the expression of the target object in the target image, and the prompt information corresponding to the state is displayed to the user, so that the user can be reminded under the condition of ensuring the privacy of the user, and better experience is brought to the user.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the acquisition module is used for acquiring an initial image; the initial image includes a target object;
the determining module is used for determining a target object frame according to a target object in the initial image; the target object frame comprises a target object; an image defined by the target object frame in the initial image is a target image;
the execution module is used for deleting the non-target image in the initial image; the initial image includes a target image and a non-target image.
In the embodiment of the application, a target object frame can be determined for the acquired initial image, the non-target image outside the target object frame is deleted from the initial image, and the image enclosed by the target object frame is used as the required image, which avoids the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera. At the same time, because the area of the image enclosed by the target object frame is smaller than that of the initial image, the transmission cost is reduced and the transmission speed is increased.
In an alternative of the second aspect, the determining module is specifically configured to identify the target object in the initial image by using a face recognition algorithm, and determine the target object frame. Wherein the target object box includes a target object; and the image defined by the target object frame in the initial image is the target image.
In the embodiment of the application, the initial image is identified using a face recognition algorithm, which guarantees the accuracy of the target object frame and further avoids the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera.
In yet another alternative of the second aspect, the initial image includes a plurality of frames of images, and the determining module may specifically include:
the selecting unit is used for selecting a target frame image from the multi-frame images;
the calculation unit is used for inputting the gray value of the target frame image into the target neural network model to obtain initial object frame data; the target neural network model is obtained by training sample gray values of a plurality of known images and sample target object frame data of corresponding images;
and the processing unit is used for processing the initial object frame data according to a preset proportion to obtain the target object frame.
In the embodiment of the application, the single frame image that is most suitable as a reference is selected from the acquired images, the single frame image is input into a trained neural network model to obtain initial object frame data, and the initial object frame data is processed according to a preset proportion. Because the neural network model is trained on known samples, the initial object frame data output from the gray values of the selected single-frame image is reliably accurate; at the same time, the initial object frame can be adjusted according to the preset proportion to obtain a more suitable target image.
In yet another alternative of the second aspect, the processing unit may specifically include:
an acquisition element for acquiring target distance information;
a first establishing element for determining a scaling factor from the target distance information;
and a second establishing element for determining the target object frame according to the scale factor and the initial object frame data.
In the embodiment of the application, the size of the object frame is judged from the acquired distance between the user and the camera, and a more suitable target object frame is then obtained from the coefficient corresponding to that distance and the initial object frame data, which improves the user experience and further avoids the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera.
In yet another alternative of the second aspect, the image processing apparatus may further include:
the characteristic point acquisition module is used for acquiring the characteristic points of the face of the target object in the target image;
and the control module is used for searching a preset moving track corresponding to the target object face characteristic points and moving the target object face characteristic points according to the moving track.
In the embodiment of the application, to address the facial distortion (an apparently widened face) introduced by wide-angle cameras, which degrades the user experience, the face feature points of the target object in the target image are acquired and the corresponding feature points are adjusted along preset tracks, which improves the appearance of the face in the image and brings a better experience to the user.
In yet another alternative of the second aspect, the image processing apparatus may further include:
the recognition module is used for recognizing the target object expression in the target image;
the generating module is used for generating prompt information according to the target object expression;
and the display module is used for displaying the prompt information.
In the embodiment of the application, the current state of the user is judged by specifically identifying the expression of the target object in the target image, and the prompt information corresponding to the state is displayed to the user, so that the user can be reminded under the condition of ensuring the privacy of the user, and better experience is brought to the user.
In a third aspect, an embodiment of the present application provides an image processing apparatus, including a processor, a memory, and a communication interface; the processor is connected with the memory and the communication interface; a memory for storing executable program code; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to execute the image processing method provided by the first aspect of the embodiments of the present application or any implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium, where a computer program is stored, where the computer program includes program instructions, and when the program instructions are executed by a processor, the image processing method provided by the first aspect of the present application or any implementation manner of the first aspect of the present application may be implemented.
In a fifth aspect, the present application provides a computer program product, which when run on an image processing apparatus, causes the image processing apparatus to execute the image processing method provided by the first aspect of the present application or any implementation manner of the first aspect.
It should be understood that the image processing apparatus provided by the third aspect, the computer storage medium provided by the fourth aspect and the computer program product provided by the fifth aspect are all configured to execute the image processing method provided by the first aspect; for the beneficial effects they achieve, reference may be made to those of the image processing method provided by the first aspect, which are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic diagram of an architecture of an image processing system according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 3 is a schematic diagram of an initial image and a target image according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of another image processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a method for determining target object frame data in an initial image according to an embodiment of the present disclosure;
fig. 6 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 7 is a schematic diagram of cheek feature point movement according to an embodiment of the present disclosure;
fig. 8 is a schematic flowchart of another image processing method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic diagram of an architecture of an image processing system according to an embodiment of the present disclosure.
As shown in fig. 1, the image processing system may include a first terminal cluster, a server 20, and a second terminal cluster.
The first terminal cluster may be a set of student terminals, specifically one or more student terminals, where the plurality of student terminals may include a student terminal 10a, a student terminal 10b, a student terminal 10c …, and so on. Student-side software can be installed on the first terminal cluster to let students learn online, submit homework and so on; the specific software may be DingTalk, Tencent Classroom and the like. Any student terminal in the first terminal cluster can connect to the network and establish a data connection with the server 20 through the network, for example to send or receive images, voice, files and the like. Any student terminal in the first terminal cluster may be, but is not limited to, a mobile phone, a tablet computer, a notebook computer or other device on which the student-side software is installed. It should be noted that, in the embodiment of the present application, a wide-angle camera may be installed on any student terminal. The wide-angle camera may be used to acquire an original image containing the student, and the student terminal can crop the original image and send the processed image to other terminals.
The second terminal cluster may be a set of teacher terminals, specifically one or more teacher terminals, where the plurality of teacher terminals may include teacher terminal 30a, teacher terminal 30b, teacher terminal 30c …, and so on. Teacher-side software can be installed on the second terminal cluster to let teachers teach online, review and grade homework and so on; the specific software may be DingTalk, Tencent Classroom and the like. Any teacher terminal in the second terminal cluster can connect to the network and establish a data connection with the server 20 through the network, for example to send or receive images, voice, files and the like. Any teacher terminal in the second terminal cluster may be, but is not limited to, a mobile phone, a tablet computer, a notebook computer or other device on which the teacher-side software is installed.
The network may be a medium providing a communication link between any student side in the first terminal cluster and the server 20 or between any teacher side in the second terminal cluster and the server 20, and may also be the internet including network devices and transmission media, without being limited thereto. The transmission medium may be a wired link (such as, but not limited to, coaxial cable, fiber optic cable, and Digital Subscriber Line (DSL), etc.) or a wireless link (such as, but not limited to, wireless fidelity (WIFI), bluetooth, and mobile device network, etc.).
The server 20 may be a server capable of providing multiple services, and may receive data such as images, voices and files sent by any student end in the network or the first terminal cluster, or send data such as images, voices and files sent by any teacher end in the second terminal cluster to any student end in the network or the first terminal cluster; the method can also receive data such as images, voice, files and the like sent by any teacher end in the network or the second terminal cluster, or send data such as images, voice, files and the like sent by any student end in the first terminal cluster to any teacher end in the network or the second terminal cluster. The server 20 may be, but is not limited to, a hardware server, a virtual server, a cloud server, and the like.
It will be appreciated that the number of first terminal clusters, servers 20, and second terminal clusters in the image processing system shown in fig. 1 is by way of example only, and that the image processing system may include any number of student terminals, teacher terminals, and servers in a particular implementation. The embodiments of the present application do not limit this. For example, and without limitation, server 20 may be a server cluster comprised of a plurality of servers.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an image processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the image processing method may include:
step 201, acquiring an initial image.
Wherein the initial image comprises the target object. Specifically, the target object may be a student who is learning online.
Specifically, in response to the student's online-learning instruction, the student may be photographed by the front camera to acquire the initial image. Besides the student, the initial image may include several background objects such as a table, a sofa or a bookcase.
The front camera is a camera whose field angle is larger than a certain threshold (for example, but not limited to, 90 degrees).
Optionally, the front camera may be a wide-angle camera.
Optionally, the front camera can be a common camera, and a wide-angle lens is sleeved on the common camera.
It is possible that the initial image acquired may consist of one or more single frame images.
Illustratively, take online learning as an example. The student logs in to an account through an app on the mobile terminal (the app may be, but is not limited to, software such as DingTalk), clicks to enter the video-conference interface, and the mobile terminal then captures an image containing the student through its front camera. It should be noted that the display interface of the mobile terminal on which the student is logged in may contain several image frames of different users, where the different users may include the lecturing teacher, other students, and so on.
Step 202, determining a target object frame according to the target object in the initial image.
Specifically, the target object frame includes a target object; and the image defined by the target object frame in the initial image is the target image. Wherein the target object may be a student who is learning online.
Specifically, the position and size of the target object frame may be determined according to the position and size of the target object in the initial image, and the image defined by the target object frame includes the target object and a background image around the target object. It should be noted that the size of the image selected by the target object frame is smaller than the size of the initial image.
Possibly, the target object box may coincide with a minimum bounding box of the target object.
Possibly, the target object frame may share its center with the minimum bounding box of the target object and have an area n times that of the minimum bounding box (n may be, but is not limited to, 2).
It is possible that a vertex of the target image box, which may be, but is not limited to, the top left vertex of the target image box, coincides with the same vertex of the smallest bounding box of the target object.
And step 203, deleting the non-target image in the initial image.
Specifically, the initial image includes the target image and a non-target image.
Specifically, after the target image is selected from the initial image by the target object frame, the images except the target image in the initial image are deleted to obtain the required target image. And then the target image can be sent to a server according to the requirement and then sent to the user side by the server. The user terminal can be a teacher terminal and/or other student terminals.
For example, taking student online learning as an example, refer to the schematic diagram of acquiring the target image shown in fig. 3. The student logs in to an account in DingTalk on the mobile terminal and clicks to enter the video-conference interface. The mobile terminal turns on the front camera and obtains an initial image containing the student, with the student located in the middle of the initial image. The initial image is input into a trained neural network model to obtain a target object frame; specifically, the target object frame shares its center with the minimum bounding box containing the student in the initial image, and its area is 2 times that of the minimum bounding box. The non-target image outside the target image enclosed by the target object frame is deleted from the initial image to obtain the final target image, and the target image is sent to the server and forwarded by the server to the teacher terminal.
The minimum circumscribed frame can be a rectangular frame with the inner wall of the frame tangent to the edge of the target object, so as to ensure that the area of the rectangular frame is minimum. Possibly, the minimum bounding box may also be a circular box with an inner wall tangent to the edge of the target object.
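As an illustration only (not part of the original disclosure), the following Python sketch assumes a face detector that has already returned the minimum bounding box of the target object as (x, y, w, h); it enlarges that box about its center so that its area is roughly n times the original, clamps it to the image bounds, and crops away the non-target image, corresponding to steps 202 and 203:

    import numpy as np

    def crop_to_target(initial_image: np.ndarray, min_box, n: float = 2.0) -> np.ndarray:
        """Enlarge the minimum bounding box about its center so its area is roughly
        n times the original, then crop the initial image to that target object frame.

        min_box is (x, y, w, h): top-left corner plus width and height, in pixels.
        """
        img_h, img_w = initial_image.shape[:2]
        x, y, w, h = min_box
        cx, cy = x + w / 2.0, y + h / 2.0          # center of the minimum bounding box
        scale = n ** 0.5                            # area scales with the square of the side
        new_w, new_h = w * scale, h * scale

        # Target object frame, clamped so it never exceeds the initial image.
        left   = max(0, int(round(cx - new_w / 2)))
        top    = max(0, int(round(cy - new_h / 2)))
        right  = min(img_w, int(round(cx + new_w / 2)))
        bottom = min(img_h, int(round(cy + new_h / 2)))

        # "Deleting the non-target image" amounts to keeping only the framed region.
        return initial_image[top:bottom, left:right]

The cropped array is what would then be sent to the server in place of the full initial image.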
In the image processing method shown in fig. 2, a target object frame is determined for the acquired initial image, the non-target image outside the target object frame is deleted from the initial image, and the image enclosed by the target object frame is used as the required image, which avoids the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera. At the same time, because the area of the image enclosed by the target object frame is smaller than that of the initial image, the transmission cost is reduced and the transmission speed is increased.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating an image processing method according to an embodiment of the present disclosure.
As shown in fig. 4, the image processing method may include:
step 401, acquiring an initial image.
Wherein the initial image comprises the target object. Specifically, the target object may be a student who is learning online.
Specifically, in response to the student's online-learning instruction, the student may be photographed by the front camera to acquire the initial image. Besides the student, the initial image includes several background objects such as a table, a sofa or a bookcase.
The front-facing camera is a camera whose field angle is larger than a certain threshold (for example, but not limited to, the threshold is 90 degrees). Optionally, the front camera may be a wide-angle camera.
Optionally, the front camera can be a common camera, and a wide-angle lens is sleeved on the common camera.
It is possible that the acquired initial image may be composed of a plurality of single frame images.
Step 402, selecting a target frame image from the plurality of frame images.
Specifically, the initial image acquired by the front camera is a series of pictures or a video composed of a plurality of continuous single-frame images. One of these continuous single-frame images can be selected as the target frame image, and the selection can be based on the position of the target object in each single-frame image. Specifically, each image can be input into a trained neural network model to determine the position of the target object in that image, and the image in which the target object is closest to the center of the image is selected as the target frame image. The neural network model is obtained by training on the gray values of a plurality of known images and the position data of the target object in the corresponding images.
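As a minimal sketch of this selection step (illustrative only; detect_center stands in for the trained position model described above and is an assumption):

    def select_target_frame(frames, detect_center):
        """Pick the single-frame image whose target object is closest to the image center.

        `frames` is a sequence of H x W x 3 arrays; `detect_center` is assumed to wrap
        the trained neural network model and to return the (x, y) position of the
        target object in a frame.
        """
        def distance_to_center(frame):
            height, width = frame.shape[:2]
            ox, oy = detect_center(frame)
            return (ox - width / 2.0) ** 2 + (oy - height / 2.0) ** 2

        return min(frames, key=distance_to_center)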
And 403, inputting the gray value of the target frame image into the target neural network model to obtain initial object frame data.
Specifically, the target neural network model is a model obtained by training sample gray values of a plurality of known images and sample target object frame data corresponding to the images.
Specifically, after the target frame image is selected, the target frame image may be preprocessed, that is, the target frame image whose pixels are color is processed into black and white pixels. And taking the gray value of the processed target frame image as the input of the target neural network model, and outputting the initial data of the target object frame. Wherein the initial data may correspond to one or more target points of the target object box.
Possibly, the initial data corresponds to a single target point. As a reference, a rectangular coordinate system can be established with a vertex of the initial image as the origin, the horizontal one of the two edges adjacent to that vertex as the x-axis and the vertical one as the y-axis. The initial data may include (x, y, w, h), where x and y are specific coordinates in this coordinate system and w and h relate to the length and width of the target object frame.
Specifically, the target point may correspond to one of the four vertices or to the center point of the target object frame. When the target point is any one of the four vertices of the target object frame, x and y are the coordinates of that vertex in the rectangular coordinate system, and w and h are the length and width of the target object frame, respectively. When the target point is the center point of the target object frame, x and y are the coordinates of the center point, and w and h are the perpendicular distances from the center point to the width side and to the length side of the target object frame, respectively.
Possibly, the initial data corresponds to a plurality of target points. The same rectangular coordinate reference can be used, with a vertex of the initial image as the origin, the horizontal adjacent edge as the x-axis and the vertical adjacent edge as the y-axis. The initial data may include (x, y) pairs, where x and y are specific coordinates in the rectangular coordinate system.
Specifically, the initial data may include four vertex coordinates (x, y) of the target object box. The mobile terminal can determine the target object frame through the four vertexes. It can be known that the mobile terminal can obtain the length of the target object frame through two vertices located at the length of the target object frame, and obtain the width of the target object frame through two vertices located at the width of the target object frame.
It should be noted that the target neural network model here may be a cascade regression model based on a neural network structure, and the regression model directly learns the mapping function from input to output, does not need complex modeling, is simple and efficient, and specifically may perform prediction by preprocessing data and then by feature extraction and regression algorithm.
It should be further noted that the target object is included in the image defined by the initial object frame.
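A minimal sketch of step 403 under stated assumptions (illustrative only): model is taken to be a trained object with a predict method mapping a normalized grayscale frame to (x, y, w, h), and the RGB-to-gray weights are the usual luminance coefficients rather than anything specified in the application:

    import numpy as np

    def initial_object_frame(target_frame: np.ndarray, model):
        """Feed the gray values of the target frame image to the target neural
        network model and return the initial object frame data (x, y, w, h)."""
        # Usual luminance weights (an assumption; the application only says the
        # color pixels are processed into black-and-white pixels).
        gray = (0.299 * target_frame[..., 0]
                + 0.587 * target_frame[..., 1]
                + 0.114 * target_frame[..., 2]).astype(np.float32) / 255.0

        x, y, w, h = model.predict(gray[np.newaxis, ..., np.newaxis])[0]
        return float(x), float(y), float(w), float(h)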
And step 404, processing the initial object frame data according to a preset proportion to obtain a target object frame.
In the implementation of the application, new object frame data can be determined according to the distance from a user to a front camera of the mobile terminal, and an object frame corresponding to the new object frame data can be used as a target object frame. Specifically, when the user is closer to the front camera of the mobile terminal, the amplification scale of the target object frame on the basis of the initial object frame is larger; when the user is farther away from the front camera of the mobile terminal, the amplification scale of the target object frame on the basis of the initial object frame is smaller.
And possibly, amplifying the length and the width in the initial object frame according to a preset proportion to obtain the target object frame.
Possibly, after the length and the width in the initial object frame are amplified according to a preset proportion, the amplified initial object frame can be translated to obtain the target object frame.
Specifically, when the target point of the target object frame is selected as a certain vertex (which may be but is not limited to the lower left), after the length and the width in the initial object frame are enlarged by n times (which may be but is not limited to 2 times) according to a preset ratio, the target point needs to be moved to the left by half of the length and moved to the lower by half of the width.
For example, refer to the schematic diagram of determining the target object frame data shown in fig. 5. A rectangular coordinate reference system is established with a vertex of the initial image as the origin, the horizontal one of the two adjacent edges as the x-axis and the vertical one as the y-axis, and the center point of the initial object frame is taken as the target point, giving the initial data (6, 8, 3, 3). It can be understood that the center point lies at the center of the initial object frame, so after the frame is enlarged about this center the center point does not need to be moved (the position of the target object within the frame is unchanged). Possibly, a proportionality coefficient a can be determined according to the distance between the user and the front camera, and the new object frame data (6, 8, 3a, 3a) is obtained by multiplying the distances from the target point to the sides of the initial object frame by the coefficient a. The new object frame then has a length of 6a and a width of 6a, and the frame corresponding to the new data may be used as the target object frame.
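The fig. 5 arithmetic can be reproduced with a small sketch (illustrative only), where the target point is the center point and only the half-extents are multiplied by the coefficient a:

    def scale_object_frame(initial_data, a):
        """Scale initial object frame data (x, y, w, h) about its target point.

        Here (x, y) is the center point and w, h are the perpendicular distances
        from the center to the sides of the frame, as in the fig. 5 example."""
        x, y, w, h = initial_data
        return (x, y, w * a, h * a)

    # Fig. 5 example: initial data (6, 8, 3, 3) with coefficient a gives (6, 8, 3a, 3a),
    # i.e. a frame of length 6a and width 6a, still centered on the same target point.
    assert scale_object_frame((6, 8, 3, 3), 2) == (6, 8, 6, 6)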
The manner of determining the scaling factor according to the distance between the user and the camera may include, but is not limited to, the following steps:
step 1: and acquiring target distance information.
Specifically, the target distance information is a distance between the target object and a front camera of the mobile terminal.
Possibly, the target distance information can be obtained by the mobile terminal from the initial image of the user captured by the front camera. Specifically, the target distance information may be determined by analyzing the proportion of the image area occupied by the target object in the initial image.
Possibly, the target distance information may be acquired by the mobile terminal through an infrared ranging sensor built into the mobile terminal.
Step 2: and determining a proportionality coefficient according to the target distance information.
Possibly, when the distance between the user and the front camera is in different preset ranges, different scale factors can be corresponded.
In particular, different preset ranges may correspond to different classifications. Different classifications may correspond to different scaling factors. The classification may include, for example, but is not limited to, near distance and far distance. Illustratively, the short-range corresponding preset range may be (0, 25 cm), and the long-range corresponding preset range may be greater than or equal to 25 cm.
Specifically, after determining the target classification corresponding to the distance according to the distance between the user and the camera, the mobile terminal searches a preset classification scale mapping table to determine a scale coefficient corresponding to the target classification. The preset classification proportion mapping table comprises different classifications and proportion coefficients corresponding to the classifications.
Illustratively, when it is detected that the acquired distance between the user and the camera is less than 25 cm, the target is classified as a close range, and a scale factor of 2 is obtained by looking up a classification scale mapping table shown in table 1 below. When the obtained distance between the user and the camera is detected to be greater than or equal to 25 cm, the target is classified as a long distance, and a classification proportion mapping table shown in the following table 1 is searched to obtain a proportion coefficient of 1.5.
TABLE 1. Classification-to-scale-factor mapping table
Classification    Scale factor
Close range       2
Long distance     1.5
Possibly, the scale factor decreases as the distance between the user and the camera increases. That is, the farther the user is from the camera, the smaller the corresponding proportionality coefficient; the closer the user is to the camera, the larger the corresponding proportionality coefficient.
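A minimal sketch of steps 1 and 2 (illustrative only): the distance value is assumed to come either from the area-ratio analysis or from the infrared ranging sensor mentioned above, and the 25 cm boundary and the coefficients follow Table 1:

    # Preset classification-to-scale-factor mapping, following Table 1.
    CLASSIFICATION_SCALE = {"close range": 2.0, "long distance": 1.5}

    def classify_distance(distance_cm: float) -> str:
        """Classify the user-to-camera distance: under 25 cm counts as close range."""
        return "close range" if distance_cm < 25 else "long distance"

    def scale_factor(distance_cm: float) -> float:
        """Look up the proportionality coefficient for the target classification."""
        return CLASSIFICATION_SCALE[classify_distance(distance_cm)]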
And step 405, deleting the non-target image in the initial image.
Specifically, the initial image includes the target image and a non-target image.
Specifically, after the target image is selected from the initial image by the target object frame, the images except the target image in the initial image are deleted to obtain the required target image. And then the target image can be sent to a server according to the requirement and then sent to the user side by the server. The user terminal can be a teacher terminal and/or other student terminals.
In the embodiment of the application, the distance between the user and the camera is acquired and classified as close range or long distance, and the corresponding proportionality coefficient is then obtained by looking up a list, so that the coefficient matches the actual distance and the privacy-disclosure risk caused by the excessively wide shooting angle of a wide-angle camera is further avoided.
Referring to fig. 6, fig. 6 is a schematic flowchart illustrating an image processing method according to an embodiment of the present disclosure.
As shown in fig. 6, the image processing method may include:
step 601, acquiring an initial image.
Specifically, step 601 is identical to step 201 or step 401, and is not described herein again.
Step 602, selecting a target frame image from the plurality of frame images.
Specifically, step 602 is identical to step 402, and is not described herein again.
Step 603, inputting the gray value of the target frame image into the target neural network model to obtain initial object frame data.
Specifically, step 603 is identical to step 403, and is not described herein again.
And step 604, processing the initial object frame data according to a preset proportion to obtain a target object frame.
Specifically, step 604 is identical to step 404, and is not described herein again.
And step 605, deleting the non-target image in the initial image.
Specifically, step 605 is consistent with step 405 or step 203, and is not described herein again.
And step 606, acquiring the facial feature points of the target object in the target image.
Specifically, the target image may be input into a neural network model based on a generative adversarial network (GAN) to obtain the face feature points of the target object. The GAN-based neural network model generates a facial geometric feature map from the input face image and then uses the facial geometric feature map to predict the corresponding facial key points.
The generator of the GAN-based neural network model adopts an encoder-decoder structure: the encoder predicts the key points of the face and of its contour, and the decoder generates the facial geometric feature map from the predicted key points. The generated facial geometric feature map is then fed to a discriminator network, which judges whether the input facial geometry is real and predicts the corresponding facial key points from the given facial geometric feature map. The predicted facial key points may include 72 key points covering the eyes, cheeks, nose, mouth, chin and so on.
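A minimal sketch of this step (illustrative only): landmark_model is assumed to wrap the GAN-based keypoint network described above and to expose a predict method; the model itself is not reproduced here:

    import numpy as np

    def detect_face_feature_points(target_image, landmark_model):
        """Return the facial feature points of the target object as an (N, 2) array.

        `landmark_model` is assumed to return 72 key points covering the eyes,
        cheeks, nose, mouth and chin for the given target image."""
        points = landmark_model.predict(target_image)
        return np.asarray(points, dtype=np.float32).reshape(-1, 2)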
Step 607, searching a preset moving track corresponding to the target object face feature point, and moving the target object face feature point according to the moving track.
Specifically, a preset movement track is provided for each target object feature point, and the target object face feature point can be moved according to its preset movement track. Possibly, when the target object feature point belongs to the cheek, a given cheek feature point may be translated a preset distance toward the mouth within the plane of the face; in other words, a plane rectangular coordinate system is established on the plane of the face and the cheek feature point is translated from (x1, y1) to (x1 + n, y1), where n is the preset movement distance for that cheek feature point. It should be noted that a target object feature point is not limited to a single point, and the preset movement track need not act on the feature point itself: it may instead act on pixel points around the feature point.
For example, taking the cheek feature points as the selected target object feature points, refer to the schematic diagram of cheek feature point movement in fig. 7. According to the preset movement track, a pixel point c1 on the contour at a fixed distance from a cheek feature point may be translated rightwards to c2, so that the left half of the contour moves inwards (i.e., the left-cheek contour of a slimmed face). The right half of the cheek contour is adjusted correspondingly, so that the whole cheek is narrowed, improving the appearance of the target object's face in the target image.
Possibly, when the target object feature point is an eye, the eye feature points can be moved vertically, within the plane of the face, relative to the horizontal central axis of the eye feature points. For example, a plane rectangular coordinate system is established on the plane of the face, and a feature point (x2, y2) in the upper half of the eye feature points is translated to (x2, y2 + a) away from the horizontal central axis, where a is the preset movement distance for that feature point. It should be noted that, here too, a target object feature point is not limited to a single point, and the preset movement track may act on pixel points around the feature point rather than on the feature point itself.
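A minimal sketch of step 607 (illustrative only): the per-point offsets below stand in for the preset movement tracks, whose actual values are not given in the application, and only the landmark coordinates are moved, without warping the surrounding pixels:

    # Hypothetical preset movement tracks: landmark index -> (dx, dy) offset in pixels.
    PRESET_TRACKS = {
        10: (3.0, 0.0),    # e.g. a left-cheek point translated toward the mouth
        40: (0.0, 2.0),    # e.g. an upper-eyelid point moved away from the eye's horizontal axis
    }

    def move_feature_points(points, tracks=PRESET_TRACKS):
        """Move each target object face feature point along its preset movement track.

        `points` is a sequence of (x, y) coordinates in the plane of the face, for
        example the array returned by detect_face_feature_points above; points
        without a preset track are left unchanged."""
        moved = []
        for idx, (x, y) in enumerate(points):
            dx, dy = tracks.get(idx, (0.0, 0.0))
            moved.append((x + dx, y + dy))
        return moved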
In the embodiment of the application, to address the facial distortion (an apparently widened face) introduced by wide-angle cameras, which degrades the user experience, the face feature points of the target object in the target image are acquired and the corresponding feature points are adjusted along preset tracks, which improves the appearance of the face in the image and brings a better experience to the user.
Referring to fig. 8, fig. 8 is a schematic flowchart illustrating a further image processing method according to an embodiment of the present disclosure.
As shown in fig. 8, the image processing method may include:
step 801, acquiring an initial image.
Specifically, step 801 is identical to step 201 or step 401, and is not described herein again.
Step 802, selecting a target frame image from the plurality of frame images.
Specifically, step 802 is identical to step 402, and is not described herein again.
Step 803, inputting the gray value of the target frame image into the target neural network model to obtain initial object frame data.
Specifically, step 803 is identical to step 403, and will not be described herein again.
And 804, processing the initial object frame data according to a preset proportion to obtain a target object frame.
Specifically, step 804 is identical to step 404, and is not described herein again.
And step 805, deleting the non-target image in the initial image.
Specifically, step 805 is identical to step 405 or step 203, and is not described herein again.
And 806, identifying the target object expression in the target image.
Specifically, the target image may be input into the trained deep neural network model to obtain an object frame attached with an expression prompt, and the object frame defines the target object in the target image. The expression prompt can include natural expressions such as happiness and smile or exaggerated expressions such as surprise, sadness, anger, disgust and fear.
It should be noted that the deep neural network model can be obtained by training on the sample gray values of a plurality of known images and the sample object frames with expression prompts corresponding to those images.
Specifically, the neural network model may first detect and select the face in the target image, and the selected facial expression may then be recognized using the geometric characteristics of the facial features together with a support vector machine.
Possibly, if the face is not detected in the recognition process, the prompt information can be directly generated. Wherein the reminder information includes "user is not present" or "user is busy".
And step 807, generating prompt information according to the target object expression.
Specifically, the state of the target object can be judged according to the expression of the target object, and then prompt information is generated.
Specifically, if the target object expression is any one of exaggerated expressions such as sadness, anger, disgust and fear, it indicates that the user is not listening seriously at present, and prompt information for reminding the user to listen seriously is generated. Possibly, the prompt message may include "please attend the class seriously", "attend the class seriously, keep on", etc.
And 808, displaying the prompt message.
Specifically, the user may be presented with prompt information generated according to the user's expression. The type of the prompt message may include voice, text, animation, etc. Specifically, the type of displaying the prompt message may correspondingly include voice playing prompt, text circulation prompt, animation prompt, and the like.
Illustratively, taking students' online learning as an example, the expression of the student in the target image acquired by the front camera is recognized; when the student's expression during the current lesson is detected to be a smile, a text prompt such as "attend the class seriously, keep on" can be generated and scrolled across the bottom of the current interface of the mobile terminal.
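A minimal sketch of steps 806 to 808 (illustrative only): the expression label is assumed to come from the trained deep neural network model, the prompt texts follow the examples above, and the display call is a placeholder for the terminal's voice, text or animation prompt:

    from typing import Optional

    EXAGGERATED_EXPRESSIONS = {"surprise", "sadness", "anger", "disgust", "fear"}

    def prompt_for_expression(expression: Optional[str]) -> Optional[str]:
        """Generate prompt information from the recognized target object expression."""
        if expression is None:                 # no face was detected in the target image
            return "user is not present"
        if expression in EXAGGERATED_EXPRESSIONS:
            return "please attend the class seriously"
        if expression in {"happiness", "smile"}:
            return "attend the class seriously, keep on"
        return None

    def show_prompt(prompt: Optional[str]) -> None:
        """Display the prompt, e.g. scrolled as text at the bottom of the interface."""
        if prompt:
            print(prompt)                      # placeholder for the terminal's actual UI prompt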
In the embodiment of the application, the current state of the user is judged by specifically identifying the expression of the target object in the target image, and the prompt information corresponding to the state is displayed to the user, so that the user can be reminded under the condition of ensuring the privacy of the user, and better experience is brought to the user.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the image processing apparatus 900 includes an acquisition module 901, a determination module 902, and an execution module 903, wherein the details of each module are as follows:
an obtaining module 901 configured to obtain an initial image.
Specifically, the initial image includes the target object.
A determining module 902, configured to determine a target object frame according to a target object in the initial image.
Specifically, the target object frame includes a target object; and the image defined by the target object frame in the initial image is the target image.
And an executing module 903, configured to delete the non-target image in the initial image.
Specifically, the initial image includes a target image and a non-target image.
As an optional implementation manner, the determining module 902 is specifically configured to identify the target object in the initial image by using a face recognition algorithm, and determine the target object box. Wherein the target object box includes a target object; and the image defined by the target object frame in the initial image is the target image.
As an optional implementation, the initial image includes a plurality of frame images, and the determining module 902 may specifically include:
a selecting unit, configured to select a target frame image from the plurality of frame images;
a calculation unit, configured to input the gray values of the target frame image into a target neural network model to obtain initial object frame data, wherein the target neural network model is trained on sample gray values of a plurality of known images and sample target object frame data of the corresponding images; and
a processing unit, configured to process the initial object frame data according to a preset proportion to obtain the target object frame.
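A minimal sketch of how these three units might fit together is given below; the box_model.predict() call is a hypothetical stand-in for the trained target neural network model, and the 1.5 value is only an example of a preset proportion, neither of which is specified by the embodiment.

    import cv2

    def determine_target_frame(frames, box_model, scale=1.5):
        """Select a target frame image, run the (hypothetical) model on its gray values,
        and enlarge the resulting initial object frame by the preset proportion."""
        target_frame = frames[len(frames) // 2]                 # selecting unit: pick one frame
        gray = cv2.cvtColor(target_frame, cv2.COLOR_BGR2GRAY)   # gray values of the target frame image
        x, y, w, h = box_model.predict(gray)                    # calculation unit: initial object frame data
        # processing unit: enlarge the initial frame about its centre by the preset proportion
        cx, cy = x + w / 2.0, y + h / 2.0
        new_w, new_h = w * scale, h * scale
        x0 = int(max(cx - new_w / 2, 0))
        y0 = int(max(cy - new_h / 2, 0))
        x1 = int(min(cx + new_w / 2, target_frame.shape[1]))
        y1 = int(min(cy + new_h / 2, target_frame.shape[0]))
        return x0, y0, x1, y1                                   # target object frame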
As an optional implementation, the image processing apparatus may further include:
a characteristic point acquisition module, configured to acquire the face characteristic points of the target object in the target image; and
a control module, configured to search for a preset moving track corresponding to the face characteristic points of the target object and move the face characteristic points according to the moving track.
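The sketch below illustrates one possible shape for these two modules; the preset track table and the landmark indices are invented for illustration, since the embodiment only requires looking up a moving track that corresponds to the face characteristic points.

    import numpy as np

    # Hypothetical preset tracks: characteristic point index -> (dx, dy) offset per animation step.
    PRESET_TRACKS = {
        48: [(0, 1), (0, 2), (0, 1)],  # e.g. left mouth corner
        54: [(0, 1), (0, 2), (0, 1)],  # e.g. right mouth corner
    }

    def move_feature_points(landmarks: np.ndarray, step: int) -> np.ndarray:
        """Move the target object's face characteristic points along their preset moving track."""
        moved = landmarks.copy()
        for idx, track in PRESET_TRACKS.items():
            dx, dy = track[step % len(track)]
            moved[idx] += (dx, dy)
        return moved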
As an optional implementation, the image processing apparatus may further include:
a recognition module, configured to recognize the expression of the target object in the target image;
a generating module, configured to generate prompt information according to the expression of the target object; and
a display module, configured to display the prompt information.
It should be noted that the image processing apparatus can be used to execute the image processing method provided above; for the beneficial effects achieved by the image processing apparatus, reference may therefore be made to the beneficial effects of the image processing method provided above, which are not described again here.
Referring to fig. 10, fig. 10 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application. The image processing apparatus 1000 may include: at least one processor 1001 (e.g., a CPU), at least one network interface 1005, a user interface 1004, a memory 1002, at least one communication bus 1003, and a display screen. The communication bus 1003 is used to implement connection and communication among these components. The user interface 1004 may include, but is not limited to, a touch screen, a keyboard, a mouse, a joystick, and the like. The network interface 1005 may optionally include a standard wired interface or a wireless interface (e.g., a Wi-Fi interface or a Bluetooth interface), and a communication connection with the server may be established through the network interface 1005. The memory 1002 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. As shown in fig. 10, the memory 1002, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and program instructions.
It should be noted that the network interface 1005 may be connected to an acquirer, a transmitter, or another communication module; the other communication module may include, but is not limited to, a Wi-Fi module, a Bluetooth module, and the like. It can be understood that the image processing apparatus 1000 in this embodiment of the application may also include an acquirer, a transmitter, another communication module, and the like.
The processor 1001 may be used to call the program instructions stored in the memory 1002 to perform the methods provided by the embodiments shown in fig. 2, fig. 4, fig. 6, or fig. 8.
Embodiments of the present application also provide a computer-readable storage medium having instructions stored therein which, when run on a computer or processor, cause the computer or processor to perform one or more steps of any one of the methods described above. If the constituent modules of the image processing apparatus described above are implemented in the form of software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program, which can be stored in a computer-readable storage medium; when the program is executed, the processes of the method embodiments described above can be included. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk. The technical features in the above examples and embodiments may be combined arbitrarily provided there is no conflict.
The above disclosure describes only preferred embodiments of the present application and is not intended to limit the scope of protection of the present application; the present application is therefore not limited thereto, and equivalent variations and modifications made in accordance with it still fall within its scope.

Claims (10)

1. An image processing method, comprising:
acquiring an initial image; the initial image comprises a target object;
determining a target object frame according to the target object in the initial image; the target object box comprises the target object; an image defined by the target object frame in the initial image is a target image;
deleting non-target images in the initial image; the initial image includes the target image and the non-target image.
2. The method of claim 1, wherein the determining a target object frame from the target object in the initial image comprises:
and identifying the target object in the initial image by adopting a face identification algorithm, and determining the target object frame.
3. The method of claim 2, wherein the initial image comprises a plurality of frame images;
the identifying the target object in the initial image by adopting the face recognition algorithm and the determining the target object frame specifically comprise:
selecting a target frame image from the multi-frame images;
inputting the gray value of the target frame image into a target neural network model to obtain initial object frame data; the target neural network model is obtained by training sample gray values of a plurality of known images and sample target object frame data corresponding to the images;
and processing the initial object frame data according to a preset proportion to obtain the target object frame.
4. The method according to claim 3, wherein the processing the initial object frame data according to the preset proportion to obtain the target object frame specifically comprises:
acquiring target distance information;
determining a scale coefficient according to the target distance information;
and determining the target object frame according to the scale coefficient and the initial object frame data.
5. The method according to claim 4, wherein the determining a scale coefficient according to the target distance information specifically comprises:
determining a target classification according to the target distance information;
and searching a preset classification proportion mapping list, and determining the scale coefficient corresponding to the target classification.
6. The method according to any one of claims 1-5, wherein, after the deleting the non-target image in the initial image, the method further comprises:
acquiring a target object face characteristic point in the target image;
and searching a preset moving track corresponding to the target object face characteristic point, and moving the target object face characteristic point according to the moving track.
7. The method according to any one of claims 1-5, wherein, after the deleting the non-target image in the initial image, the method further comprises:
identifying a target object expression in the target image;
generating prompt information according to the target object expression;
and displaying the prompt information.
8. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring an initial image; the initial image comprises a target object;
a determining module, configured to determine a target object frame according to the target object in the initial image; the target object box comprises the target object; an image defined by the target object frame in the initial image is a target image;
the execution module is used for deleting the non-target image in the initial image; the initial image includes the target image and the non-target image.
9. An image processing apparatus comprising a processor, a memory, and a communication interface;
the processor is connected with the memory and the communication interface;
the memory for storing executable program code;
the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory for performing the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010819781.2A 2020-08-14 2020-08-14 Image processing method, device, system and computer storage medium Pending CN112087590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010819781.2A CN112087590A (en) 2020-08-14 2020-08-14 Image processing method, device, system and computer storage medium

Publications (1)

Publication Number Publication Date
CN112087590A true CN112087590A (en) 2020-12-15

Family

ID=73727927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010819781.2A Pending CN112087590A (en) 2020-08-14 2020-08-14 Image processing method, device, system and computer storage medium

Country Status (1)

Country Link
CN (1) CN112087590A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722714A (en) * 2012-05-18 2012-10-10 西安电子科技大学 Artificial neural network expanding type learning method based on target tracking
CN103227894A (en) * 2012-01-26 2013-07-31 索尼公司 Image processing apparatus, image processing method, and recording medium
CN107786812A (en) * 2017-10-31 2018-03-09 维沃移动通信有限公司 A kind of image pickup method, mobile terminal and computer-readable recording medium
US20190244387A1 (en) * 2018-02-06 2019-08-08 Beijing Kuangshi Technology Co., Ltd. Image detection method, apparatus and system and storage medium
CN111310808A (en) * 2020-02-03 2020-06-19 平安科技(深圳)有限公司 Training method and device of picture recognition model, computer system and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113784084A (en) * 2021-09-27 2021-12-10 联想(北京)有限公司 Processing method and device
CN116895094A (en) * 2023-09-11 2023-10-17 杭州魔点科技有限公司 Dark environment imaging method, system, device and medium based on binocular fusion
CN116895094B (en) * 2023-09-11 2024-01-30 杭州魔点科技有限公司 Dark environment imaging method, system, device and medium based on binocular fusion

Similar Documents

Publication Publication Date Title
US10776970B2 (en) Method and apparatus for processing video image and computer readable medium
CN109614934B (en) Online teaching quality assessment parameter generation method and device
WO2022156640A1 (en) Gaze correction method and apparatus for image, electronic device, computer-readable storage medium, and computer program product
US11670055B1 (en) Facial expression tracking during augmented and virtual reality sessions
CN109063587A (en) data processing method, storage medium and electronic equipment
CN111860362A (en) Method and device for generating human face image correction model and correcting human face image
CN108491808B (en) Method and device for acquiring information
WO2021184754A1 (en) Video comparison method and apparatus, computer device and storage medium
CN113870395A (en) Animation video generation method, device, equipment and storage medium
CN112669422B (en) Simulated 3D digital person generation method and device, electronic equipment and storage medium
CN113254491A (en) Information recommendation method and device, computer equipment and storage medium
JP6989450B2 (en) Image analysis device, image analysis method and program
CN112087590A (en) Image processing method, device, system and computer storage medium
CN113762107A (en) Object state evaluation method and device, electronic equipment and readable storage medium
CN109934150B (en) Conference participation degree identification method, device, server and storage medium
CN112925470B (en) Touch control method and system of interactive electronic whiteboard and readable medium
KR102003221B1 (en) System for generating note data and method for generating note data using the system
JP6855737B2 (en) Information processing equipment, evaluation systems and programs
CN109740510B (en) Method and apparatus for outputting information
CN111274447A (en) Target expression generation method, device, medium and electronic equipment based on video
CN115171673A (en) Role portrait based communication auxiliary method and device and storage medium
CN111507421A (en) Video-based emotion recognition method and device
CN113762056A (en) Singing video recognition method, device, equipment and storage medium
CN111881338A (en) Printed matter content retrieval method based on social software light application applet
US20230154214A1 (en) Method and Arrangement for the digital Capture of Spaces in a Building

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201215