CN111753813A - Image processing method, device, equipment and storage medium - Google Patents

Image processing method, device, equipment and storage medium

Info

Publication number
CN111753813A
Authority
CN
China
Prior art keywords
image
distance
face
sample
indication information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010796086.9A
Other languages
Chinese (zh)
Inventor
贺思颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010796086.9A
Publication of CN111753813A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Abstract

The application discloses an image processing method, apparatus, device and storage medium, belonging to the technical field of image processing and in particular relating to cloud technology. In the embodiments of the application, the distance between a human face in an image and the image acquisition device is determined as a discrete distance. Because a discrete representation is adopted, the distance between the face and the image acquisition device does not need to be calculated precisely but is merely estimated; the exact distance calculation is omitted, a large amount of computation is saved, and the image processing efficiency can also be improved. Since the distance is estimated directly from the acquired image, the target function can be realized without equipping additional components, which can reduce the manufacturing cost of the device.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a storage medium.
Background
With the development of image processing technology and the application of artificial intelligence, a terminal can process images by itself, extract implicit information from them, and provide corresponding functions based on that information. For example, in some monitoring scenarios, the terminal can perform target detection on an acquired image, determine the position of a target, and track the target. For another example, in some scenarios the terminal can capture an image of a user, determine the distance from the user to the screen, and provide convenient functions according to that distance.
At present, image processing methods generally acquire an image through a camera, perform face detection on the image, carry out a series of geometric calculations to obtain the distance between the face and the image acquisition device, and display corresponding information according to the distance. However, face detection algorithms generally involve a multi-candidate-box screening process with high complexity and a large amount of calculation, so the processing efficiency is low.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, an image processing apparatus and a storage medium, which can achieve the effects of saving a large amount of calculation, improving the image processing efficiency and reducing the manufacturing cost of the image processing apparatus. The technical scheme is as follows:
in one aspect, an image processing method is provided, and the method includes:
carrying out face detection on the acquired image;
when the image is detected to comprise a face, obtaining distance indicating information corresponding to the image according to the proportion of a face region in the image, wherein the distance indicating information is used for indicating the distance between the face and image acquisition equipment, and the distance is a discrete distance;
and controlling the image acquisition equipment to execute a target function according to the distance indication information.
In one aspect, an image processing apparatus is provided, the apparatus including:
the detection module is used for carrying out face detection on the acquired image;
the acquisition module is used for acquiring distance indication information corresponding to the image according to the proportion of a face area in the image when the image is detected to comprise a face, wherein the distance indication information is used for indicating the distance between the face and image acquisition equipment, and the distance is a discrete distance;
and the control module is used for controlling the image acquisition equipment to execute a target function according to the distance indication information.
In a possible implementation manner, the obtaining module is configured to classify distances between faces in the image and an image acquisition device according to a proportion of a face detection frame in the image, so as to obtain distance indication information corresponding to the image.
In a possible implementation manner, the detection module and the acquisition module are configured to input the image into an image processing model, perform face detection on the image by the image processing model, classify a distance between a face in the image and an image acquisition device according to a proportion of a face detection frame obtained by the face detection in the image when the image is detected to include the face, and output distance indication information corresponding to the image.
In one possible implementation, the training process of the image processing model includes:
acquiring a sample face image, wherein the sample face image carries corresponding target distance indication information;
inputting the sample face image into the image processing model, classifying the distance between the face in the sample face image and the image acquisition equipment by the image processing model, and outputting prediction distance indication information corresponding to the sample face image;
acquiring prediction accuracy based on the predicted distance indication information and the target distance indication information;
based on the prediction accuracy, model parameters of the image processing model are adjusted until a target condition is met.
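To make the training process described in the preceding steps concrete, the following is a minimal sketch of one possible training loop written in Python with PyTorch; the optimizer, loss function, accuracy threshold and data-loading interface are illustrative assumptions and are not prescribed by this embodiment.

import torch
import torch.nn as nn

def train_image_processing_model(model, data_loader, target_accuracy=0.95):
    # model maps a sample face image tensor to logits over the discrete distance classes;
    # data_loader yields (images, labels) pairs, where the labels are the target distance
    # indication information carried by the sample face images.
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    accuracy = 0.0
    while accuracy < target_accuracy:              # target condition (assumed here)
        correct, total = 0, 0
        for images, labels in data_loader:
            logits = model(images)                 # predicted distance indication information
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                       # adjust the model parameters
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
        accuracy = correct / total                 # prediction accuracy of this pass
    return model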
In one possible implementation, the acquiring a sample face image includes:
carrying out face detection on at least two sample images to obtain face detection frames of the at least two sample images;
acquiring the proportion of the face detection frames of the at least two sample images in the at least two sample images;
determining a cutting frame corresponding to the at least two sample images according to the proportion;
and cutting the at least two sample images according to the cutting frame, determining the cut sample images as sample face images, and determining target distance indication information corresponding to the sample face images according to the size relation between a face detection frame and the cutting frame in the sample images.
In one possible implementation manner, the determining, according to the ratio, a crop box corresponding to the at least two sample images includes:
in response to the fact that the proportion is larger than a first proportion threshold value, determining the sizes of the cutting frames corresponding to the at least two sample images according to the sizes of the face detection frames in the at least two sample images, and determining the positions of the cutting frames corresponding to the at least two sample images according to the positions of the face detection frames and the target offset, wherein the cutting frames are smaller than the face detection frames;
and responding to the fact that the proportion is smaller than a second proportion threshold value, determining the sizes of the cutting frames corresponding to the at least two sample images according to the sizes of the face detection frames in the at least two sample images and a target scaling coefficient, and randomly determining the positions of the cutting frames in the at least two sample images according to the sizes of the cutting frames in the at least two sample images, wherein the cutting frames are larger than the face detection frames.
In one possible implementation manner, the determining, according to the position of the face detection frame and the target offset, the position of the crop box corresponding to the at least two sample images includes:
and for a sample image, determining the vertex position of at least one cutting box corresponding to the at least two sample images according to the central point position of the face detection box and the target offset in at least one offset direction.
In a possible implementation manner, the determining, according to a size relationship between a face detection frame and the crop frame in a sample image, target distance indication information corresponding to the sample face image includes:
in response to that the cutting frame is smaller than the face detection frame, determining that target distance indication information corresponding to the sample face image is first distance indication information, wherein the distance indicated by the first distance indication information is smaller than a distance threshold;
and in response to the fact that the cutting frame is larger than the face detection frame, determining that the distance indication information corresponding to the target scaling coefficient is the target distance indication information corresponding to the at least two sample images.
In one possible implementation, the acquiring a sample face image includes:
carrying out face detection on at least two sample face images to obtain face detection frames of the at least two sample face images;
acquiring the proportion of the face detection frames of the at least two sample face images in the at least two sample face images;
and acquiring distance indication information corresponding to the proportion as target distance indication information corresponding to the at least two sample face images according to the corresponding relation between the proportion and the distance indication information.
In one possible implementation, the control module is configured to perform any of the following:
in response to the distance indication information being the first distance indication information, controlling the screen of the image acquisition device to light up and displaying a target interface, wherein the distance indicated by the first distance indication information is smaller than a distance threshold;
and in response to the distance indication information being the first distance indication information, displaying prompt information, wherein the prompt information is used to prompt that the current distance to the image acquisition device is too small, and the distance indicated by the first distance indication information is smaller than the distance threshold.
In one aspect, an electronic device is provided that includes one or more processors and one or more memories having at least one program code stored therein, the at least one program code being loaded into and executed by the one or more processors to implement various alternative implementations of the above-described image processing method.
In one aspect, a computer-readable storage medium is provided, in which at least one program code is stored, which is loaded and executed by a processor to implement various alternative implementations of the image processing method described above.
In one aspect, a computer program product or computer program is provided that includes one or more program codes stored in a computer-readable storage medium. One or more processors of the electronic device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the electronic device can execute the image processing method of any one of the above possible embodiments.
In the embodiments of the application, the acquired image is processed and the proportion of the face region in the image is analyzed to determine the discrete distance between the face in the image and the image acquisition device. Because a discrete representation is adopted, the distance between the face and the image acquisition device does not need to be calculated precisely; it is merely estimated.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of an image processing method according to an embodiment of the present application;
fig. 2 is a flowchart of an image processing method provided in an embodiment of the present application;
fig. 3 is a flowchart of an image processing method provided in an embodiment of the present application;
fig. 4 is a schematic diagram of nine relative position relationships between a target frame and a crop box in close-distance cropping according to an embodiment of the present application;
fig. 5 is a schematic diagram of the processing performed by close-range scaling module 0 and close-range scaling module 1 according to an embodiment of the present application;
fig. 6 is a schematic diagram of determining a crop box at a long distance according to an embodiment of the present application;
fig. 7 is a schematic diagram of a process for acquiring a sample face image according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a process of image processing using an image processing model according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 10 is a block diagram of a terminal according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it should be understood that "first," "second," and "nth" do not have any logical or temporal dependency or limitation on the number or order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first image can be referred to as a second image, and similarly, a second image can be referred to as a first image without departing from the scope of various described examples. The first image and the second image can both be images, and in some cases, can be separate and distinct images.
The term "at least one" is used herein to mean one or more, and the term "plurality" is used herein to mean two or more, e.g., a plurality of packets means two or more packets.
It is to be understood that the terminology used in the description of the various described examples herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various described examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association relationship between associated objects and indicates that three relationships can exist; for example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present application generally indicates that the associated objects before and after it are in an "or" relationship.
It should also be understood that, in the embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should also be understood that determining B from a does not mean determining B from a alone, but can also determine B from a and/or other information.
It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also understood that the term "if" may be interpreted to mean "when", "where", "upon", "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining", "in response to determining", "upon detecting [a stated condition or event]" or "in response to detecting [a stated condition or event]", depending on the context.
The following is a description of terms involved in the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Computer Vision (CV) technology is a science that studies how to make machines "see"; it refers to using cameras and computers instead of human eyes to identify, track and measure targets and perform other machine vision tasks, and to further process the images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional (3D) object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and further include common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the application relates to the computer vision technology of artificial intelligence, the machine learning technology and the like, and is specifically explained by the following embodiment.
The following describes an embodiment of the present application.
Fig. 1 is a schematic diagram of an implementation environment of an image processing method according to an embodiment of the present application. The implementation environment includes a terminal 101, or the implementation environment includes a terminal 101 and an image processing platform 102. The terminal 101 is connected to the image processing platform 102 through a wireless network or a wired network.
The terminal 101 can be at least one of a smart phone, a game console, a desktop computer, a tablet computer, an electronic book reader, an MP3(Moving Picture Experts Group Audio Layer III, motion Picture Experts compression standard Audio Layer 3) player or an MP4(Moving Picture Experts Group Audio Layer IV, motion Picture Experts compression standard Audio Layer 4) player, a laptop computer, an intelligent robot, and a self-service payment device. The terminal 101 is installed and running with an application program supporting image processing, which can be, for example, a system application, an instant messaging application, a news push application, a shopping application, an online video application, a social application.
Illustratively, the terminal 101 can have an image capturing function and an image processing function, and can process a captured image and execute the corresponding function according to the processing result. The terminal 101 can independently complete the work and can also provide data services for the terminal through the image processing platform 102. The embodiments of the present application do not limit this.
The image processing platform 102 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. The image processing platform 102 is used to provide background services for applications that support image processing. Optionally, the image processing platform 102 undertakes the primary processing work and the terminal 101 undertakes the secondary processing work; or the image processing platform 102 undertakes the secondary processing work and the terminal 101 undertakes the primary processing work; or either the image processing platform 102 or the terminal 101 can undertake the processing work alone. Alternatively, the image processing platform 102 and the terminal 101 perform cooperative computing using a distributed computing architecture.
Optionally, the image processing platform 102 includes at least one server 1021 and a database 1022, where the database 1022 is used to store data, and in this embodiment, the database 1022 can store sample images or sample face images to provide data services for the at least one server 1021.
The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal can be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
Those skilled in the art will appreciate that the number of the terminals 101 and the servers 1021 can be greater or smaller. For example, the number of the terminals 101 and the servers 1021 may be only one, or the number of the terminals 101 and the servers 1021 may be several tens or several hundreds, or more, and the number of the terminals or the servers and the device types are not limited in the embodiment of the present application.
In a scenario where a server provides an image processing service, the embodiment of the present application can apply a cloud computing service in a cloud technology, and a plurality of servers process images in parallel or in batch. In an optional manner of the embodiment of the application, image processing is performed through an image processing model, the image processing model can be obtained by training based on a large number of sample face images, the sample face images can be stored in a database, and when the image processing model needs to be trained, the sample face images are extracted from the database. Of course, after image processing by the image processing model, the result of the processing and the image may be transmitted to the database and stored.
Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software and network in a wide area network or a local area network to realize the calculation, storage, processing and sharing of data. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied in the cloud computing business model; it can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support. Background services of technical network systems require a large amount of computing and storage resources, for example video websites, picture websites and other web portals. With the rapid development and application of the internet industry, every article may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industrial data need strong system background support, which can only be realized through cloud computing.
Optionally, the present application relates specifically to artificial intelligence cloud services, also commonly referred to as AIaaS (AI as a Service). This is a service mode of an artificial intelligence platform: an AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to an AI theme mall: all developers can access one or more of the artificial intelligence services provided by the platform through an Application Programming Interface (API), and some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate and maintain their own dedicated cloud artificial intelligence services.
Fig. 2 is a flowchart of an image processing method provided in an embodiment of the present application, where the method is applied to an electronic device, and the electronic device is a terminal or a server, and referring to fig. 2, taking the application of the method to a terminal as an example, the method includes the following steps.
201. And the terminal carries out face detection on the acquired image.
Face detection means that, for any image, a certain strategy is adopted to search the image and determine whether it contains a face; if so, the position, size and posture of the face, that is, the information of the face region, are returned, and the face region can be marked by a face detection frame. The face region is the region of the face in the image.
The acquired image can be acquired by the terminal itself, that is, the terminal is the image acquisition device. Specifically, the terminal can acquire images periodically; when an image is acquired, the terminal can perform face detection to determine whether the image includes a face, decide accordingly whether to perform subsequent analysis, and determine whether a person is close to the terminal and whether the target function needs to be executed.
Optionally, the acquired image may also be acquired by other devices and sent to the terminal, which is not limited in this application embodiment. The terminal provides an image processing service.
In this embodiment, the method is exemplified to be applied to a terminal, alternatively, the image processing step can also be executed by a server, and the terminal sends the acquired image to the server, and the server performs image processing and feeds back the image processing result.
202. When the terminal detects that the image comprises the face, the terminal acquires distance indicating information corresponding to the image according to the proportion of the face area in the image, wherein the distance indicating information is used for indicating the distance between the face and the image acquisition equipment, and the distance is a discrete distance.
A discrete distance is a distance that the device represents with discrete values even though the actual distance of the target from the device varies continuously. It stands in contrast to a continuous distance, which the device computes as a continuously varying value as the target moves. For example, a discrete distance can be a distance level such as near, far, or very far, whereas a continuous distance is, for example, 45 centimeters.
203. And the terminal controls the image acquisition equipment to execute a target function according to the distance indication information.
In this embodiment, if the terminal is the image capturing device, the distance between the face in the image and the image capturing device is analyzed through the above steps, and the distance indicating information is an indication of the degree of distance and is not a numerical value of the distance. The terminal can execute corresponding functions according to setting when the distance between the human face and the image acquisition equipment meets the requirements.
In one possible implementation manner, when the distance between the human face and the image acquisition device is short, the terminal may control the image acquisition device to execute a corresponding function. Specifically, the distance indication information may include first distance indication information, where the first distance indication information is used to indicate that the distance between the human face and the image acquisition device is less than a distance threshold value, that is, the first distance indication information is used to indicate that the human face is closer to the image acquisition device.
Optionally, the distance threshold can be set by a person skilled in the art as required. Alternatively, no explicit distance threshold needs to be set, in which case the first distance indication information simply indicates that the face is relatively close to the image acquisition device.
When the distance indication information is the first distance indication information, the terminal can execute a corresponding function, for example, displaying a prompt message to remind the user of the proximity. For another example, wake up the screen, display the corresponding functional interface, etc. If the terminal is not the image acquisition device, the terminal can send the distance indication information to the image acquisition device or send a control instruction to the image acquisition device to control the image acquisition device to execute a target function.
In the embodiments of the application, the acquired image is processed and the proportion of the face region in the image is analyzed to determine the discrete distance between the face in the image and the image acquisition device. Because a discrete representation is adopted, the distance between the face and the image acquisition device does not need to be calculated precisely; it is merely estimated.
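As a purely illustrative sketch of steps 201 to 203, the following Python code classifies the proportion of the face region into a discrete distance and triggers the target function when the first distance indication information is obtained; the threshold values and function names are assumptions made for this example only and are not defined by this embodiment.

def classify_distance_by_ratio(ratio, near_thr=0.6, far_thr=0.2):
    # Map the proportion of the face region in the image to a discrete distance level.
    if ratio > near_thr:
        return 0            # first distance indication information: face is close to the device
    if ratio < far_thr:
        return 2            # far from the device
    return 1                # intermediate level

def process_captured_image(image_w, image_h, face_box, on_near):
    # face_box is (x, y, w, h) returned by face detection, or None if no face was detected.
    if face_box is None:
        return None                              # no face: no further analysis is performed
    _, _, w, h = face_box
    ratio = (w * h) / (image_w * image_h)        # proportion of the face region in the image
    level = classify_distance_by_ratio(ratio)
    if level == 0:
        on_near()                                # e.g. wake the screen or display a prompt
    return level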
Fig. 3 is a flowchart of an image processing method provided in an embodiment of the present application, and referring to fig. 3, the method includes the following steps.
301. And the terminal acquires a sample face image, wherein the sample face image carries corresponding target distance indication information.
The terminal can obtain a sample face image, and an image processing model is trained based on the sample face image. The trained image processing model can process the input image and analyze the distance between the human face and the image acquisition equipment in the image. The sample face image is an image containing a face, and whether the image contains the face can be determined by a face detection mode, that is, the sample face image is an image containing the face determined by the face detection.
Specifically, the sample face image can be obtained in various ways, and the embodiment of the application can obtain it in any of them. In the first way, the terminal acquires sample images, uses face detection to determine which sample images include a face and the screen occupation ratio of the face detection frame in each of those sample images, and crops the sample images according to the screen occupation ratio to obtain the sample face images required for training. In the second way, the terminal directly obtains sample face images, performs face detection on them, classifies the sample face images according to the screen occupation ratio of the resulting face detection frames, and determines the target distance indication information carried by each sample face image.
The second mode differs from the first in the following respect: in the first mode, the terminal acquires sample images but does not use them directly; through face detection and cropping, the cropped images determined to include a face are used as the sample face images required for training. In the second mode, sample face images are acquired directly and only need to be classified.
The two modes will be described in detail below.
In the first mode, the terminal performs face detection on at least two sample images to obtain face detection frames of the at least two sample images; acquires the proportion of the face detection frames of the at least two sample images in the at least two sample images; determines crop boxes corresponding to the at least two sample images according to the proportion; crops the at least two sample images according to the crop boxes and determines the cropped images as sample face images; and determines the target distance indication information corresponding to each sample face image according to the size relationship between the face detection frame and the crop box in the sample image.
In the first mode, the terminal can analyze the sample images, screen out the sample images containing a face whose screen occupation ratio meets a certain condition, and crop the screened sample images to obtain the sample face images. Because the samples are obtained by cropping, the distribution of sample face images across the various screen occupation ratios can be well controlled. The screen occupation ratio refers to the ratio between the area of the face region and the area of the sample image. Optionally, the area of the face region can be taken as the area of the face detection box, or as the area of the largest circular region inside the face detection box.
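The two optional definitions of the screen occupation ratio mentioned above can be written as the following short sketch; the function names are assumptions used for illustration only.

import math

def screen_ratio_rect(box_w, box_h, img_w, img_h):
    # Face region taken as the face detection box itself.
    return (box_w * box_h) / (img_w * img_h)

def screen_ratio_circle(box_w, box_h, img_w, img_h):
    # Face region taken as the largest circular region inside the face detection box.
    radius = min(box_w, box_h) / 2.0
    return (math.pi * radius * radius) / (img_w * img_h)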
Specifically, the terminal can screen out images with a larger screen occupation ratio and images with a smaller screen occupation ratio from the sample images; the images with a larger screen occupation ratio are used as the cropping data basis for close-distance sample face images, and the images with a smaller screen occupation ratio are used as the cropping data basis for long-distance sample face images.
In this way, close-distance sample face images are cropped from sample images in which the face is close to the image acquisition device, so sufficiently clear sample face images can be obtained. When sample images with small screen occupation ratios are cropped, sample face images with various screen occupation ratios can be obtained, covering as many screen occupation ratios as possible while keeping the distribution across screen occupation ratios roughly balanced. The obtained sample face images are thus comprehensive in type and controllable in quantity. Optionally, the far-distance sample face images can also be classified more finely, e.g., far, not far, etc., or the levels can be identified directly by numbers, such as levels 1, 2, 3, 4, and so on.
In one possible implementation, when different distance indication information is desired, the process of determining the crop box from the face detection frame differs. Understandably, a close-distance crop box is smaller than the face detection frame, so as to crop a face image at a very close distance. A long-distance crop box needs to be larger than the face detection frame, so that the face in the cropped image appears far from the image acquisition device and the cropped image can be given long-distance indication information. Here, the close-distance indication information is referred to as the first distance indication information, and the long-distance indication information is distance indication information other than the first distance indication information.
For close distances, the determination of the crop box can be: and the terminal responds that the proportion is larger than a first proportion threshold value, determines the sizes of the cutting frames corresponding to the at least two sample images according to the sizes of the face detection frames in the at least two sample images, and determines the positions of the cutting frames corresponding to the at least two sample images according to the positions of the face detection frames and the target offset, wherein the cutting frames are smaller than the face detection frames.
The first proportional threshold can be set by a person skilled in the art according to requirements, for example, 0.6, which is not limited in the embodiments of the present application.
Optionally, the terminal determines the width of the face detection frame as the width of the crop box, and determines the product of the height of the face detection frame and the target coefficient as the height of the crop box. The target coefficient can be set by a person skilled in the art according to requirements, for example, the target coefficient is 0.8, which is not limited in the embodiments of the present application.
Optionally, the terminal can also determine the product of the width of the face detection frame and the target coefficient as the width of the crop box, and determine the height of the face detection frame as the height of the crop box. Or, the terminal can also determine the product of the width of the face detection frame and the first target coefficient as the width of the cutting frame, and determine the product of the height of the face detection frame and the second target coefficient as the height of the cutting frame. The first target coefficient and the second target coefficient can be set by a person skilled in the relevant art according to requirements, and the embodiment of the present application is not limited thereto.
After the size of the cutting frame is determined, the specific position of the cutting frame can be determined. In one possible implementation, the position of the crop box can be identified by the position of its vertices. For example, the position of a crop box can be represented by its top left corner vertex position and bottom right corner vertex position. Of course, the identification can be performed in other ways, and the embodiment of the present application is not limited thereto.
Specifically, for a sample image, the terminal determines the vertex position of at least one crop box corresponding to the at least two sample images according to the center point position of the face detection box and the target offset in at least one offset direction.
For example, in a specific example, the cropping mode for close-distance sample face images is shown in fig. 4. Fig. 4 shows the effect of processing by the close-range scaling modules, and specifically illustrates nine relative position relationships between the target frame (i.e., the face region) and the crop box, that is, nine offset directions. Here u0, u1 and u2 indicate that the target box is located at the upper left, upper middle and upper right of the crop box; m0, m1 and m2 indicate that the target box is located at the middle left, center and middle right of the crop box; and b0, b1 and b2 indicate that the target box is located at the lower left, lower middle and lower right of the crop box.
Alternatively, the determining step of the crop box can be performed by one or more close-range scaling modules, with different close-range scaling modules performing different calculations.
For example, as shown in fig. 5, fig. 5 shows close-range scaling module 0 and close-range scaling module 1. Close-range scaling module 0 is used to handle the four cases u0, u2, b0 and b2, and close-range scaling module 1 is used to handle the two cases m0 and m2. For the three cases u1, m1 and b1, on the basis of u0, m0 and b0 the crop box is translated to the left by a certain distance so that it coincides with the face detection box in the width direction; that is, the abscissa offset for these cases is obtained by offsetting the abscissas of u0, m0 and b0, which is not described again here.
Assume that the circular area represents the target (i.e., the face region), such as a human head portrait, and that point O represents the center of the circle (i.e., the center point of the face detection frame). The dashed rectangle represents the face detection frame that just surrounds the portrait, with width "width" and height "height"; the solid rectangle represents the crop box, whose top left corner is p0 and whose bottom right corner is p1.
The following describes how close-range scaling module 0 handles the u0, u2, b0 and b2 cases, taking the u0 case as an example and assuming that p0 lies on the boundary of the circle, so that the corresponding angle β is 45 degrees. The position of the crop box can be determined through the first three steps below; once the position of the crop box is obtained, the fourth step is executed to assign a label, which is the distance indication information, so that after subsequent cropping the image carries this label.
1) Solve for the position (anchor_x, anchor_y) of the center point O of the face detection frame relative to the origin at the upper left corner of the video frame.
2) From the geometric relationship of the circle, the distance D between point p0 and point O can be obtained; when the angle β is 45 degrees, the abscissa offset of point p0 from point O is m × 0.5 × width, where m = √2/2 ≈ 0.7. The position of the top left corner of the crop box can then be found as p0 = (x0_u0, y0_u0) = (anchor_x − m × 0.5 × width, anchor_y − m × 0.5 × height).
3) Let the width of the crop box be expand_w = width and the height be expand_h = 0.8 × height, and solve for the position of the lower right corner of the crop box: p1 = (x1_u0, y1_u0) = (x0_u0 + expand_w, y0_u0 + expand_h).
4) The corresponding tag is set as a short-range tag and is denoted by 0.
The following describes how close-range scaling module 1 handles the m0 and m2 cases, taking the m0 case as an example, to obtain the coordinates of a crop box containing the face and the corresponding label.
1) Using the value of m obtained in close-range scaling module 0, the position of the top left corner of the crop box can be determined as p0 = (x0_m0, y0_m0) = (anchor_x − m × 0.5 × width, anchor_y − 0.5 × height).
2) Let the width of the crop box be expand_w = width and the height be expand_h = 0.8 × height. The position of the lower right corner of the crop box is then solved as p1 = (x1_m0, y1_m0) = (x0_m0 + expand_w, y0_m0 + expand_h).
3) The corresponding tag is set as a short-range tag and is denoted by 0.
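The steps of the two close-range scaling modules above can be summarized in the following sketch. It follows one reading of the formulas as reconstructed above (in particular the vertical offset of the m0 case), so the exact coefficients should be taken as illustrative assumptions rather than as the only possible implementation.

import math

M = math.sqrt(2) / 2    # offset factor for beta = 45 degrees, approximately 0.7

def close_crop_u0(face_box):
    # face_box is (left, top, width, height) of the face detection frame; the u0 case of module 0.
    left, top, width, height = face_box
    anchor_x, anchor_y = left + width / 2.0, top + height / 2.0   # center point O
    x0 = anchor_x - M * 0.5 * width                # p0: top left corner of the crop box
    y0 = anchor_y - M * 0.5 * height
    expand_w, expand_h = width, 0.8 * height       # crop box width and height
    return (x0, y0, x0 + expand_w, y0 + expand_h), 0   # label 0: short-range tag

def close_crop_m0(face_box):
    # The m0 case of module 1: same horizontal offset, crop box vertically aligned with the face.
    left, top, width, height = face_box
    anchor_x, anchor_y = left + width / 2.0, top + height / 2.0
    x0 = anchor_x - M * 0.5 * width
    y0 = anchor_y - 0.5 * height                   # reconstructed vertical offset (assumption)
    expand_w, expand_h = width, 0.8 * height
    return (x0, y0, x0 + expand_w, y0 + expand_h), 0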
For long distance, the terminal responds to the fact that the proportion is smaller than a second proportion threshold value, determines the size of a cutting frame corresponding to the at least two sample images according to the size of a human face detection frame in the at least two sample images and a target scaling coefficient, randomly determines the position of the cutting frame in the at least two sample images according to the size of the cutting frame in the at least two sample images, and the cutting frame is larger than the human face detection frame.
The second ratio threshold can be set by a person skilled in the art as required, for example, the second ratio threshold is 0.2, which is not limited in the embodiments of the present application.
For the target scaling factor, it can be set by relevant technical personnel as required; there can be one or more target scaling factors, and different target scaling factors lead to different screen occupation ratios of the face in the cropped sample face images. For example, the target scaling factor includes a width scaling factor α_w and a height scaling factor α_h; assuming the combined scaling factor (α_h, α_w) = (1.25, 2), the screen ratio of the cropped face detection frame to the crop box is 1/(1.25 × 2) = 0.4. Thus, one target scaling factor defines the crop box corresponding to one screen occupation ratio.
In one possible implementation, the target scaling factor includes a target width scaling factor and a target height scaling factor. And the terminal acquires the product of the size of the face detection frame and the target scaling coefficient, and the product is used as the size of the cutting frame. For example, the terminal obtains a first product of the width of the face detection frame and the target width scaling factor, takes the first product as the width of the cropping frame, obtains a second product of the height of the face detection frame and the target height scaling factor, and takes the second product as the height of the cropping frame.
Optionally, for a target scaling factor, the position of at least one crop box can be randomly determined in the at least two sample images. The number of the cutting frames is one or more. By randomly determining the plurality of cutting frames, a plurality of different sample face images with the same screen occupation ratio can be obtained, and the diversity of the sample face images is improved.
Specifically, the terminal can randomly determine the position of the top left corner vertex of the crop box from the edge region of the top left corner in the sample image according to the size of the crop box, and can determine the position of the bottom right corner vertex of the crop box according to the position of the top left corner vertex and the size of the crop box.
Wherein the size of the upper left corner edge region is inversely related to the size of the crop box. For example, the sum of the width of the top left corner edge region and the width of the crop box is the width of the sample image, and the sum of the height of the top left corner edge region and the height of the crop box is the height of the sample image.
For example, in a specific example, fig. 6 (a) shows a schematic diagram of a far-distance crop box. Fig. 6 illustrates the inclusion relationship between the original video frame (i.e., the frame of a sample image, of size H × W), the far-distance crop box (of size h0 × w0) and the face detection frame (of size height × width); that is, the crop box should contain the whole face detection frame and should itself be contained by the original video frame.
Fig. 6 (b) shows how the top left corner of the far-distance crop box is solved. Specifically, let the width and height of the crop box be w0 and h0. For the top left corner of the crop box, consider the rectangle abcd in fig. 6 (b): the sum of the width W − w0 of the rectangle abcd and the width w0 of the crop box is the width W of the original video frame, and the sum of the height H − h0 of the rectangle abcd and the height h0 of the crop box is the height H of the original video frame. One or more points are selected at random within this rectangle as top left corner points of crop boxes; if one point is selected, it corresponds to one crop box, and if several points are selected, they correspond to several crop boxes, that is, the number of crop boxes equals the number of selected top left corner points. The purpose is to ensure the diversity of long-distance samples generated for the same long-distance level (i.e., the same distance indication information).
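A minimal sketch of this random placement of far-distance crop boxes follows; the helper name and the use of uniform sampling are assumptions made for illustration.

import random

def random_far_crop_positions(frame_w, frame_h, w0, h0, num_points=1):
    # The top left corner p0 may fall anywhere in the rectangle abcd of size (W - w0) x (H - h0),
    # so that a crop box of size (w0, h0) stays inside the original video frame.
    boxes = []
    for _ in range(num_points):
        x0 = random.uniform(0, frame_w - w0)
        y0 = random.uniform(0, frame_h - h0)
        boxes.append((x0, y0, x0 + w0, y0 + h0))   # p0 and p1 of one crop box
    return boxes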
Fig. 6 (c) shows a flow chart for solving the width and height of the far-distance crop box. First, the distance level (i.e., the distance indication information) of the long-distance sample to be generated is determined, where 1 represents the smallest level among the long-distance samples; the larger the number, the higher the level and the longer the target-to-lens distance represented by the generated long-distance sample. After the distance level is determined, the target scaling factor can be determined, and in the subsequent crop-box position solving step the top left corner position is selected and the bottom right corner position is determined.
For distance level 1, the corresponding target scaling factor is (α_h, α_w) = (1.25, 2), and the width and height of the corresponding uncorrected crop box are (h0, w0) = (1.25 × height, 2 × width), where height and width are the height and width of the face detection frame, respectively. The screen ratio of the cropped face detection frame to the crop box is therefore 1/(1.25 × 2) = 0.4, and the label corresponding to this crop box is a long-distance label (distance indication information), denoted by 1.
When the distance level is 2, if the desired screen occupation ratio after cropping is 0.3, each target scaling factor should be multiplied by √(4/3), giving the target scaling factor (1.25 × √(4/3), 2 × √(4/3)); the width and height of the corresponding uncorrected crop box are then (1.25 × √(4/3) × height, 2 × √(4/3) × width), and the label corresponding to the crop box is a distant label, denoted by 2. The remaining distance levels can be derived by analogy and are not described again here.
When generating labels for a plurality of distance levels, the screen occupation ratio represented by the base level (distance level 1) can be chosen according to the actual application scene, and the screen occupation ratio of each subsequent level can be chosen according to actual requirements to represent a different target-to-lens distance; for example, the screen occupation ratio for distance level 2 can be set to 0.3. As the distance level increases, the area of the corresponding uncorrected crop box grows, and once it exceeds the range of the original video frame, the generation of data and labels for larger distance levels stops.
In view of this, a stop condition can be set: check whether the width and height of the uncorrected crop box satisfy H0 < H && W0 < W, and decide whether to stop cropping according to the result. Here && is a logical "and", meaning both conditions must hold at the same time, that is, H0 < H and H0 < W... more precisely, H0 < H and W0 < W. Specifically, if the width and height of the uncorrected crop box do not satisfy "H0 < H && W0 < W", that set of crop boxes is discarded; otherwise the width and height of the final crop box are (h, w) = (H0, W0).
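The per-level crop box computation and the stop condition can be sketched as follows. The linear schedule of target screen occupation ratios (0.4, 0.3, 0.2, ...) is an assumption; the description only fixes 0.4 for level 1 and gives 0.3 for level 2 as an example.

```python
import math

def distant_crop_size(face_h, face_w, frame_h, frame_w, level):
    """Uncorrected distant crop box (h0, w0) for a given distance level.

    Level 1 uses (alpha_h, alpha_w) = (1.25, 2), i.e. a screen ratio of
    1 / (1.25 * 2) = 0.4.  The per-level target ratios below are an assumed
    schedule.  Returns None when the stop condition h0 < H and w0 < W fails,
    i.e. the crop box no longer fits inside the original frame."""
    alpha_h, alpha_w = 1.25, 2.0
    base_ratio = 1.0 / (alpha_h * alpha_w)          # 0.4 at level 1
    target_ratio = base_ratio - 0.1 * (level - 1)   # assumed: 0.4, 0.3, 0.2, ...
    if target_ratio <= 0:
        return None
    scale = math.sqrt(base_ratio / target_ratio)    # sqrt(4/3) at level 2, as in the text
    h0 = alpha_h * scale * face_h
    w0 = alpha_w * scale * face_w
    if not (h0 < frame_h and w0 < frame_w):         # stop condition H0 < H && W0 < W
        return None
    return h0, w0
```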
The above sample face image acquisition process is shown in fig. 7. Taking continuously collected images, i.e. video frames, as an example, the terminal can input a video frame 701 containing a face into the face detection module 702, which performs face detection on the video frame 701 to obtain a face detection frame 703; the position and size of a crop box (i.e., the above-mentioned crop box) surrounding the face detection frame are then calculated by the discrete distance conversion module. Specifically, the crop box calculation step can be: if condition 704 is satisfied, i.e. the screen occupation ratio (the proportion of the face region in the image, namely the area of the face detection frame divided by the area of the video frame) is greater than a first ratio threshold (e.g. thr = 0.6), the frame is fed to the short-distance scaling module 705; otherwise, if condition 706 is satisfied, i.e. the screen occupation ratio is less than a second ratio threshold (e.g. thr = 0.2), the frame is fed to the long-distance scaling module 707. A series of crop boxes surrounding the face and their corresponding labels 708 are obtained through the discrete distance scaling modules (short-distance scaling module 705 and long-distance scaling module 707). The image cropping module 709 then crops each video frame according to each crop box to obtain the cropped video frames and the labels of the corresponding discrete distances (i.e., target distance indication information) 710. Through this process, the discrete distance data, namely the sample face images carrying target distance indication information, is produced.
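Putting the pieces together, a rough sketch of the fig. 7 pipeline might look like the following. The helper functions are the ones sketched above, the face-box format and the near-branch placeholder are assumptions, and the actual near-distance scaling module shrinks the crop box by a target offset rather than reusing the face box.

```python
def make_discrete_distance_samples(frame, face_box, thr_near=0.6, thr_far=0.2):
    """Sketch of the fig. 7 pipeline; thresholds 0.6 / 0.2 are the examples from the text.

    `frame` is an H x W (x C) image array (e.g. a NumPy array) and
    `face_box` = (top, left, height, width) is a face detection result.
    Returns (cropped_image, distance_label) pairs, reusing distant_crop_size()
    and random_top_left() from the sketches above."""
    frame_h, frame_w = frame.shape[:2]
    top, left, face_h, face_w = face_box
    ratio = (face_h * face_w) / (frame_h * frame_w)   # screen occupation ratio of the face
    samples = []
    if ratio > thr_near:
        # Near-distance scaling module: placeholder that keeps the face box itself
        # and labels it 0; the description instead shrinks the crop box by a target offset.
        samples.append((frame[top:top + face_h, left:left + face_w], 0))
    elif ratio < thr_far:
        # Long-distance scaling module: grow the crop box level by level until it
        # no longer fits inside the original frame (the stop condition above).
        level = 1
        while True:
            size = distant_crop_size(face_h, face_w, frame_h, frame_w, level)
            if size is None:
                break
            h0, w0 = int(size[0]), int(size[1])
            for y0, x0, y1, x1 in random_top_left(frame_h, frame_w, h0, w0):
                samples.append((frame[y0:y1, x0:x1], level))
            level += 1
    return samples
```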
In the second mode, the terminal performs face detection on at least two sample face images to obtain face detection frames of the at least two sample face images; acquiring the proportion of the face detection frames of the at least two sample face images in the at least two sample face images; and acquiring distance indication information corresponding to the proportion as target distance indication information corresponding to the at least two sample face images according to the corresponding relation between the proportion and the distance indication information. The distance label (target distance indication information) corresponding to each sample face image can be determined by directly setting the corresponding relation between the screen occupation ratio and the distance indication information.
In the second mode, the correspondence between the ratio and the distance indication information, that is, between the screen occupation ratio and the distance indication information, can be preset. The terminal can directly obtain sample face images, determine the discrete distance between the face in each image and the image acquisition device according to the screen occupation ratio, and attach distance indication information, namely a distance label, to that discrete distance.
For example, the ratio range can be divided into a plurality of intervals, each interval corresponding to one piece of distance indication information (e.g., a distance label); after the proportion of the face detection frame of any sample face image in that image is obtained, the distance indication information corresponding to the interval to which the proportion belongs is determined as the distance indication information corresponding to that sample face image.
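A minimal sketch of such an interval lookup follows; the boundary values are illustrative and are not taken from the description.

```python
def ratio_to_distance_label(ratio, boundaries=(0.6, 0.4, 0.3, 0.2)):
    """Map a face screen occupation ratio to a discrete distance label.

    Each interval of the ratio corresponds to one distance label; the
    boundary values here are assumptions for illustration only."""
    for label, bound in enumerate(boundaries):
        if ratio >= bound:
            return label          # 0 = nearest interval, larger = farther
    return len(boundaries)        # ratio below the last boundary: farthest label
```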
302. And the terminal inputs the sample face image into the image processing model, the image processing model classifies the distance between the face in the sample face image and the image acquisition equipment, and the predicted distance indication information corresponding to the sample face image is output.
The image processing model can be an initial model whose parameters take initial values; the terminal can train the model based on the sample face images, adjusting the model parameters so that the feature extraction capability and classification capability of the image processing model improve. Specifically, the image processing model can perform convolution processing on a sample face image, extract image features, classify the image features, and determine the predicted distance indication information.
Optionally, for a sample face image, the image processing model performs classification to obtain a feature vector. The feature vector corresponds to a label space, where the label space is the space formed by a plurality of distance labels. For example, the label space may include two distance labels: near (indicated by 0) and far (indicated by 1). As another example, the label space may include five distance labels: a short distance (indicated by 0) and four increasingly long distances (indicated by 1 to 4). Each element of the feature vector corresponds to one piece of distance indication information (e.g., distance levels 0, 1, 2, 3, 4), and the element is the probability that the distance between the face in the sample face image and the image acquisition device falls at that distance level. The terminal obtains the distance indication information corresponding to the maximum probability in the feature vector, i.e. the distance label. For example, if the feature vector obtained by the image processing model for a sample face image is [0.09, 0.11, 0.05, 0.05, 0.9], the terminal obtains the distance indication information "4" corresponding to the fifth element of the feature vector, i.e. the face is at the farthest distance level.
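Reading off the label amounts to taking the index of the largest probability; a minimal sketch using the example vector above:

```python
# The example probability vector from the description; the argmax gives the distance label.
probs = [0.09, 0.11, 0.05, 0.05, 0.9]
predicted_label = max(range(len(probs)), key=lambda i: probs[i])  # index of the largest probability
print(predicted_label)  # 4, i.e. the farthest of the five distance labels
```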
For example, taking the distance indication information to be a distance label, once the short-distance label represented by 0 and the long-distance labels of different far levels represented by numbers greater than 0 have been obtained, the discrete distance can be modeled with a multi-classification algorithm, so that the image processing model can determine which level an input image belongs to. If the multi-classification algorithm predicts class 1 for an input image, the face in that image is far from the lens but at the relatively nearest position within the far range; the larger the predicted class label, the farther the face is from the lens.
The class labels of the multi-classification algorithm thus provide a characterization of discrete distances. This modeling approach suits scenes where the actual distance does not need to be very accurate. For example, when an intelligent robot at a restaurant doorway uses a face detection algorithm to judge whether a customer has arrived, it does not need to know the exact distance between the customer and itself, only roughly how near or far the customer is; modeling this fuzzy notion of near and far with discrete distances reduces the computational complexity. In addition, this way of modeling resembles how human vision judges distance: a person can hardly estimate a distance with precision, but judges the approximate distance to an object by feel.
In one possible implementation, as shown in fig. 8, the image processing model can include a neural network and a classifier. The input (input) 801 of the image processing model is a video frame, image frame, or the like, and the backbone network (backbone) 802 of the image processing model can be a neural network, for example a Visual Geometry Group (VGG) or AlexNet backbone, where AlexNet is the neural network designed by Alex Krizhevsky and his advisor Hinton that won the 2012 ImageNet competition, ImageNet being a large-scale computer vision recognition project. The classifier executes a multi-classification algorithm (classifier) 803 and can be implemented by a neural network or by a machine learning method such as a Support Vector Machine (SVM).
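One way such a model could be sketched in PyTorch is shown below. The use of torchvision's VGG-16 and the head sizes are assumptions; the description equally allows AlexNet-style backbones or an SVM in place of the classification head.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DiscreteDistanceModel(nn.Module):
    """Sketch of fig. 8: a backbone network followed by a multi-class head
    over the discrete distance levels."""
    def __init__(self, num_distance_levels=5):
        super().__init__()
        backbone = models.vgg16(weights=None)         # assumed backbone choice
        self.features = backbone.features             # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(              # multi-class head over distance levels
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_distance_levels),
        )

    def forward(self, x):
        # Returns raw logits, one per distance level.
        return self.classifier(self.pool(self.features(x)))

# usage: logits = DiscreteDistanceModel()(torch.randn(1, 3, 224, 224))
```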
303. And the terminal acquires the prediction accuracy based on the prediction distance indication information and the target distance indication information.
In the training process, whether the model parameters are suitable can be measured through the prediction accuracy; if not, the model parameters can be adjusted until the model classifies accurately. The prediction accuracy can be a loss value or another quantity, for example the value of a target loss function, or a reward value, which is not limited in the embodiments of the present application.
304. And the terminal adjusts the model parameters of the image processing model based on the prediction accuracy until the model parameters meet the target conditions.
The target condition can be set by a person skilled in the art as required, for example, the target condition is that the prediction accuracy is less than a threshold value, or the prediction accuracy converges, or the iteration number reaches a target number, which is not limited in the embodiment of the present application.
Steps 302 to 304 form an iterative process. After the model parameters are adjusted, the terminal may re-execute steps 302 and 303 with the adjusted parameters and adjust them again based on the prediction accuracy of the next iteration; through multiple iterations the model parameters are continuously refined so that the classification capability of the image processing model improves. When training stops, the trained image processing model is obtained with good classification accuracy, and any subsequent image processing request can input images into this model for accurate classification.
Steps 301 to 304 constitute the training process of the image processing model, described here with the terminal performing model training as an example: the terminal acquires sample face images, trains the image processing model, and, after capturing images later, processes the captured images with the trained image processing model to determine the distance between the person in the image and the terminal.
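A minimal training-loop sketch of steps 302 to 304 is given below. Cross-entropy loss, the Adam optimizer, and a fixed epoch count as the target condition are assumptions; the description leaves the accuracy measure and stopping condition open.

```python
import torch
import torch.nn as nn

def train_image_processing_model(model, loader, epochs=10, lr=1e-3):
    """Sketch of steps 302-304: predict, measure prediction accuracy with a loss,
    and adjust the model parameters until the (assumed) target condition is met."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):                      # iteration count as the target condition
        for images, target_labels in loader:         # sample face images + target distance labels
            logits = model(images)                   # step 302: predicted distance indication
            loss = criterion(logits, target_labels)  # step 303: prediction accuracy (loss value)
            optimizer.zero_grad()
            loss.backward()                          # step 304: adjust model parameters
            optimizer.step()
    return model
```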
In another possible implementation, the model training process is performed on a server. In this implementation, the server executes steps 301 to 304: it obtains sample face images and trains an image processing model based on them to obtain the trained image processing model.
This implementation may further include the following two cases.
In the first situation, the terminal can obtain the trained image processing model from the server and store the image processing model in the local terminal, so that after the terminal acquires the image, the image processing model can be called from the local terminal to process the image, and the embedded image processing function is realized. When the terminal calls the local image processing model to perform image processing, the terminal may perform networking execution or perform offline execution, which is not limited in the embodiment of the present application.
In the second case, after the server trains and obtains the image processing model, background service of image processing can be provided for the terminal. The terminal can send the acquired image to the server, the server calls the trained image processing model to process the image sent by the terminal, and the processing result is fed back to the terminal, so that the terminal can realize the image processing function by calling the image processing service of the server.
In summary, the execution subjects of the model training process and the image processing process may include various situations, and the embodiment of the present application does not limit which situation is specifically adopted.
305. And the terminal collects images.
The terminal has an image acquisition function and an image processing function, can acquire images, process the acquired images, analyze the distance between the face and the terminal in the images, and can execute a target function when the distance is short.
Optionally, the terminal is equipped with a camera, and the terminal can acquire images based on the camera. The camera can be a front camera or a rear camera, optionally, the camera can also be a depth camera, an infrared camera and the like, and the type of the camera is not limited in the embodiment of the application.
In this embodiment, an example that the terminal acquires an image, that is, the terminal is the image acquisition device is taken as an example for description, optionally, if the terminal is not the image acquisition device, the terminal can receive the image sent by the image acquisition device.
306. The terminal inputs the image into an image processing model, the image processing model carries out face detection on the image, when the image comprises a face, the distance between the face in the image and the image acquisition equipment is classified according to the proportion of a face detection frame obtained by the face detection in the image, and distance indication information corresponding to the image is output.
The image processing model has been trained to learn the distance indication information corresponding to face images with various screen occupation ratios; when an image processing requirement arises, the image can be input into the image processing model, which outputs the distance indication information corresponding to the image. The processing of the image in step 306 is the same as in step 302 and is not repeated here.
Step 306 classifies the distance between the face in the image and the image acquisition device according to the proportion of the face detection frame in the image to obtain the distance indication information corresponding to the image. This process can also be implemented without an image processing model, by calling a feature extraction algorithm and a classification algorithm; the embodiments of the present application do not limit the specific manner.
307. And the terminal controls the image acquisition equipment to execute a target function according to the distance indication information.
In this embodiment, taking the terminal to acquire an image, that is, the terminal is the image acquisition device for example to describe, optionally, if the terminal is not the image acquisition device, the terminal may send the distance indication information to the image acquisition device or send a control instruction to the image acquisition device to control the image acquisition device to execute a target function.
The target function is set by a relevant technician as required, and the target function is not specifically limited in the embodiment of the present application. Examples of several target functions are provided below.
In an example I, the terminal controls the screen of the image acquisition device to light up and displays a target interface in response to the fact that the distance indication information is first distance indication information.
To save power, the terminal is usually in a screen-off state when not in use. By automatically detecting that a person is close to the image acquisition device, the screen is woken and an interface with the corresponding functions is displayed to the user; the user does not need to wake the device manually or operate it to trigger the display of the target interface, which is convenient, reduces user operations, and improves operating efficiency.
The target interface is set according to actual requirements. For example, in a payment scenario the target interface is a payment interface: when the image acquisition device detects that a person is near, it can wake the screen and display the payment interface. For another example, in an express delivery scenario the target interface is a parcel-pickup or parcel-sending interface: when the image acquisition device detects that a person is near, it can wake the screen and display the parcel-pickup or parcel-sending interface.
And example two, in response to that the distance indication information is first distance indication information, the terminal displays prompt information, and the prompt information is used for prompting that the distance from the image acquisition device to the current position is too small.
The prompt information can be displayed in a popup window or by jumping to another interface. For example, when the terminal detects that the user is too close to the screen, it displays a prompt reminding the user to keep a distance from the screen to protect the eyes.
Alternatively, an eye protection mode may be provided: the function is executed when the terminal has the eye protection mode turned on, and may be skipped in certain specific modes, for example a game mode. Alternatively, modes may not be distinguished at all, and the function is executed whenever the terminal detects that the distance between the face and the terminal is small. The embodiments of the present application do not limit this.
In one possible implementation manner, the distance indication information includes first distance indication information and second distance indication information, where the first distance indication information indicates that the distance between the face and the image acquisition device is smaller than a distance threshold, and the second distance indication information indicates that the distance between the face and the image acquisition device is larger than the distance threshold;
accordingly, in step 307, the terminal controls the image capturing device to execute the target function in response to that the distance indicating information corresponding to the image is the first distance indicating information.
If the face is far from the image acquisition device, the image acquisition device need not execute the target function; that is, in response to the distance indication information corresponding to the image being the second distance indication information, the terminal ignores the image and does not execute the target function.
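For the two-label case, step 307 reduces to a simple branch; the numeric label encoding and the callback names below are assumptions for illustration.

```python
NEAR, FAR = 0, 1  # assumed encoding of the first / second distance indication information

def handle_distance_indication(label, wake_screen, show_target_interface):
    """Sketch of step 307 for the two-label case: act only on the near label."""
    if label == NEAR:
        wake_screen()              # light up the screen of the image acquisition device
        show_target_interface()    # e.g. a payment or parcel-pickup interface
    # label == FAR: ignore the image and do not execute the target function
```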
In the discrete distance modeling method based on a multi-classification algorithm, the continuous distance between the target and the camera is modeled discretely by the multi-classification algorithm, which lowers the computational complexity of the distance algorithm and removes the need for a posture sensor: images are simply collected by the camera and processed. In this way, mid- and low-end devices can also enjoy functions such as prompting when the user is too close to the phone screen, or waking the device and displaying a payment or express-delivery interface when a user approaches, without any user operation. Of course, the method can also be applied to other scenarios, which are not listed here.
In one possible implementation manner, the other distance indication information can also correspond to other functions, and the terminal can execute the other functions corresponding to the other distance indication information. For example, in the manner in which the distance indication information includes the first distance indication information and the second distance indication information, the terminal may be capable of executing a function corresponding to the second distance indication information in response to the distance indication information being the second distance indication information. For example, it is possible to set automatic shutdown, automatic screen turn-off, automatic return to the home page, or the like when the distance is long.
The embodiments of the present application process the collected image, analyze the proportion of the face region in the image, and determine the discrete distance between the face in the image and the image acquisition device. Representing this distance discretely means the exact distance between the face and the image acquisition device need not be calculated; it is enough to estimate roughly how far apart they are, which avoids the precise distance computation and saves a large amount of calculation.
All the above optional technical solutions can be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, and referring to fig. 9, the apparatus includes:
a detection module 901, configured to perform face detection on the acquired image;
an obtaining module 902, configured to, when it is detected that the image includes a face, obtain distance indication information corresponding to the image according to a proportion of a face region in the image, where the distance indication information is used to indicate a distance between the face and an image acquisition device, and the distance is a discrete distance;
and a control module 903, configured to control the image capturing device to execute a target function according to the distance indication information.
In a possible implementation manner, the obtaining module 902 is configured to classify distances between faces in the image and the image acquisition device according to proportions of the face detection frames in the image, so as to obtain distance indication information corresponding to the image.
In a possible implementation manner, the detecting module 901 and the obtaining module 902 are configured to input the image into an image processing model, perform face detection on the image by the image processing model, classify a distance between a face in the image and an image acquisition device according to a proportion of a face detection frame obtained by the face detection in the image when the image is detected to include the face, and output distance indication information corresponding to the image.
In one possible implementation, the training process of the image processing model includes:
acquiring a sample face image, wherein the sample face image carries corresponding target distance indication information;
inputting the sample face image into the image processing model, classifying the distance between the face in the sample face image and the image acquisition equipment by the image processing model, and outputting the indication information of the predicted distance corresponding to the sample face image;
acquiring prediction accuracy based on the predicted distance indicating information and the target distance indicating information;
based on the prediction accuracy, model parameters of the image processing model are adjusted until a target condition is met.
In one possible implementation, the acquiring a sample face image includes:
carrying out face detection on at least two sample images to obtain face detection frames of the at least two sample images;
acquiring the proportion of the face detection frames of the at least two sample images in the at least two sample images;
determining a cutting frame corresponding to the at least two sample images according to the proportion;
and according to the size relation between the face detection frame and the cutting frame in the sample image, determining target distance indication information corresponding to the sample face image.
In one possible implementation, the determining the crop box corresponding to the at least two sample images according to the ratio includes:
in response to the proportion being larger than a first proportion threshold, determining the sizes of the cropping frames corresponding to the at least two sample images according to the sizes of the face detection frames in the at least two sample images, and determining the positions of the cropping frames corresponding to the at least two sample images according to the positions of the face detection frames and the target offset, wherein the cropping frames are smaller than the face detection frames;
and responding to the proportion smaller than a second proportion threshold value, determining the sizes of the cutting frames corresponding to the at least two sample images according to the sizes of the face detection frames in the at least two sample images and the target scaling coefficient, and randomly determining the positions of the cutting frames in the at least two sample images according to the sizes of the cutting frames in the at least two sample images, wherein the cutting frames are larger than the face detection frames.
In one possible implementation manner, the determining, according to the position of the face detection frame and the target offset, the position of the crop box corresponding to the at least two sample images includes:
and for a sample image, determining the vertex position of at least one cutting box corresponding to the at least two sample images according to the central point position of the face detection box and the target offset in at least one offset direction.
In a possible implementation manner, the determining, according to a size relationship between a face detection frame and the crop frame in a sample image, target distance indication information corresponding to the sample face image includes:
in response to that the cutting frame is smaller than the face detection frame, determining that target distance indication information corresponding to the sample face image is first distance indication information, wherein the distance indicated by the first distance indication information is smaller than a distance threshold;
and in response to the fact that the cutting frame is larger than the face detection frame, determining that the distance indication information corresponding to the target scaling coefficient is the target distance indication information corresponding to the at least two sample images.
In one possible implementation, the acquiring a sample face image includes:
carrying out face detection on at least two sample face images to obtain face detection frames of the at least two sample face images;
acquiring the proportion of the face detection frames of the at least two sample face images in the at least two sample face images;
and acquiring distance indication information corresponding to the proportion as target distance indication information corresponding to the at least two sample face images according to the corresponding relation between the proportion and the distance indication information.
In one possible implementation, the control module 903 is configured to perform any of the following:
in response to that the distance indication information is first distance indication information, controlling a screen of the image acquisition equipment to be lightened, and displaying a target interface, wherein the distance indicated by the first distance indication information is smaller than a distance threshold;
and in response to the distance indication information being first distance indication information, displaying prompt information, where the prompt information is used to prompt that the distance to the image acquisition device is too small, and the distance indicated by the first distance indication information is smaller than a distance threshold.

The apparatus provided by the embodiments of the present application processes the collected image, analyzes the proportion of the face region in the image, and determines the discrete distance between the face in the image and the image acquisition device. With this discrete representation, the exact distance between the face and the image acquisition device need not be calculated; estimating roughly how far apart they are is sufficient. This avoids the precise distance computation, saves a large amount of calculation, and improves image processing efficiency; moreover, since the target function can be realized by estimating distance from the collected images alone, no additional components are required, which reduces the manufacturing cost of the device.
It should be noted that: in the image processing apparatus provided in the above embodiment, when processing an image, only the division of the above functional modules is taken as an example, and in practical applications, the above function allocation can be completed by different functional modules according to needs, that is, the internal structure of the image processing apparatus is divided into different functional modules so as to complete all or part of the above described functions. In addition, the image processing apparatus and the image processing method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
The electronic device in the above method embodiments can be implemented as a terminal. For example, fig. 10 is a block diagram of a terminal according to an embodiment of the present disclosure. Terminal 1000 can be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, a desktop computer, an intelligent robot, or a self-service payment device. Terminal 1000 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 1000 can include: one or more processors 1001 and one or more memories 1002.
The processor 1001 can include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1001 can be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 can also include a main processor and a coprocessor, the main processor being a processor for processing data in the wake-up state, also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1001 can be integrated with a GPU (Graphics Processing Unit) that is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1001 can further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 1002 can include one or more computer-readable storage media, which can be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1002 is used to store at least one instruction for execution by the processor 1001 to implement the image processing methods provided by the method embodiments herein.
In some embodiments, terminal 1000 can also optionally include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002 and peripheral interface 1003 may be connected by a bus or signal lines. Each peripheral can be connected to the peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, display screen 1005, camera assembly 1006, audio circuitry 1007, positioning assembly 1008, and power supply 1009.
The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 can be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 1004 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 is capable of communicating with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 1004 can further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1005 is used to display a UI (User Interface). The UI can include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch display screen, the display screen 1005 also has the ability to capture touch signals on or over the surface of the display screen 1005. The touch signal can be input to the processor 1001 as a control signal to be processed. At this point, the display screen 1005 can also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, display screen 1005 can be one, providing a front panel of terminal 1000; in other embodiments, display 1005 can be at least two, respectively disposed on different surfaces of terminal 1000 or in a folded design; in other embodiments, display 1005 can be a flexible display disposed on a curved surface or a folded surface of terminal 1000. Even more, the display screen 1005 can be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The Display screen 1005 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and the like.
The camera assembly 1006 is used to capture images or video. Optionally, the camera assembly 1006 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1006 can also include a flash. The flash lamp can be a monochrome temperature flash lamp and can also be a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp and can be used for light compensation under different color temperatures.
The audio circuit 1007 can include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1001 for processing or inputting the electric signals to the radio frequency circuit 1004 for realizing voice communication. For stereo capture or noise reduction purposes, multiple microphones can be provided, each at a different location of terminal 1000. The microphone can also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The loudspeaker can be a conventional membrane loudspeaker, but also a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to human, but also the electric signal can be converted into a sound wave inaudible to human for use in distance measurement or the like. In some embodiments, the audio circuit 1007 can also include a headphone jack.
A positioning component 1008 is employed to locate the current geographic location of terminal 1000 for navigation or LBS (Location Based Service). The positioning component 1008 can be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the Galileo system of the European Union.
Power supply 1009 is used to supply power to various components in terminal 1000. The power source 1009 can be alternating current, direct current, disposable batteries, or rechargeable batteries. When the power source 1009 includes a rechargeable battery, the rechargeable battery can be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery can also be used to support fast charge technology.
In some embodiments, terminal 1000 can also include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, fingerprint sensor 1014, optical sensor 1015, and proximity sensor 1016.
Acceleration sensor 1011 can detect acceleration magnitudes on three coordinate axes of a coordinate system established with terminal 1000. For example, the acceleration sensor 1011 can be used to detect components of the gravitational acceleration on three coordinate axes. The processor 1001 can control the display screen 1005 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1011. The acceleration sensor 1011 can also be used for acquisition of motion data of a game or a user.
The gyro sensor 1012 can detect the body direction and the rotation angle of the terminal 1000, and the gyro sensor 1012 and the acceleration sensor 1011 can cooperate to acquire the 3D motion of the user on the terminal 1000. From the data collected by the gyro sensor 1012, the processor 1001 can implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensor 1013 can be disposed on a side frame of terminal 1000 and/or underneath display screen 1005. When pressure sensor 1013 is disposed on a side frame of terminal 1000, a user's grip signal on terminal 1000 can be detected, and processor 1001 performs left-right hand recognition or shortcut operation according to the grip signal collected by pressure sensor 1013. When the pressure sensor 1013 is disposed at a lower layer of the display screen 1005, the processor 1001 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1005. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1014 is used to collect a fingerprint of the user, and the processor 1001 identifies the user according to the fingerprint collected by the fingerprint sensor 1014, or the fingerprint sensor 1014 identifies the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 1001 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying, and changing settings, etc. Fingerprint sensor 1014 can be disposed on the front, back, or side of terminal 1000. When a physical key or vendor Logo is provided on terminal 1000, fingerprint sensor 1014 can be integrated with the physical key or vendor Logo.
The optical sensor 1015 is used to collect the ambient light intensity. In one embodiment, the processor 1001 can control the display brightness of the display screen 1005 according to the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the display screen 1005 is turned down. In another embodiment, the processor 1001 can also dynamically adjust the shooting parameters of the camera assembly 1006 according to the intensity of the ambient light collected by the optical sensor 1015.
Proximity sensor 1016, also known as a distance sensor, is typically disposed on a front panel of terminal 1000. Proximity sensor 1016 is used to gather the distance between the user and the front face of terminal 1000. In one embodiment, when proximity sensor 1016 detects that the distance between the user and the front surface of terminal 1000 gradually decreases, processor 1001 controls display screen 1005 to switch from the bright-screen state to the screen-off state; when proximity sensor 1016 detects that the distance between the user and the front of terminal 1000 gradually increases, processor 1001 controls display screen 1005 to switch from the screen-off state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in FIG. 10 is not intended to be limiting and can include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
The electronic device in the above method embodiments can also be implemented as a server. For example, fig. 11 is a schematic structural diagram of a server provided in an embodiment of the present application. The server 1100 may vary considerably in configuration or performance, and can include one or more processors (CPUs) 1101 and one or more memories 1102, where the memory 1102 stores at least one instruction that is loaded and executed by the processor 1101 to implement the image processing methods provided by the above method embodiments. Certainly, the server can also have components such as a wired or wireless network interface and an input/output interface to facilitate input and output, and can include other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, including at least one instruction, the at least one instruction being executable by a processor to perform the image processing method in the above embodiments is also provided. For example, the computer-readable storage medium can be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which comprises one or more program codes stored in a computer-readable storage medium. The one or more processors of the electronic device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the electronic device can perform the above-described image processing method.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It should be understood that determining B from A does not mean determining B from A alone; B can also be determined from A and/or other information.
Those skilled in the art will appreciate that all or part of the steps for implementing the above embodiments can be implemented by hardware, or can be implemented by a program for instructing relevant hardware, and the program can be stored in a computer readable storage medium, and the above mentioned storage medium can be read only memory, magnetic or optical disk, etc.
The above description is intended only to be an alternative embodiment of the present application, and not to limit the present application, and any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. An image processing method, characterized in that the method comprises:
carrying out face detection on the acquired image;
when the image is detected to comprise a face, obtaining distance indicating information corresponding to the image according to the proportion of a face region in the image, wherein the distance indicating information is used for indicating the distance between the face and image acquisition equipment, and the distance is a discrete distance;
and controlling the image acquisition equipment to execute a target function according to the distance indication information.
2. The method according to claim 1, wherein the obtaining distance indication information corresponding to the image according to the proportion of the face region in the image comprises:
and classifying the distance between the face in the image and the image acquisition equipment according to the proportion of the face detection frame in the image to obtain distance indication information corresponding to the image.
3. The method of claim 1, wherein the face detection is performed on the acquired image; when the image is detected to include the face, acquiring distance indication information corresponding to the image according to the proportion of the face area in the image, wherein the distance indication information includes:
inputting the image into an image processing model, carrying out face detection on the image by the image processing model, classifying the distance between the face in the image and image acquisition equipment according to the proportion of a face detection frame obtained by face detection in the image when the image is detected to comprise the face, and outputting distance indication information corresponding to the image.
4. The method of claim 3, wherein the training process of the image processing model comprises:
acquiring a sample face image, wherein the sample face image carries corresponding target distance indication information;
inputting the sample face image into the image processing model, classifying the distance between the face in the sample face image and the image acquisition equipment by the image processing model, and outputting prediction distance indication information corresponding to the sample face image;
acquiring prediction accuracy based on the predicted distance indication information and the target distance indication information;
based on the prediction accuracy, model parameters of the image processing model are adjusted until a target condition is met.
5. The method of claim 4, wherein the obtaining a sample face image comprises:
carrying out face detection on at least two sample images to obtain face detection frames of the at least two sample images;
acquiring the proportion of the face detection frames of the at least two sample images in the at least two sample images;
determining a cutting frame corresponding to the at least two sample images according to the proportion;
and cutting the at least two sample images according to the cutting frame, determining the cut sample images as sample face images, and determining target distance indication information corresponding to the sample face images according to the size relation between a face detection frame and the cutting frame in the sample images.
6. The method of claim 5, wherein determining the crop box corresponding to the at least two sample images according to the ratio comprises:
in response to the fact that the proportion is larger than a first proportion threshold value, determining the sizes of the cutting frames corresponding to the at least two sample images according to the sizes of the face detection frames in the at least two sample images, and determining the positions of the cutting frames corresponding to the at least two sample images according to the positions of the face detection frames and the target offset, wherein the cutting frames are smaller than the face detection frames;
and responding to the fact that the proportion is smaller than a second proportion threshold value, determining the sizes of the cutting frames corresponding to the at least two sample images according to the sizes of the face detection frames in the at least two sample images and a target scaling coefficient, and randomly determining the positions of the cutting frames in the at least two sample images according to the sizes of the cutting frames in the at least two sample images, wherein the cutting frames are larger than the face detection frames.
7. The method according to claim 6, wherein the determining the position of the crop box corresponding to the at least two sample images according to the position of the face detection box and the target offset comprises:
and for a sample image, determining the vertex position of at least one cutting box corresponding to the at least two sample images according to the central point position of the face detection box and the target offset in at least one offset direction.
8. The method according to claim 6, wherein the determining, according to a size relationship between a face detection frame and the crop frame in a sample image, target distance indication information corresponding to the sample face image comprises:
in response to that the cutting frame is smaller than the face detection frame, determining that target distance indication information corresponding to the sample face image is first distance indication information, wherein the distance indicated by the first distance indication information is smaller than a distance threshold;
and in response to the fact that the cutting frame is larger than the face detection frame, determining that the distance indication information corresponding to the target scaling coefficient is the target distance indication information corresponding to the at least two sample images.
9. The method of claim 4, wherein the obtaining a sample face image comprises:
carrying out face detection on at least two sample face images to obtain face detection frames of the at least two sample face images;
acquiring the proportion of the face detection frames of the at least two sample face images in the at least two sample face images;
and acquiring distance indication information corresponding to the proportion as target distance indication information corresponding to the at least two sample face images according to the corresponding relation between the proportion and the distance indication information.
10. The method according to claim 1, wherein the controlling the image acquisition device to execute a target function according to the distance indication information comprises any one of:
in response to the fact that the distance indicating information is first distance indicating information, controlling a screen of the image acquisition equipment to be lightened, and displaying a target interface, wherein the distance indicated by the first distance indicating information is smaller than a distance threshold value;
and responding to the first distance indication information, and displaying prompt information, wherein the prompt information is used for prompting that the distance from the image acquisition equipment to the current position is too small, and the distance indicated by the first distance indication information is smaller than a distance threshold value.
11. An image processing apparatus, characterized in that the apparatus comprises:
the detection module is used for carrying out face detection on the acquired image;
the acquisition module is used for acquiring distance indication information corresponding to the image according to the proportion of a face area in the image when the image is detected to comprise a face, wherein the distance indication information is used for indicating the distance between the face and image acquisition equipment, and the distance is a discrete distance;
and the control module is used for controlling the image acquisition equipment to execute a target function according to the distance indication information.
12. The apparatus according to claim 11, wherein the obtaining module is configured to input the image into an image processing model, perform face detection on the image by the image processing model, classify a distance between a face in the image and an image capturing device according to a proportion of a face detection frame obtained by the face detection in the image when the image is detected to include the face, and output distance indication information corresponding to the image.
13. The apparatus of claim 12, wherein the training process of the image processing model comprises:
acquiring a sample face image, wherein the sample face image carries corresponding target distance indication information;
inputting the sample face image into the image processing model, classifying, by the image processing model, the distance between the face in the sample face image and the image acquisition device, and outputting predicted distance indication information corresponding to the sample face image;
acquiring prediction accuracy based on the predicted distance indication information and the target distance indication information;
and adjusting, based on the prediction accuracy, model parameters of the image processing model until a target condition is met.
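A hedged sketch of the training procedure in claim 13 follows, reusing the toy DistanceClassifier above; the optimiser, the loss, and the accuracy-based stopping rule (standing in for the "target condition") are illustrative choices, not details from the patent.

```python
import torch
import torch.nn as nn


def train(model, loader, target_accuracy=0.95, max_epochs=50, lr=1e-3):
    """Fit the model on (image, target_label) batches until the predicted
    labels match the targets often enough (assumed target condition)."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(max_epochs):
        correct, total = 0, 0
        for images, target_labels in loader:
            _, logits = model(images)                      # model returns (box, logits)
            loss = loss_fn(logits, target_labels)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
            correct += (logits.argmax(dim=1) == target_labels).sum().item()
            total += target_labels.numel()
        if total and correct / total >= target_accuracy:   # prediction accuracy check
            break
    return model
```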
14. An electronic device, comprising one or more processors and one or more memories having at least one program code stored therein, the at least one program code being loaded and executed by the one or more processors to implement the image processing method of any one of claims 1 to 10.
15. A computer-readable storage medium, characterized in that at least one program code is stored in the storage medium, which is loaded and executed by a processor to implement the image processing method according to any one of claims 1 to 10.
CN202010796086.9A 2020-08-10 2020-08-10 Image processing method, device, equipment and storage medium Pending CN111753813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010796086.9A CN111753813A (en) 2020-08-10 2020-08-10 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010796086.9A CN111753813A (en) 2020-08-10 2020-08-10 Image processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111753813A true CN111753813A (en) 2020-10-09

Family

ID=72713177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010796086.9A Pending CN111753813A (en) 2020-08-10 2020-08-10 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111753813A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200088A (en) * 2020-10-10 2021-01-08 普联技术有限公司 Sitting posture monitoring method, device, equipment and system
CN112465910A (en) * 2020-11-26 2021-03-09 成都新希望金融信息有限公司 Target shooting distance obtaining method and device, storage medium and electronic equipment
CN112465910B (en) * 2020-11-26 2021-12-28 成都新希望金融信息有限公司 Target shooting distance obtaining method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110070056B (en) Image processing method, image processing apparatus, storage medium, and device
CN110210571B (en) Image recognition method and device, computer equipment and computer readable storage medium
CN111091132B (en) Image recognition method and device based on artificial intelligence, computer equipment and medium
CN110807361B (en) Human body identification method, device, computer equipment and storage medium
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN110544272B (en) Face tracking method, device, computer equipment and storage medium
CN112036331B (en) Living body detection model training method, device, equipment and storage medium
CN111489378B (en) Video frame feature extraction method and device, computer equipment and storage medium
CN113395542B (en) Video generation method and device based on artificial intelligence, computer equipment and medium
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
US20220309836A1 (en) Ai-based face recognition method and apparatus, device, and medium
CN111368116B (en) Image classification method and device, computer equipment and storage medium
CN111738365B (en) Image classification model training method and device, computer equipment and storage medium
CN111753813A (en) Image processing method, device, equipment and storage medium
CN113205515B (en) Target detection method, device and computer storage medium
CN113570510A (en) Image processing method, device, equipment and storage medium
CN111353513B (en) Target crowd screening method, device, terminal and storage medium
CN113378705B (en) Lane line detection method, device, equipment and storage medium
CN112711335B (en) Virtual environment picture display method, device, equipment and storage medium
CN112818979B (en) Text recognition method, device, equipment and storage medium
CN113936240A (en) Method, device and equipment for determining sample image and storage medium
CN114462580A (en) Training method of text recognition model, text recognition method, device and equipment
CN113343709A (en) Method for training intention recognition model, method, device and equipment for intention recognition
CN115221888A (en) Entity mention identification method, device, equipment and storage medium
CN113516665A (en) Training method of image segmentation model, image segmentation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40031420; Country of ref document: HK)
SE01 Entry into force of request for substantive examination