WO2019136636A1

WO2019136636A1 - Image recognition method and system, electronic device, and computer program product

Info

Publication number: WO2019136636A1
Application number: PCT/CN2018/072111
Authority: WO
Inventors: 刘兆祥; 廉士国; 王敏
Original assignee: 深圳前海达闼云端智能科技有限公司
Priority date: 2018-01-10
Filing date: 2018-01-10
Publication date: 2019-07-18
Also published as: CN108235816A; CN108235816B

Abstract

Provided are an image recognition method and system, an electronic device, and a computer program product, applied in the technical field of image recognition. The method comprises: collecting, at a medium focal length, an image of the scene in which an object to be recognized is located; determining a recognition focal length according to the scene image; and recognizing the object using the recognition focal length. Because the recognition focal length is determined dynamically from the scene in which the object is located and is then used to recognize the object, the environment is recognized automatically without user intervention or input, and a focal length suited to the environment is selected to obtain the best capture quality, which improves recognition accuracy and greatly improves the convenience of daily life for blind people.

Description

Image recognition method, system, electronic device and computer program product

Technical field

The present application relates to the field of image recognition technologies, and in particular, to an image recognition method, system, electronic device, and computer program product.

Background

China has the largest blind population in the world. Living without sight, blind people encounter many difficulties in daily life.

Camera-based intelligent image recognition can make daily life more convenient for blind people, and the quality of the captured image is crucial to the subsequent recognition functions. A fixed-focus camera can capture sharp images only within a certain depth of field, so its range of application is limited, whereas an autofocus camera often fails to focus correctly without user intervention, producing images that cannot be used for subsequent fine-grained image recognition.

Summary of the invention

Embodiments of the present application provide an image recognition method, system, electronic device, and computer program product.

In a first aspect, an embodiment of the present application provides an image recognition method, where the method includes:

Collecting, at a medium focal length, a scene image of the scene in which the recognition object is located;

Determining a recognition focal length according to the scene image;

Recognizing the recognition object using the recognition focal length.

In a second aspect, an embodiment of the present application provides an electronic device, where the electronic device includes:

A memory and one or more processors, the memory being coupled to the processors via a communication bus and the processors being configured to execute instructions in the memory; the memory stores instructions for performing the steps of the method of the first aspect.

In a third aspect, an embodiment of the present application provides a computer program product for use in conjunction with an electronic device including a display, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism including instructions for performing the steps of the method of the first aspect described above.

In a fourth aspect, an embodiment of the present application provides an image recognition system, where the image recognition system includes an image acquisition unit and a mobile computing processing unit;

The image acquisition unit is a camera with a controllable focal length, wherein the controllable range of the focal length includes a telephoto focal length, a medium focal length, and a short focal length; or

The image acquisition unit is three or more fixed-focus cameras, including at least one telephoto camera, at least one medium-focus camera, and at least one short-focus camera;

The mobile computing processing unit is the electronic device of the second aspect;

The mobile computing processing unit is coupled to the image acquisition unit via Universal Serial Bus (USB) or wireless communication.

The beneficial effects are as follows:

In the embodiments of the present application, a scene image of the scene in which the recognition object is located is collected at the medium focal length, the recognition focal length is determined according to the scene image, and the recognition object is recognized using the recognition focal length. The environment is thus recognized automatically without user intervention or input, and a focal length suited to the environment is selected to obtain the best capture quality, which improves recognition accuracy and greatly improves the convenience of daily life for blind people.

DRAWINGS

Specific embodiments of the present application will be described below with reference to the accompanying drawings, in which:

FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of an image recognition method in the embodiment of the present application;

FIG. 3 is a schematic diagram of an implementation method of an image recognition method in an embodiment of the present application.

Detailed description

The exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The embodiments described are only a part of the embodiments of the present application, not an exhaustive list of all embodiments. Provided there is no conflict, the embodiments in the present application and the features in the embodiments can be combined with each other.

To improve the quality of image capture and thereby make daily life more convenient for blind people, the embodiment of the present application provides an image recognition method that collects, at a medium focal length, a scene image of the scene in which the recognition object is located, determines a recognition focal length according to the scene image, and recognizes the recognition object using the recognition focal length. The environment is thus recognized automatically without user intervention or input, and a focal length suited to the environment is selected to obtain the best capture quality, which improves recognition accuracy and greatly improves the convenience of daily life for blind people.

The image recognition method provided by the present application can be used in the following image recognition system, which includes an image acquisition unit and a mobile computing processing unit.

1. Image acquisition unit

The image acquisition unit is configured to acquire an image at the current focal length. In the present application, its focal length can be controlled and modified by the mobile computing processing unit and can be adjusted over a wide range, so that objects ranging from text a few centimeters away to traffic lights or traffic signs tens of meters away can be captured clearly. The image acquisition unit may therefore be implemented in various forms.

For example, the image acquisition unit is a camera with a controllable focal length, whose controllable range includes a telephoto focal length, a medium focal length, and a short focal length. In this case the number of cameras is not limited, provided there is at least one camera with a controllable focal length.

As another example, the image acquisition unit is three or more fixed-focus cameras, including at least one telephoto camera, at least one medium-focus camera, and at least one short-focus camera.

In practical applications, the image acquisition unit may be mounted on wearable glasses, such as guide glasses for the blind.

2. Mobile computing processing unit

The mobile computing processing unit can be connected to the image acquisition unit via USB (Universal Serial Bus) or wireless communication (such as Bluetooth).

The mobile computing processing unit is responsible for controlling the focal length of the image acquisition unit, image acquisition, coarse scene classification, specific image recognition, and voice broadcast output.

For example, the mobile computing processing unit may set the focal length of the image acquisition unit to the medium focal length and acquire, through the image acquisition unit at the medium focal length, an image of the scene in which the recognition object is located.

As another example, the mobile computing processing unit may set the focal length of the image acquisition unit to the recognition focal length and acquire, through the image acquisition unit at the recognition focal length, the first image of the recognition object.

As another example, the mobile computing processing unit may set the focal length of the image acquisition unit back to the medium focal length after determining that recognition is complete.

In a specific application, the mobile computing processing unit may be the electronic device shown in FIG. 1, which can be a general-purpose smartphone. The electronic device includes a memory 101, one or more processors 102, and a transceiver component 103. The memory, the processors, and the transceiver component 103 are connected through a communication bus (in the embodiment of the present application, the communication bus is an I/O bus). The memory stores instructions for performing the steps of the image recognition method shown in FIG. 2, so that the scene image is collected at the medium focal length, the recognition focal length is determined according to the scene image, and the recognition object is recognized using the recognition focal length. The environment is thus recognized automatically without user intervention or input, and a focal length suited to the environment is selected to obtain the best capture quality, which improves recognition accuracy and greatly improves the convenience of daily life for blind people.

It will be understood that, in a specific implementation, the transceiver component 103 is not strictly required to achieve the basic purpose of the present application.

Referring to FIG. 2, the image recognition method provided by this embodiment includes:

201. Collect, at the medium focal length, a scene image of the scene in which the recognition object is located.

201-1. The mobile computing processing unit establishes a connection with the image acquisition unit via USB or a wireless communication method such as Bluetooth.

201-2. Through the connection, the mobile computing processing unit collects the scene image at the medium focal length.

Specifically:

1) If the image acquisition unit is a camera with a controllable focal length, then

(1) The mobile computing processing unit adjusts, through the connection, the focal length of the image acquisition unit to the intermediate distance, so that the current focal length of the image acquisition unit is the medium focal length.

(2) The image acquisition unit captures the scene image at the current focal length and transmits it to the mobile computing processing unit, so that the mobile computing processing unit obtains the scene image through an image acquisition unit set to the medium focal length.

2) If the image acquisition unit is three or more fixed-focus cameras, including at least one telephoto camera, at least one medium-focus camera, and at least one short-focus camera, then

(1) The mobile computing processing unit selects, through the connection, the medium-focus camera in the image acquisition unit.

(2) The selected medium-focus camera captures the scene image and transmits it to the mobile computing processing unit, so that the mobile computing processing unit obtains the scene image through an image acquisition unit at the medium focal length.
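As a non-limiting illustration, the two forms of image acquisition unit handled in step 201 can be sketched behind one interface. The class names `ZoomCamera` and `FixedCameraArray`, the `Focal` enumeration, and the dictionary used as a stand-in for a frame are assumptions made for this sketch only; the application does not specify any programming interface.

```python
from dataclasses import dataclass
from enum import Enum


class Focal(Enum):
    SHORT = "short"
    MEDIUM = "medium"
    TELE = "telephoto"


@dataclass
class ZoomCamera:
    """Hypothetical single camera with a controllable focal length."""
    focal: Focal = Focal.MEDIUM

    def set_focal(self, focal: Focal) -> None:
        self.focal = focal

    def capture(self) -> dict:
        # Stand-in for a frame: records only the focal length in use.
        return {"focal": self.focal}


@dataclass
class FixedCameraArray:
    """Hypothetical set of fixed-focus cameras; one is selected per capture."""
    available: tuple = (Focal.SHORT, Focal.MEDIUM, Focal.TELE)
    selected: Focal = Focal.MEDIUM

    def set_focal(self, focal: Focal) -> None:
        if focal not in self.available:
            raise ValueError(f"no fixed-focus camera for {focal}")
        self.selected = focal

    def capture(self) -> dict:
        return {"focal": self.selected}


def capture_scene_image(unit) -> dict:
    """Step 201: switch to the medium focal length, then capture."""
    unit.set_focal(Focal.MEDIUM)
    return unit.capture()
```

With either class, `capture_scene_image(unit)` leaves the unit at the medium focal length, which matches the starting state of each working cycle described in the text.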

202. Determine a recognition focal length according to the scene image.

After the scene image is collected in step 201, it is coarsely classified to obtain the coarse scene to which it belongs. The recognition focal length to be used for the specific recognition is determined according to that coarse scene, the lens is adjusted to the corresponding recognition focal length, an image is collected at that focal length, and the corresponding image recognition function is then called.

The specific implementation process of step 202 is as follows:

202-1. The scene image is coarsely classified by the coarse scene classification model to determine the coarse scene to which the scene image belongs.

The coarse scene is a telephoto scene, a medium-focus scene, or a short-focus scene. Each coarse scene corresponds to one or more image recognition functions. Different coarse scenes correspond to different image recognition functions, and the number of corresponding functions may be the same or different.

For example, when the coarse scene is a telephoto scene, only one image recognition function corresponds to it, namely the traffic light recognition function.

As another example, when the coarse scene is a short-focus scene, two image recognition functions correspond to it, namely the reading recognition function and the item recognition function.

As another example, when the coarse scene is a medium-focus scene, multiple image recognition functions correspond to it, one of which is a face recognition function.

Coarse scenes other than the telephoto, medium-focus, and short-focus scenes may be added according to actual conditions, and the number and specific functions of the image recognition functions corresponding to each coarse scene can likewise be adjusted. This embodiment does not limit the categories of coarse scene, the number or specific functions of the image recognition functions corresponding to each coarse scene, or when and how these are adjusted.

In addition, the coarse scene classification model is obtained by deep learning on samples of the telephoto scene, samples of the medium-focus scene, and samples of the short-focus scene.

Specifically:

1) Image samples of the three scenes are collected with the medium-focus lens. For example, the telephoto scene is generally used for traffic light recognition, so scene images containing traffic lights can be collected as samples of that category; the medium-focus scene is generally used for face recognition, so scene images with a pedestrian in front at a moderate distance can be collected as samples of that category; the short-focus scene is generally used for OCR (Optical Character Recognition), so scene images of books and the like can be collected as samples of that category.

2) A CNN (Convolutional Neural Network) is trained on these samples, for example using a ResNet network.

After training, the trained coarse scene classification model is obtained. In step 202-1, the trained model and its weights are used to classify the scene image, and the coarse scene to which it belongs is determined from the magnitudes of the output probabilities. In this way, the scene image collected in step 201 is coarsely classified by a deep-learning CNN.
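As a non-limiting illustration, choosing the coarse scene from the magnitudes of the output probabilities can be sketched as follows. The sketch assumes that the trained network produces one raw score per coarse scene and that a softmax converts the scores to probabilities; the application states only that the decision is based on the output probabilities, so the softmax, the class order in `COARSE_SCENES`, and the function names are assumptions.

```python
import math

# Assumed class order of the network outputs (not specified in the text).
COARSE_SCENES = ("telephoto", "medium", "short")


def softmax(scores):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_scene(scores):
    """Return the most probable coarse scene and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return COARSE_SCENES[best], probs[best]
```

For instance, `coarse_scene([0.1, 2.0, -1.0])` selects the medium-focus scene because the second score is the largest.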

202-2. Determine the focal length corresponding to that coarse scene as the recognition focal length.

At this point, the recognition focal length to be used when the recognition object is actually recognized has been obtained. The focal length is thus changed dynamically according to the coarse scene, which improves image capture quality for the subsequent recognition, improves recognition accuracy, and greatly improves the convenience of daily life for blind people.

203. Identify the recognition object by using the recognition focal length.

203-1. The first image of the recognition object is collected at the recognition focal length.

The mobile computing processing unit sets the focal length of the image acquisition unit to the recognition focal length, and the first image of the recognition object is acquired by the image acquisition unit at the recognition focal length.

Specifically:

1) If the image acquisition unit is a camera with a controllable focal length, then

(1) Through the connection established with the image acquisition unit, the mobile computing processing unit adjusts the focal length of the image acquisition unit to the recognition focal length, so that the current focal length of the image acquisition unit is the recognition focal length.

(2) The image acquisition unit captures an image of the recognition object at the current focal length and transmits it to the mobile computing processing unit, so that the mobile computing processing unit obtains the first image of the recognition object through an image acquisition unit set to the recognition focal length.

2) If the image acquisition unit is three or more fixed-focus cameras, including at least one telephoto camera, at least one medium-focus camera, and at least one short-focus camera, then

(1) Through the connection established with the image acquisition unit, the mobile computing processing unit selects the camera in the image acquisition unit that corresponds to the recognition focal length.

(2) The selected camera captures an image of the recognition object and transmits it to the mobile computing processing unit, so that the mobile computing processing unit obtains the first image of the recognition object through an image acquisition unit at the recognition focal length.

203-2. If the coarse scene corresponds to a single image recognition function, that image recognition function is called to recognize the first image.

If the coarse scene has only one image recognition function, for example the coarse scene is a telephoto scene whose only image recognition function is the traffic light recognition function, that function can be called directly.

The traffic light recognition function is called, and the recognition object is determined from the first image to be a red light, a green light, or a yellow light; alternatively, the traffic light recognition function is called, and the recognition object is determined from the first image to be a red light, a green light, a yellow light, or none of the three (neither red, green, nor yellow).

For example, in the traffic light recognition function, the first image is classified by a deep neural network to distinguish red, green, and yellow lights, or to distinguish red light, green light, yellow light, and none of the three.

For the specific implementation of this solution, refer to the coarse classification of the scene image in step 202-1.

As another example, in the traffic light recognition function, target detection is used to detect three targets in the first image (red light, green light, and yellow light) or four targets (red light, green light, yellow light, and none of the three).

The target detection includes, but is not limited to, detection based on an SSD (Single Shot MultiBox Detector) target detection model.

203-3. If the coarse scene corresponds to multiple image recognition functions, the first image is finely classified by a fine scene classification model to determine which image recognition function of that coarse scene applies to the first image, and that image recognition function is called to recognize the first image.

In other words, if the coarse scene corresponds to several subdivided image recognition functions, the scene is classified further before the specific image recognition function is called.

For example, if the coarse scene is a short-focus scene, it corresponds to two image recognition functions: the reading recognition function and the item recognition function. In this case, a convolutional neural network (CNN) can first be used to determine, from what the user is holding, whether this is an OCR scene or an item recognition scene (for the specific implementation, refer to the coarse classification of the scene image in step 202-1). If it is an OCR scene, the reading recognition function is called; otherwise the item recognition function is called.

Specifically:

1. The first image is recognized by the fine scene classification model, and the recognition object is determined to be a book, a newspaper or periodical, or an item that is neither.

Here, newspapers and periodicals include publications such as newspapers and magazines.

2. If the recognition object is a book or a newspaper or periodical, the image recognition function that applies to the first image in the coarse scene is determined to be the reading recognition function, and the reading recognition function is called to recognize the first image, that is, OCR is performed.

After OCR is completed, the recognition result can also be output as a voice broadcast.

In the actual application, the process of calling the reading recognition function to recognize the first image can also be optimized as follows:

After the OCR output is completed, an image of the recognition object needs to be collected again and analyzed to decide whether recognition must be repeated. A blind reader may move the book up, down, left, or right while reading; in that case the page does not need to be recognized again. OCR needs to be performed again only when the user turns the page. This avoids repeating the broadcast from the beginning and effectively improves the user experience.

The specific implementation is as follows:

While the reading recognition function is called to recognize the first image, second images of the recognition object are acquired continuously. Each time a second image is acquired, the content similarity between it and the first image is determined. If the content similarity is lower than a first threshold, continuous acquisition is stopped, the second image becomes the new first image, and the reading recognition function is called again to perform OCR on the new first image, while new second images are acquired continuously and compared with the new first image in the same way. These steps repeat until recognition of the recognition object is complete.

For example, the image used for the first OCR pass is saved, and subsequent pictures are collected continuously and compared with it for content similarity. If the similarity is lower than a set threshold, the user is considered to have possibly turned the page and OCR is performed again; otherwise the user is considered to be moving the current page and OCR is not repeated.
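As a non-limiting illustration, the re-recognition loop can be sketched as follows. The function name `read_pages`, the callable parameters `similarity` and `ocr`, and the default value of `first_threshold` are assumptions made for this sketch; the application does not prescribe them.

```python
def read_pages(frames, similarity, ocr, first_threshold=0.5):
    """Run OCR on the first frame, then again only after a likely page turn.

    frames: sequence of captured images (any representation).
    similarity: callable(reference, frame) -> float, higher means more alike.
    ocr: callable(frame) -> recognized text.
    """
    if not frames:
        return []
    reference = frames[0]              # the current "first image"
    outputs = [ocr(reference)]
    for frame in frames[1:]:           # each subsequent "second image"
        if similarity(reference, frame) < first_threshold:
            # Content changed enough: treat it as a new page.
            reference = frame
            outputs.append(ocr(frame))
        # Otherwise the same page merely moved; do not repeat the broadcast.
    return outputs
```

Frames that stay similar to the reference produce no additional output, which is the behavior the text describes for a page that is only being moved.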

The process of determining the similarity of content between the second image and the first image may be implemented by feature point matching.

Specifically, second feature points are extracted from the second image and first feature points are extracted from the first image, and the content similarity between the second image and the first image is determined according to the number of second feature points that match first feature points.

For example, ORB or SIFT feature points are extracted from the first frame and from the current frame and matched against each other, and the similarity is judged from the number of successful matches: the more points that match, the higher the similarity.
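As a non-limiting illustration, matching binary feature descriptors (the kind ORB produces) can be sketched without any imaging library by representing each descriptor as an integer and comparing descriptors by Hamming distance. The descriptor representation, the `max_distance` value, and normalizing the match count into a fraction are assumptions made for this sketch; the application states only that more matched points mean higher similarity.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def count_matches(first_desc, second_desc, max_distance=8):
    """Count second-image descriptors that have a close first-image match."""
    matched = 0
    for d2 in second_desc:
        if any(hamming(d1, d2) <= max_distance for d1 in first_desc):
            matched += 1
    return matched


def content_similarity(first_desc, second_desc, max_distance=8):
    """Fraction of second-image descriptors matched (assumed normalization)."""
    if not first_desc or not second_desc:
        return 0.0
    matches = count_matches(first_desc, second_desc, max_distance)
    return matches / len(second_desc)
```

A practical system would obtain the descriptors from a feature extractor and would typically use an indexed matcher rather than this quadratic comparison.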

3. If the recognition object is an item that is neither a book nor a newspaper or periodical, the image recognition function that applies to the first image in the coarse scene is determined to be the item recognition function, and the item recognition function is called to recognize the first image.

At this point, the precise recognition of the recognition object is complete.
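As a non-limiting illustration, selecting the image recognition function in steps 203-2 and 203-3 can be sketched as a table lookup followed, where needed, by fine classification. Only the telephoto and short-focus scenes are included because the text names their functions completely; the string labels, the table names, and the `fine_classify` callable are assumptions made for this sketch.

```python
# Coarse scenes with exactly one recognition function (step 203-2).
SINGLE_FUNCTION = {"telephoto": "traffic_light"}

# Fine classes of the short-focus scene and the function each one selects.
SHORT_FOCUS_FUNCTION = {
    "book": "reading",
    "newspaper": "reading",
    "other": "item",
}


def select_function(scene, first_image, fine_classify):
    """Return the name of the recognition function to call for first_image."""
    if scene in SINGLE_FUNCTION:
        return SINGLE_FUNCTION[scene]          # call directly, no fine step
    if scene == "short":
        fine_label = fine_classify(first_image)  # step 203-3
        return SHORT_FOCUS_FUNCTION[fine_label]
    raise ValueError(f"no function table for coarse scene {scene!r}")
```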

After recognition has been performed, the image recognition method provided in this embodiment further determines whether the image recognition function in the scene has finished, that is, whether recognition is complete. If recognition is complete, the mobile computing processing unit readjusts the focal length of the image acquisition unit (such as the camera) to the medium-focus position, that is, sets the focal length of the image acquisition unit to the medium focal length, and enters the next working cycle.

Specifically:

1) If the image acquisition unit is a camera with a controllable focal length, the mobile computing processing unit adjusts, through the connection established with the image acquisition unit, the focal length of the image acquisition unit to the intermediate distance, so that the current focal length of the image acquisition unit is the medium focal length. This completes the adjustment of the image acquisition unit to the medium-focus position.

2) If the image acquisition unit is three or more fixed-focus cameras, including at least one telephoto camera, at least one medium-focus camera, and at least one short-focus camera, the mobile computing processing unit selects, through the connection established with the image acquisition unit, the medium-focus camera as the default camera for the next capture. This completes the adjustment of the image acquisition unit to the medium-focus position.

In addition, the manner of determining whether the recognition is completed includes, but is not limited to, continuously acquiring an image for processing, and determining whether the current recognition function ends according to an output result of the image recognition function in the scene.

For example, for the traffic light recognition function in a telephoto scene, if the probability of the fourth category (neither red, green, nor yellow) is the highest, the function is considered to have ended.

Specifically, after the traffic light recognition function is called and the recognition object is determined from the first image to be a red light, a green light, a yellow light, or none of the three: if the recognition object is determined to be none of the three, recognition is determined to be complete and the image recognition method ends. If the recognition object is determined to be a red light, a green light, or a yellow light, second images of the recognition object are collected continuously; each time a second image is acquired, the traffic light recognition function is called and the recognition object is determined from the second image to be a red light, a green light, a yellow light, or none of the three. If the recognition object is determined from the second image to be none of the three, recognition is determined to be complete, continuous acquisition of second images is stopped, and the image recognition method ends.
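As a non-limiting illustration, the end condition for the traffic light recognition function can be sketched as a loop that stops at the first frame classified into the fourth category. The label `"none"`, the function name, and the `classify` callable are assumptions made for this sketch.

```python
NOT_A_LIGHT = "none"  # assumed label for "neither red, green, nor yellow"


def track_traffic_light(frames, classify):
    """Return the light states observed before the light leaves the view.

    frames: the first image followed by the continuously acquired
            second images.
    classify: callable(frame) -> "red", "green", "yellow", or NOT_A_LIGHT.
    """
    states = []
    for frame in frames:
        label = classify(frame)
        if label == NOT_A_LIGHT:
            break                  # recognition complete; stop acquiring
        states.append(label)       # would be announced by voice broadcast
    return states
```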

As another example, for the reading recognition function in the short-focus scene, the judgment can be based on the number of OCR characters output and their confidence. If there are few characters or the character confidence is low, the reading recognition function is considered to have ended.

Specifically, second images of the recognition object are acquired continuously, and each time a second image is acquired, the reading recognition function is called to obtain the number of characters in the second image and the character confidence. If the number of characters in the second image is less than a second threshold and/or the character confidence in the second image is less than a third threshold, recognition is determined to be complete, continuous acquisition of second images is stopped, and the image recognition method ends.
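As a non-limiting illustration, the end condition for the reading recognition function can be sketched as follows. The sketch takes the "or" reading of "and/or", aggregates per-character confidence as a mean, and uses placeholder threshold values; all three choices, and the function name, are assumptions, since the application does not fix them.

```python
def reading_finished(characters, second_threshold=5, third_threshold=0.6):
    """Decide whether the reading recognition function should end.

    characters: list of (character, confidence) pairs from one OCR pass.
    """
    if len(characters) < second_threshold:
        return True                      # too few characters recognized
    mean_conf = sum(conf for _, conf in characters) / len(characters)
    return mean_conf < third_threshold   # characters present but unreliable
```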

As another example, for the face recognition function in the medium-focus scene, human body or face detection can be performed to determine whether the scene contains a human body or face of at least a certain size; if not, the face recognition function is considered to have ended.

Specifically, after the image recognition function that applies to the first image in the coarse scene has been called and the first image has been recognized, second images of the recognition object are collected continuously, and each time a second image is collected, human body or face detection is performed on it. If the detection result is that there is no human body and no face, or that the human body size is smaller than a fourth threshold, or that the face size is smaller than a fifth threshold, recognition is determined to be complete, continuous acquisition of second images is stopped, and the image recognition method ends.
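As a non-limiting illustration, the end condition for the face recognition function can be sketched as a size check over the detections in one second image. The sketch follows the reading that the function ends when no detected body or face reaches its size threshold; the detection format, the pixel thresholds, and the function name are assumptions.

```python
def face_recognition_finished(detections, fourth_threshold=80, fifth_threshold=40):
    """Decide whether the face recognition function should end.

    detections: list of (kind, size) pairs, where kind is "body" or "face"
                and size is a measure such as bounding-box height in pixels.
    """
    limits = {"body": fourth_threshold, "face": fifth_threshold}
    # Finished unless at least one detection is large enough.
    return not any(size >= limits[kind] for kind, size in detections)
```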

This completes the image recognition method provided by the embodiment. The basic idea of the method is shown in FIG. 3. The image recognition functions required by blind people are divided by working distance into telephoto (long distance), medium-focus (intermediate distance), and short-focus (close distance); for example, traffic light recognition requires telephoto capture, face recognition requires medium-focus capture, and OCR requires short-focus capture. When the image recognition system is working, the mobile computing processing unit first sets the image acquisition unit (such as a camera with a controllable focal length) to the medium-focus position and collects a scene image of the scene in which the recognition object is located. Although an image captured this way in a short-focus or telephoto scene is not sharp enough, or the resolution of the target area is insufficient, the image can still be coarsely classified to determine whether the scene is suited to telephoto, short-focus, or medium-focus capture. The recognition focal length is then determined from the coarse scene category, the camera lens is adjusted to that recognition focal length, an image of the recognition object is collected at the recognition focal length, and the image recognition function of the corresponding scene is called. If the scene has several subdivided functions, the scene is classified further before the specific image recognition function is called. Finally, the recognition results can be fed back to the blind user by voice broadcast.
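As a non-limiting illustration, one working cycle of the method can be sketched end to end. The `LoggingCamera` class, the `FOCAL_FOR_SCENE` table, and the two callables are hypothetical stand-ins introduced for this sketch; a real system would drive an actual camera and trained models.

```python
# Assumed one-to-one mapping from coarse scene to recognition focal length.
FOCAL_FOR_SCENE = {"telephoto": "telephoto", "medium": "medium", "short": "short"}


class LoggingCamera:
    """Hypothetical camera that records the focal length of every capture."""

    def __init__(self):
        self.focal = "medium"
        self.log = []

    def set_focal(self, focal):
        self.focal = focal

    def capture(self):
        self.log.append(self.focal)
        return {"focal": self.focal}


def work_cycle(camera, classify_scene, recognize):
    """Scene capture, coarse classification, refocus, recognition, reset."""
    camera.set_focal("medium")
    scene_image = camera.capture()              # step 201
    scene = classify_scene(scene_image)         # step 202-1
    camera.set_focal(FOCAL_FOR_SCENE[scene])    # step 202-2
    first_image = camera.capture()              # step 203-1
    result = recognize(scene, first_image)      # steps 203-2 and 203-3
    camera.set_focal("medium")                  # ready for the next cycle
    return result
```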

Beneficial effects:

In the embodiment of the present application, an image of the scene in which the recognition object is located is collected by using the medium focal length, the recognition focal length is determined according to the scene image, and the recognition object is recognized by using the recognition focal length. The environment is thereby recognized automatically, without user intervention or input, and a suitable focal length is selected based on the environment to achieve the best shooting result, which improves recognition accuracy and greatly improves the convenience of life for the blind.

In another aspect, embodiments of the present application further provide a computer program product for use in conjunction with an electronic device including a display, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism including instructions for performing the following steps:

Determining the recognition focal length according to the scene image;

Recognizing the recognition object by using the recognition focal length.

Optionally, determining the recognition focal length according to the scene image includes:

Coarsely classifying the scene image by the scene rough classification model to determine the coarse scene to which the scene image belongs;

Determining the focal length corresponding to that coarse scene as the recognition focal length;

Wherein, the coarse scene corresponds to one or more image recognition functions;

The coarse scene is a telephoto scene, a medium focus scene or a short focus scene;

The scene rough classification model is obtained by deep learning on samples of the telephoto scene, samples of the medium focus scene, and samples of the short focus scene.
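As an illustration of the step from coarse scene to recognition focal length, the following sketch assumes the scene rough classification model outputs one score per coarse scene and that each coarse scene has a preset focal length. The function name `pick_recognition_focal_length` and the millimetre values in `FOCAL_PRESET_MM` are placeholders chosen for this example and do not come from the application.

```python
from typing import Dict, Sequence, Tuple

# Order of the scores assumed to come out of the rough classification model.
COARSE_SCENES: Tuple[str, ...] = ("telephoto", "medium", "short")

# Placeholder focal lengths in millimetres (illustrative values only).
FOCAL_PRESET_MM: Dict[str, float] = {
    "telephoto": 85.0,
    "medium": 35.0,
    "short": 12.0,
}


def pick_recognition_focal_length(scores: Sequence[float]) -> Tuple[str, float]:
    """Return the highest-scoring coarse scene and its preset focal length."""
    if len(scores) != len(COARSE_SCENES):
        raise ValueError("expected one score per coarse scene")
    # The coarse scene is the class with the largest score.
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    scene = COARSE_SCENES[best_index]
    return scene, FOCAL_PRESET_MM[scene]
```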

Optionally, recognizing the recognition object by using the recognition focal length includes:

Acquiring a first image of the recognition object by using the recognition focal length;

If the associated coarse scene corresponds to one image recognition function, calling the image recognition function corresponding to the associated coarse scene to recognize the first image;

If the associated coarse scene corresponds to multiple image recognition functions, finely classifying the first image by the scene fine classification model to determine the image recognition function corresponding to the first image in the associated coarse scene, and calling that image recognition function to recognize the first image.
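The choice between calling a single function directly and running the fine classification first can be sketched as follows. The registry `SCENE_FUNCTIONS`, the function name `select_function` and the string labels are hypothetical; the registry merely mirrors the examples given in this application (traffic light recognition for the telephoto scene, face recognition for the medium focus scene, reading and item recognition for the short focus scene), and a real system may register other functions.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: coarse scene -> names of its recognition functions.
SCENE_FUNCTIONS: Dict[str, List[str]] = {
    "telephoto": ["traffic_light"],
    "medium": ["face"],
    "short": ["reading", "item"],
}


def select_function(
    coarse_scene: str,
    first_image: Any,
    fine_classify: Callable[[Any], str],
) -> str:
    """Pick the recognition function to call for the first image."""
    functions = SCENE_FUNCTIONS[coarse_scene]
    if len(functions) == 1:
        # Only one function in this coarse scene: no fine classification needed.
        return functions[0]
    # Several functions: let the fine classification model choose among them.
    chosen = fine_classify(first_image)
    if chosen not in functions:
        raise ValueError(f"{chosen!r} is not registered for scene {coarse_scene!r}")
    return chosen
```

With this layout the fine classification model is only run for coarse scenes that actually need it, which matches the two branches described above.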

Optionally, the associated coarse scene is a telephoto scene, and the image recognition function corresponding to the telephoto scene is a traffic light recognition function;

Calling the image recognition function corresponding to the associated coarse scene to recognize the first image includes:

Calling the traffic light recognition function to determine, according to the first image, whether the recognition object is a red light, a green light or a yellow light; or

Calling the traffic light recognition function to determine, according to the first image, whether the recognition object is a red light, a green light, a yellow light, or none of the three (a non-red/green/yellow light).

Optionally, after calling the traffic light recognition function to determine, according to the first image, whether the recognition object is a red light, a green light, a yellow light, or a non-red/green/yellow light, the method further includes:

If it is determined according to the first image that the recognition object is a non-red/green/yellow light, determining that the recognition is completed and ending the image recognition method;

If it is determined according to the first image that the recognition object is a red light, a green light or a yellow light, continuously collecting second images of the recognition object; each time a second image is collected, calling the traffic light recognition function to determine, according to that second image, whether the recognition object is a red light, a green light, a yellow light, or a non-red/green/yellow light; and if it is determined according to the second image that the recognition object is a non-red/green/yellow light, determining that the recognition is completed, stopping the continuous acquisition of second images, and ending the image recognition method.
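The continuous traffic light tracking above amounts to a loop that stops at the first frame in which no red, green or yellow light is found. The sketch below is illustrative only: `track_traffic_light`, the `classify` callable and the string labels are hypothetical stand-ins for the traffic light recognition function and its results.

```python
from typing import Any, Callable, Iterable, List

# Labels treated as a recognized signal; anything else counts as
# a non-red/green/yellow light (hypothetical label scheme).
SIGNAL_STATES = {"red", "green", "yellow"}


def track_traffic_light(
    first_image: Any,
    second_images: Iterable[Any],
    classify: Callable[[Any], str],
) -> List[str]:
    """Return the signal states seen until a non-red/green/yellow result."""
    states: List[str] = []
    state = classify(first_image)
    if state not in SIGNAL_STATES:
        # First image shows no signal: recognition is complete at once.
        return states
    states.append(state)
    for second_image in second_images:
        state = classify(second_image)
        if state not in SIGNAL_STATES:
            # Stop collecting second images and end the method.
            break
        states.append(state)
    return states
```

Each element of the returned list could, for example, be announced by voice as it is produced.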

Optionally, the associated coarse scene is a short-focus scene, and the image recognition functions corresponding to the short-focus scene are a reading recognition function and an item recognition function;

Finely classifying the first image by the scene fine classification model to determine the image recognition function corresponding to the first image in the associated coarse scene includes:

Recognizing the first image by the scene fine classification model, and determining whether the recognition object is a book, a newspaper, or an item other than a book or newspaper;

If the recognition object is a book or a newspaper, determining that the image recognition function corresponding to the first image in the associated coarse scene is the reading recognition function;

If the recognition object is an item other than a book or newspaper, determining that the image recognition function corresponding to the first image in the associated coarse scene is the item recognition function.

Optionally, the corresponding image recognition function of the first image in the associated coarse scene is a reading recognition function;

The image recognition function corresponding to the first image in the associated coarse scene is called to identify the first image, including:

Calling the reading recognition function to recognize the first image, and at the same time continuously acquiring second images of the recognition object;

Each time a second image is acquired, determining the content similarity between the second image and the first image; if the content similarity is lower than the first threshold, stopping the continuous acquisition of second images, taking the second image as a new first image, and re-executing the steps of calling the reading recognition function to recognize the new first image while continuously acquiring new second images of the recognition object, determining the content similarity between each new second image and the new first image, and the subsequent steps.

Optionally, determining the content similarity between the second image and the first image includes:

Extracting a second feature point in the second image, and extracting a first feature point in the first image;

Content similarity between the second image and the first image is determined according to the number of second feature points matching the first feature point.
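One possible way to turn a count of matched feature points into a content similarity is sketched below. The application does not specify the feature type or the matching rule, so everything here is an assumption made for illustration: features are represented as small binary descriptors stored as integers, two descriptors match when their Hamming distance is at most `max_distance`, and the similarity is the fraction of second-image features that find a match. The names `content_similarity` and `page_changed` and the default threshold values are hypothetical.

```python
from typing import Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def content_similarity(
    first_features: Sequence[int],
    second_features: Sequence[int],
    max_distance: int = 2,
) -> float:
    """Fraction of second-image features that match some first-image feature."""
    if not first_features or not second_features:
        return 0.0
    matched = sum(
        1
        for d2 in second_features
        if min(hamming(d2, d1) for d1 in first_features) <= max_distance
    )
    return matched / len(second_features)


def page_changed(
    first_features: Sequence[int],
    second_features: Sequence[int],
    first_threshold: float = 0.5,
) -> bool:
    """True when similarity falls below the first threshold (new content in view)."""
    return content_similarity(first_features, second_features) < first_threshold
```

When `page_changed` returns true, the second image would become the new first image and the reading recognition function would be called again, as described above.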

Optionally, after continuously acquiring the second image of the recognition object, the method further includes:

Each time a second image is acquired, calling the reading recognition function to identify the number of characters and the character confidence in the second image;

If the number of characters in the second image is less than the second threshold and/or the character confidence in the second image is less than the third threshold, determining that the recognition is completed, stopping the continuous acquisition of second images, and ending the image recognition method.
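The end-of-reading test can be written as a small predicate. In this sketch the function name `reading_finished`, the default threshold values and the `require_both` flag are hypothetical; the flag exists only to cover both readings of the "and/or" wording above.

```python
def reading_finished(
    char_count: int,
    char_confidence: float,
    second_threshold: int = 10,
    third_threshold: float = 0.6,
    require_both: bool = False,
) -> bool:
    """Decide whether reading recognition should end for this second image."""
    too_few_chars = char_count < second_threshold
    low_confidence = char_confidence < third_threshold
    if require_both:
        # "and" reading: both conditions must hold.
        return too_few_chars and low_confidence
    # "or" reading: either condition is enough.
    return too_few_chars or low_confidence
```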

Optionally, the corresponding image recognition function of the first image in the associated coarse scene is a face recognition function in the medium focus scene;

After calling the image recognition function corresponding to the first image in the associated coarse scene to recognize the first image, the method further includes:

Continuously acquiring second images of the recognition object;

Each time a second image is acquired, detecting the human body or the face in the second image;

If the detection result is no human body and no face, or the detection result is that the human body size is smaller than the fourth threshold, or the detection result is that the face size is smaller than the fifth threshold, determining that the recognition is completed, stopping the continuous acquisition of second images, and ending the image recognition method.
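The stop condition for the face recognition case can be sketched in the same way. The `Detection` record, the use of pixel heights as the size measure, the function name `face_recognition_finished` and the default threshold values are all assumptions made for this example; the application only states that body and face sizes are compared with the fourth and fifth thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """Detector output for one second image; None means nothing was detected."""
    body_height_px: Optional[int]
    face_height_px: Optional[int]


def face_recognition_finished(
    detection: Detection,
    fourth_threshold: int = 120,
    fifth_threshold: int = 40,
) -> bool:
    """True when the person has left the view or is too small in the image."""
    body = detection.body_height_px
    face = detection.face_height_px
    if body is None and face is None:
        # No human body and no face detected.
        return True
    if body is not None and body < fourth_threshold:
        # Human body size below the fourth threshold.
        return True
    if face is not None and face < fifth_threshold:
        # Face size below the fifth threshold.
        return True
    return False
```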

Beneficial effects:

Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, system, or computer program product. Thus, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware. Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

While the preferred embodiment of the present application has been described, it will be apparent that those skilled in the art can make further changes and modifications to the embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and the modifications and

Claims

An image recognition method, characterized in that the method comprises:

Collecting, by using a medium focal length, an image of the scene in which a recognition object is located;

Determining a recognition focal length according to the scene image;

Recognizing the recognition object by using the recognition focal length.
The method according to claim 1, wherein the determining the recognition focal length according to the scene image comprises:

Performing rough classification on the scene image by using a scene rough classification model, and determining the coarse scene to which the scene image belongs;

Determining the focal length corresponding to the associated coarse scene as the recognition focal length;

Wherein, the coarse scene corresponds to one or more image recognition functions;

The coarse scene is a telephoto scene, a medium focus scene or a short focus scene;

The scene rough classification model is obtained by deep learning on samples of the telephoto scene, samples of the medium focus scene, and samples of the short focus scene.
The method according to claim 2, wherein the recognizing the recognition object by using the recognition focal length comprises:

Acquiring a first image of the recognition object by using the recognition focal length;

If the associated coarse scene corresponds to one image recognition function, calling the image recognition function corresponding to the associated coarse scene to recognize the first image;

If the associated coarse scene corresponds to multiple image recognition functions, finely classifying the first image by a scene fine classification model, determining the image recognition function corresponding to the first image in the associated coarse scene, and calling the image recognition function corresponding to the first image in the associated coarse scene to recognize the first image.
The method according to claim 3, wherein the associated coarse scene is a telephoto scene, and the image recognition function corresponding to the telephoto scene is a traffic light recognition function;

The calling the image recognition function corresponding to the associated coarse scene to recognize the first image comprises:

Calling the traffic light recognition function to determine, according to the first image, whether the recognition object is a red light, a green light, or a yellow light; or

Calling the traffic light recognition function to determine, according to the first image, whether the recognition object is a red light, a green light, a yellow light, or none of the three (a non-red/green/yellow light).
The method according to claim 4, wherein after the calling the traffic light recognition function to determine, according to the first image, whether the recognition object is a red light, a green light, a yellow light, or a non-red/green/yellow light, the method further comprises:

If it is determined according to the first image that the recognition object is a non-red/green/yellow light, determining that the recognition is completed and ending the image recognition method;

If it is determined according to the first image that the recognition object is a red light, a green light, or a yellow light, continuously collecting second images of the recognition object; each time a second image is collected, calling the traffic light recognition function to determine, according to the second image, whether the recognition object is a red light, a green light, a yellow light, or a non-red/green/yellow light; and if it is determined according to the second image that the recognition object is a non-red/green/yellow light, determining that the recognition is completed, stopping the continuous acquisition of second images, and ending the image recognition method.
The method according to claim 3, wherein the associated coarse scene is a short-focus scene, and the image recognition functions corresponding to the short-focus scene are a reading recognition function and an item recognition function;

The finely classifying the first image by the scene fine classification model and determining the image recognition function corresponding to the first image in the associated coarse scene comprises:

Recognizing the first image by using the scene fine classification model, and determining whether the recognition object is a book, a newspaper, or an item other than a book or newspaper;

If the recognition object is a book or a newspaper, determining that the image recognition function corresponding to the first image in the associated coarse scene is the reading recognition function;

If the recognition object is an item other than a book or newspaper, determining that the image recognition function corresponding to the first image in the associated coarse scene is the item recognition function.
The method according to claim 6, wherein the image recognition function corresponding to the first image in the associated coarse scene is the reading recognition function;

The calling the image recognition function corresponding to the first image in the associated coarse scene to recognize the first image comprises:

Calling the reading recognition function to recognize the first image, and at the same time continuously acquiring second images of the recognition object;

Each time a second image is acquired, determining the content similarity between the second image and the first image; if the content similarity is lower than a first threshold, stopping the continuous acquisition of second images, taking the second image as a new first image, and re-executing the steps of calling the reading recognition function to recognize the new first image while continuously acquiring new second images of the recognition object, determining the content similarity between each new second image and the new first image, and the subsequent steps.
The method according to claim 7, wherein the determining the content similarity between the second image and the first image comprises:

Extracting a second feature point in the second image, and extracting a first feature point in the first image;

And determining content similarity between the second image and the first image according to the number of second feature points matching the first feature point.
The method according to claim 7 or 8, wherein after the continuously acquiring the second image of the identification object, the method further comprises:

Each time a second image is acquired, calling the reading recognition function to identify the number of characters and the character confidence in the second image;

If the number of characters in the second image is less than a second threshold and/or the character confidence in the second image is less than a third threshold, determining that the recognition is completed, stopping the continuous acquisition of second images, and ending the image recognition method.
The method according to claim 3, wherein the corresponding image recognition function of the first image in the associated coarse scene is a face recognition function in the medium focus scene;

After the calling the image recognition function corresponding to the first image in the associated coarse scene to recognize the first image, the method further comprises:

Continuously collecting second images of the recognition object;

Each time a second image is acquired, detecting the human body or the face in the second image;

If the detection result is no human body and no face, or the detection result is that the human body size is smaller than a fourth threshold, or the detection result is that the face size is smaller than a fifth threshold, determining that the recognition is completed, stopping the continuous acquisition of second images, and ending the image recognition method.
An electronic device, comprising:

a memory and one or more processors, the memory being coupled to the processors via a communication bus, the processors being configured to execute instructions in the memory, and the memory storing instructions for performing each step of the method of any one of claims 1 to 10.
The electronic device of claim 11, wherein the electronic device is a smart phone.
A computer program product for use in conjunction with an electronic device including a display, the computer program product comprising a computer readable storage medium and a computer program mechanism embodied therein, the computer program mechanism comprising instructions for performing each step of the method of any one of claims 1 to 10.
An image recognition system, comprising: an image acquisition unit and a mobile computing processing unit;

The image acquisition unit is a camera with a controllable focus, wherein the focal length controllable range includes a telephoto focal length, a medium focal length, and a short focal length; or

The image acquisition unit is three or more fixed focus cameras, including at least one telephoto camera, at least one medium focus camera, and at least one short focus camera;

The mobile computing processing unit is the electronic device of claim 11 or 12;

The mobile computing processing unit is coupled to the image acquisition unit via a universal serial bus USB or wireless communication.
The system according to claim 14, wherein the wireless communication mode is a Bluetooth mode;

The image acquisition unit is located on the wearable glasses.
A system according to claim 14 or 15, wherein

The mobile computing processing unit is configured to control the focal length of the image acquisition unit to be the medium focal length, and to acquire an image of the scene in which the recognition object is located through the image acquisition unit whose focal length is the medium focal length.
The system of claim 16 wherein:

The mobile computing processing unit is further configured to control the focal length of the image acquisition unit to be the recognition focal length, and to acquire the first image of the recognition object through the image acquisition unit whose focal length is the recognition focal length.
The system of claim 17 wherein:

The mobile computing processing unit is further configured to control the focal length of the image acquisition unit to be the medium focal length after determining that the recognition is completed.