US10796685B2 - Method and device for image recognition - Google Patents

Method and device for image recognition

Info

Publication number
US10796685B2
Authority
US
United States
Prior art keywords
recognized
server
information
image
recognized image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US15/535,006
Other versions
US20180204562A1 (en)
Inventor
Long GONG
Yanfu ZHANG
Jiawei GU
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Publication of US20180204562A1
Assigned to BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD. (assignment of assignors' interest; see document for details). Assignors: GONG, Long; GU, Jiawei; ZHANG, Yanfu
Application granted
Publication of US10796685B2
Legal status: Active


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/211 Selection of the most significant subset of features
                  • G06F18/2111 Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/24 Classification techniques
                • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F18/2413 Classification techniques based on distances to training or reference patterns
                    • G06F18/24133 Distances to prototypes
        • G06K9/00624; G06K9/00671; G06K9/4628; G06K9/6271
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                • G06N3/0454
              • G06N3/08 Learning methods
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/40 Extraction of image or video features
              • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
                • G06V10/443 Local feature extraction by matching or filtering
                  • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
                    • G06V10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
                      • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
            • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V10/764 Arrangements using classification, e.g. of video objects
              • G06V10/82 Arrangements using neural networks
          • G06V20/00 Scenes; Scene-specific elements
            • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
      • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
        • G10L13/00 Speech synthesis; Text to speech systems
          • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
            • G10L13/027 Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
          • G10L13/043

Definitions

  • the present disclosure relates to the field of computers, particularly to the field of image recognition, and more particularly, to a method and an apparatus for recognizing an image.
  • in the known technologies, image recognition recognizes images using a recognition model established by analyzing the features of a massive number of images.
  • the above approach is not suitable for personal users because substantial resources are consumed in the recognition process.
  • because the recognition model can only be adjusted according to recognition results outputted by a machine, errors in the outputted results may skew the adjustment of the model, further reducing the recognition accuracy.
  • the present disclosure provides a method and an apparatus for recognizing an image to solve the technical problem mentioned in the foregoing Background section.
  • the present disclosure provides a method for recognizing an image.
  • the method comprises: acquiring a to-be-recognized image containing a to-be-recognized object; sending the to-be-recognized image to a server, and receiving identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server, the confidence parameter representing a probability of the to-be-recognized object being the target object; and determining the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold; or acquiring labeled information associated with the to-be-recognized image from a third-party platform and determining the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
  • the present disclosure provides a method for recognizing an image.
  • the method comprises: receiving a to-be-recognized image containing a to-be-recognized object sent by a client; recognizing the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object; and sending the identification information of the target object and the confidence parameter to the client.
  • the present disclosure provides an apparatus for recognizing an image.
  • the apparatus comprises: an acquiring unit, configured to acquire a to-be-recognized image containing a to-be-recognized object; an interacting unit, configured to send the to-be-recognized image to a server, and receive identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server, the confidence parameter representing a probability of the to-be-recognized object being the target object; and a determining unit, configured to determine the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold, or acquire labeled information associated with the to-be-recognized image from a third-party platform and determine the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
  • the present disclosure provides an apparatus for recognizing an image.
  • the apparatus comprises: a receiving unit, configured to receive a to-be-recognized image containing a to-be-recognized object sent by a client; a recognizing unit, configured to recognize the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object; and a sending unit, configured to send the identification information of the target object and the confidence parameter to the client.
  • a to-be-recognized image containing a to-be-recognized object is acquired; the to-be-recognized image is sent to a server; identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server are received; and the identification information of the target object is determined as a recognition result when the confidence parameter is greater than a confidence threshold; or labeled information associated with the to-be-recognized image is acquired from a third-party platform and the labeled information is determined as the recognition result when the confidence parameter is smaller than the confidence threshold.
  • Automatic recognition by a server is thus combined with third-party labeled information, which enhances the recognition accuracy; furthermore, the recognition model corresponding to the machine learning recognition pattern used by the server may be trained with the third-party labeled information to improve the training result, thereby further enhancing the recognition accuracy.
  • FIG. 1 is an exemplary architecture diagram of a system to which the present disclosure may be applied;
  • FIG. 2 illustrates a flowchart of a method for recognizing an image according to an embodiment of the present disclosure
  • FIG. 3 illustrates a flowchart of a method for recognizing an image according to another embodiment of the present disclosure
  • FIG. 4 illustrates a schematic structural diagram of an apparatus for recognizing an image according to an embodiment of the present disclosure
  • FIG. 5 illustrates a schematic structural diagram of an apparatus for recognizing an image according to another embodiment of the present disclosure.
  • FIG. 6 illustrates a structural schematic diagram of a computer system adapted to implement a terminal device or a server of the embodiments of the present disclosure.
  • FIG. 1 shows an exemplary architecture of a system 100 which may be used by a method and apparatus for recognizing an image according to an embodiment of the present application.
  • the system architecture 100 may include terminal devices 101 , 102 and 103 , a network 104 and a server 105 .
  • the network 104 serves as a medium providing a communication link between the terminal devices 101 , 102 and 103 and the server 105 .
  • the network 104 may include various types of connections, such as wired or wireless transmission links, or optical fibers.
  • the user may use the terminal devices 101 , 102 and 103 to interact with the server 105 through the network 104 , in order to transmit or receive messages, etc.
  • Various communication client applications, such as image recognition applications and instant messaging tools, may be installed on the terminal devices 101, 102 and 103.
  • the terminal devices 101 , 102 and 103 may be various electronic devices having display screens and supporting network communication, including but not limited to, smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
  • the server 105 may be a server providing various services, for example, a backend server providing support for image recognition applications on the terminal devices 101, 102 or 103.
  • the backend server may perform analyzing processes on the image to be recognized, and return a processing result (target object) to the terminal devices.
  • the terminal acquiring the image to be recognized is called a client.
  • a client may be the terminal devices 101 , 102 , 103 or the server 105 , rather than being a particular type of terminal.
  • the numbers of the terminal devices, networks and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided based on actual requirements.
  • Referring to FIG. 2, a flow 200 of a method for recognizing an image according to an embodiment of the present disclosure is illustrated. It is to be noted that the method for recognizing an image provided by the embodiments of the present disclosure is generally performed by the terminal devices 101, 102 and 103. Correspondingly, the apparatus for recognizing an image is generally arranged in the terminal devices 101, 102 and 103. The method comprises the following steps.
  • Step 201 acquiring a to-be-recognized image containing a to-be-recognized object.
  • the to-be-recognized image may be captured by a camera.
  • the camera may be arranged on the terminal device.
  • the terminal device may comprise, but is not limited to, a mobile terminal or a wearable device (for example, smart glasses). Taking a camera arranged on smart glasses as an example, when the user wears the smart glasses, the camera may be utilized to capture an image within its viewing angle range to serve as the to-be-recognized image.
  • the camera may be started for image capture in response to inputting an image capture instruction. For example, voice information inputted by the user may be received via a microphone, the voice information is resolved to obtain the image capture instruction, and the camera is triggered for image capture.
  • the to-be-recognized image comprises a to-be-recognized object.
  • the camera on the wearable device worn by the user may be utilized to capture an image associated with the scene of the conference site, and the captured to-be-recognized image may comprise the to-be-recognized object such as a table or chair in the conference site.
  • the to-be-recognized object comprises at least one of: a body object, a scene object and a color object.
  • Step 202 sending the to-be-recognized image to the server, and receiving identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server.
  • the confidence parameter represents a probability of the to-be-recognized object being the target object.
  • the to-be-recognized image may be sent to the server to recognize the to-be-recognized object in the image, and then the confidence parameter and the target object corresponding to the to-be-recognized object, obtained by recognizing the to-be-recognized image by the server may be received.
  • An optional recognition pattern for recognizing the to-be-recognized image by the server is a machine learning recognition pattern.
  • the confidence parameter may be used for representing a probability of the to-be-recognized object being the target object (namely, the similarity between the to-be-recognized object and sample data of the target object) when recognizing the to-be-recognized image.
  • Step 203 determining the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold; or acquiring labeled information associated with the to-be-recognized image from a third-party platform and determining the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
  • optionally, the labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, the information being released by a registered user of the third-party platform.
  • the recognition result of the to-be-recognized image may be further determined.
  • the identification information of the target object may be determined as the recognition result.
  • the labeled information associated with the to-be-recognized image may be acquired from the third-party platform.
  • taking a to-be-recognized image including one round table and three chairs as an example, when the image is recognized by the server in a machine recognition pattern, the to-be-recognized objects are matched with the target objects (namely, sample data of the round table object and the chair object) to obtain the identification information (namely, the table and the chairs).
  • the to-be-recognized image may be sent to the server, labeled information returned by the server may be received, and the labeled information may be determined as the recognition result.
  • the labeled information of the to-be-recognized object may be acquired in the following way: the to-be-recognized image may be sent to the third-party platform associated with the server. The third-party platform may provide question answering services, which issue questions asked by users in the form of tasks; the answers to the questions are published on the third-party platform by its registered users.
  • a task for recognizing the to-be-recognized image may be generated using the question answering services of the third-party platform, and then the task is issued to the registered user of the third-party platform.
  • an information input region may be provided when the to-be-recognized image is shown to the registered user.
  • the registered user may determine which target objects are included in the to-be-recognized image, and then fill information such as the names and number of the target objects into the information input region; in this way, the labeled information is generated. For example, when the to-be-recognized image in the received recognition task comprises one round table and three chairs, the registered user may fill information into the information input region using the following format: round, table, one, chairs, and three.
  • the labeled information may be generated based on the information filled by the registered user.
  • the labeled information includes identification information of the target object corresponding to the to-be-recognized object (namely, “round table and chairs”), and may further include information representing the number of the target objects (namely, “one and three”).
  • the method further comprises: converting the recognition result into voice information and playing the voice information.
  • the recognition result may be converted into voice information, and then the voice information is played for the user.
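  • As an illustration of this playback step, the sketch below converts a recognition result to speech with the open-source pyttsx3 engine. The disclosure does not name a particular text-to-speech engine, so the library choice here is an assumption.

```python
# Minimal sketch of the voice-playback step. The disclosure does not name a
# TTS engine; pyttsx3 is used here purely as an illustration.
import pyttsx3

def play_recognition_result(recognition_result: str) -> None:
    """Convert the recognition result text to speech and play it."""
    engine = pyttsx3.init()          # initialize the local TTS engine
    engine.say(recognition_result)   # queue the text for synthesis
    engine.runAndWait()              # block until playback finishes

play_recognition_result("one round table and three chairs")
```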
  • the method further comprises: sending the labeled information to the server to serve as a training sample for training a recognition model corresponding to a machine learning recognition pattern used by the server when the confidence parameter is smaller than the confidence threshold.
  • the application scene of this embodiment may be as below: a user (for example, a blind user) uses a camera on a wearable device to capture a to-be-recognized image (for example, a to-be-recognized image including to-be-recognized objects such as tables and chairs in a conference site) associated with the current scene (for example, the conference site).
  • the to-be-recognized image may be sent to the server to recognize the to-be-recognized image, and identification information of a target object corresponding to the to-be-recognized object and a confidence parameter which are returned by the server are received.
  • when the confidence parameter is greater than the confidence threshold (for example, when the table and chair objects are accurately recognized), the identification information of the target object corresponding to the to-be-recognized object may be determined as the recognition result.
  • the to-be-recognized image may be sent to the third-party platform, so that the registered user of the third-party platform determines the target object (for example, the registered user determines that the to-be-recognized image includes target objects such as tables and chairs) corresponding to the to-be-recognized object, then labeled information, containing the identification information of the target object corresponding to the to-be-recognized object, returned by the third-party platform may be received, and the labeled information is determined as the recognition result.
  • the recognition result may be converted into voice information for playing.
  • the user may relatively accurately learn situations of the current scene (for example, which objects are included in the scene) based on the captured image.
  • labeled data may be sent to the server to serve as a training sample for training a recognition model corresponding to a machine learning recognition pattern used by the server to enhance a training effect of the recognition model, so that the recognition accuracy may be further enhanced in subsequent image recognition.
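  • A minimal sketch of reporting the labeled information back to the server as a training sample is shown below. The endpoint URL and payload schema are hypothetical; the disclosure only states that labeled data is sent when the confidence parameter falls below the threshold.

```python
# Sketch of the client reporting third-party labeled information back to the
# server for model training. The endpoint URL and payload fields are
# hypothetical assumptions, not part of the patent.
import requests

TRAINING_ENDPOINT = "https://example-server/api/training-samples"  # hypothetical

def submit_training_sample(image_id: str, labeled_info: dict,
                           confidence: float, threshold: float) -> None:
    # Only low-confidence recognitions are routed back as training samples.
    if confidence < threshold:
        requests.post(TRAINING_ENDPOINT, json={
            "image_id": image_id,
            "labels": labeled_info,   # e.g. {"round table": 1, "chairs": 3}
        }, timeout=10)

submit_training_sample("img-001", {"round table": 1, "chairs": 3},
                       confidence=0.42, threshold=0.8)
```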
  • Referring to FIG. 3, a flow 300 of a method for recognizing an image according to another embodiment of the present disclosure is illustrated. It is to be noted that this method is generally performed by the server 105. Correspondingly, the apparatus for recognizing an image is generally arranged in the server 105. The method comprises the following steps.
  • Step 301 receiving a to-be-recognized image containing a to-be-recognized object sent by a client.
  • the to-be-recognized image comprises a to-be-recognized object.
  • the camera on the smart glasses may be utilized to capture an image, and the captured image may comprise the to-be-recognized object such as a table or chair.
  • Step 302 recognizing the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter.
  • the confidence parameter represents a probability of the to-be-recognized object being the target object.
  • An optional implementation for recognizing the to-be-recognized object is a machine learning pattern.
  • the machine learning pattern may include but is not limited to autoencoders, sparse coding and deep belief networks.
  • the machine learning pattern also may be referred to as deep learning.
  • recognizing the to-be-recognized image comprises: recognizing the to-be-recognized image using a convolutional neural network model.
  • a recognition model corresponding to the machine learning recognition pattern used by the server may first be established, and the to-be-recognized image is then recognized using the recognition model.
  • the principle of recognizing the to-be-recognized image using a recognition model corresponding to the machine learning pattern is outlined as below: when the to-be-recognized image is recognized using the recognition model (for example, the convolutional neural network model), the to-be-recognized object in the to-be-recognized image may be indicated by some features (for example, scale invariant feature transform feature points) to generate an input vector.
  • an output vector representing a target object corresponding to the to-be-recognized object may be obtained, the recognition model may be used for indicating a mapping relation from the input vector to the output vector, and then the to-be-recognized image may be recognized based on the mapping relation.
  • the to-be-recognized object in the to-be-recognized image may be represented by some features (for example, scale invariant feature transform feature points), features of the to-be-recognized object (for example, table object) in the to-be-recognized image may be matched with the target object (for example, sample data of the table object) to obtain the confidence parameter representing a probability of the to-be-recognized object being the target object.
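  • The sketch below illustrates this recognition step with a small placeholder convolutional network in PyTorch: the image becomes an input tensor, the forward pass produces an output vector over target classes, and the softmax of that vector is read as the confidence parameter. The architecture and sizes are illustrative assumptions, not the model of the disclosure.

```python
# Placeholder CNN illustrating the input-vector -> output-vector mapping and
# the derivation of a confidence parameter. Not the patent's actual model.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))   # output vector of class scores

model = TinyCNN()
model.eval()
image = torch.randn(1, 3, 224, 224)            # stand-in for the captured image
with torch.no_grad():
    probs = torch.softmax(model(image), dim=1) # normalize scores to probabilities
confidence, class_index = probs.max(dim=1)     # confidence parameter + target id
```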
  • the method further comprises: receiving a recognition result for training sent by the client, wherein the recognition result for training comprises labeled information associated with the to-be-recognized image and acquired from a third-party platform, and the labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, the information being released by a registered user of the third-party platform; and training, by using the recognition result for training, a recognition model corresponding to a machine learning pattern.
  • the recognition result for training sent by the client may be labeled information associated with the to-be-recognized image acquired by the client from a third-party platform when the confidence parameter obtained by recognizing the to-be-recognized image using the machine learning pattern is smaller than the confidence threshold.
  • the labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, wherein the information is released by a registered user of the third-party platform.
  • the client may be triggered to send the to-be-recognized image to the third-party platform (for example, the third-party platform providing question answering services) to obtain labeled information of the image.
  • the labeled information may be information containing the identification information of the target object corresponding to the to-be-recognized object, wherein the information is released by the registered user of the third-party platform.
  • the labeled information comprises “round table, one, chairs, and three”.
  • the recognition model may be trained using the labeled information.
  • a feature (for example, a scale invariant feature transform feature point) of the to-be-recognized object may serve as an input vector of the convolutional neural network, and the labeled information may serve as an ideal output vector of the convolutional neural network
  • the convolutional neural network may be trained by constituting a vector pair from the input vector and the ideal output vector, so that the recognition model may be trained using a correct recognition result (namely, labeled information acquired by the registered user of the third-party platform recognizing the to-be-recognized image manually).
  • sample data corresponding to a type of the to-be-recognized object may be preset according to the type of the to-be-recognized object, and then the recognition model is trained using the sample data. For example, images of some common application scenes and labeled information of the images may be acquired in advance to serve as training data.
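  • The self-contained sketch below shows one such training update, using the crowd-sourced label as the ideal output and cross-entropy as the loss. The placeholder model, sizes and optimizer settings are assumptions for illustration only.

```python
# One training step driven by third-party labeled information.
import torch
import torch.nn as nn

# Placeholder model standing in for the convolutional recognition model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 5))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

image = torch.randn(1, 3, 32, 32)        # the to-be-recognized image (input vector)
labeled_class = torch.tensor([2])        # class index derived from the labeled info

model.train()
optimizer.zero_grad()
loss = criterion(model(image), labeled_class)  # labeled info as the ideal output
loss.backward()
optimizer.step()
```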
  • Step 303 sending the identification information of the target object and the confidence parameter to the client.
  • the identification information of the target object corresponding to the to-be-recognized object in the to-be-recognized image and the obtained confidence parameter may be sent to the client.
  • the apparatus 400 comprises: an acquiring unit 401 , an interacting unit 402 and a determining unit 403 .
  • the acquiring unit 401 is configured to acquire a to-be-recognized image containing a to-be-recognized object.
  • the interacting unit 402 is configured to send the to-be-recognized image to a server, and receive identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server, the confidence parameter representing a probability of the to-be-recognized object being the target object.
  • the determining unit 403 is configured to determine the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold, or acquire labeled information associated with the to-be-recognized image from a third-party platform and determine the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
  • the labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, the information being released by a registered user of the third-party platform.
  • the apparatus 400 further comprises: a playing unit (not shown), configured to convert the recognition result into voice information and play the voice information.
  • the apparatus 400 further comprises: a labeled information sending unit (not shown), configured to send the labeled information to the server to serve as a training sample for training a recognition model corresponding to a machine learning recognition pattern used by the server when the confidence parameter is smaller than the confidence threshold.
  • the to-be-recognized object comprises at least one of: a body object, a scene object and a color object.
  • the apparatus 500 comprises: a receiving unit 501 , a recognizing unit 502 and a sending unit 503 .
  • the receiving unit 501 is configured to receive a to-be-recognized image containing a to-be-recognized object sent by a client.
  • the recognizing unit 502 is configured to recognize the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object.
  • the sending unit 503 is configured to send the identification information of the target object and the confidence parameter to the client.
  • the recognizing unit 502 comprises: a neural network subunit (not shown), configured to recognize the to-be-recognized image using a convolutional neural network model.
  • the apparatus 500 further comprises: a recognition result receiving unit (not shown), configured to receive a recognition result for training sent by the client, wherein the recognition result for training comprises labeled information associated with the to-be-recognized image and acquired from a third-party platform, and the labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, the information being released by a registered user of the third-party platform; and a training unit (not shown), configured to train, using the recognition result for training, a recognition model corresponding to a machine learning recognition pattern.
  • Referring to FIG. 6, a schematic structural diagram of a computer system 600 adapted to implement a terminal apparatus or a server of the embodiments of the present disclosure is shown.
  • the computer system 600 includes a central processing unit (CPU) 601 , which may execute various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded into a random access memory (RAM) 603 from a storage portion 608 .
  • the RAM 603 also stores various programs and data required by operations of the system 600 .
  • the CPU 601 , the ROM 602 and the RAM 603 are connected to each other through a bus 604 .
  • An input/output (I/O) interface 605 is also connected to the bus 604 .
  • the following components are connected to the I/O interface 605 : an input portion 606 including a keyboard, a mouse etc.; an output portion 607 comprising a cathode ray tube (CRT), a liquid crystal display device (LCD), a speaker etc.; a storage portion 608 including a hard disk and the like; and a communication portion 609 comprising a network interface card, such as a LAN card and a modem.
  • the communication portion 609 performs communication processes via a network, such as the Internet.
  • a driver 610 is also connected to the I/O interface 605 as required.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory, may be installed on the driver 610 , to facilitate the retrieval of a computer program from the removable medium 611 , and the installation thereof on the storage portion 608 as needed.
  • an embodiment of the present disclosure includes a computer program product, which comprises a computer program that is tangibly embedded in a machine-readable medium.
  • the computer program comprises program codes for executing the method as illustrated in the flow chart.
  • the computer program may be downloaded and installed from a network via the communication portion 609, and/or may be installed from the removable medium 611.
  • each block in the flow charts and block diagrams may represent a module, a program segment, or a code portion.
  • the module, the program segment, or the code portion comprises one or more executable instructions for implementing the specified logical function.
  • the functions denoted by the blocks may occur in a sequence different from the sequences shown in the figures. For example, in practice, two blocks in succession may be executed, depending on the involved functionalities, substantially in parallel, or in a reverse sequence.
  • each block in the block diagrams and/or the flow charts and/or a combination of the blocks may be implemented by a dedicated hardware-based system executing specific functions or operations, or by a combination of a dedicated hardware and computer instructions.
  • the units or modules involved in the embodiments of the present disclosure may be implemented by way of software or hardware.
  • the described units or modules may also be provided in a processor, for example, described as: a processor, comprising an acquiring unit, a receiving unit and a processing unit, where the names of these units are not considered as a limitation to the units.
  • the acquiring unit may also be described as “a unit for acquiring a to-be-recognized image containing a to-be-recognized object”.
  • the present disclosure further provides a non-volatile computer storage medium.
  • the non-volatile computer storage medium may be the non-volatile computer storage medium included in the apparatus in the above embodiments, or a stand-alone non-volatile computer storage medium which has not been assembled into the apparatus.
  • the non-volatile computer storage medium stores one or more programs.
  • the one or more programs when executed by a device, cause the device to: acquire a to-be-recognized image containing a to-be-recognized object; send the to-be-recognized image to a server, and receive identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server, wherein the confidence parameter represents a probability of the to-be-recognized object being the target object; and determine the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold; or acquire labeled information associated with the to-be-recognized image from a third-party platform and determine the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
  • the non-volatile computer storage medium stores one or more programs.
  • the one or more programs when executed by a device, cause the device to: receive a to-be-recognized image containing a to-be-recognized object sent by a client; recognize the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, wherein the confidence parameter represents a probability of the to-be-recognized object being the target object; and send the identification information of the target object and the confidence parameter to the client.
  • the inventive scope of the present disclosure is not limited to the technical solutions formed by the particular combinations of the above technical features.
  • the inventive scope also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the disclosed features with (but not limited to) technical features having similar functions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Physiology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure discloses a method and an apparatus for recognizing an image. A specific implementation of the method comprises: acquiring a to-be-recognized image containing a to-be-recognized object; sending the to-be-recognized image to a server, and receiving identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server; and determining the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold; or acquiring labeled information associated with the to-be-recognized image from a third-party platform and determining the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.

Description

CROSS-REFERENCE TO RELATED APPLICATION
The present disclosure claims the benefit of and priority to Chinese Patent Application No. 201510567452.2, filed on Sep. 8, 2015, the entire content of which is incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of computers, particularly to the field of image recognition, and more particularly, to a method and an apparatus for recognizing an image.
BACKGROUND
In daily life, users may sometimes need to recognize photographed images. In the known technologies, image recognition recognizes images using a recognition model established by analyzing the features of a massive number of images. However, this approach has two drawbacks. On one hand, it is not suitable for personal users because substantial resources are consumed in the recognition process. On the other hand, because the recognition model can only be adjusted according to recognition results outputted by a machine, errors in the outputted results may skew the adjustment of the model, further reducing the recognition accuracy.
SUMMARY
The present disclosure provides a method and an apparatus for recognizing an image to solve the technical problem mentioned in the foregoing Background section.
In a first aspect, the present disclosure provides a method for recognizing an image. The method comprises: acquiring a to-be-recognized image containing a to-be-recognized object; sending the to-be-recognized image to a server, and receiving identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server, the confidence parameter representing a probability of the to-be-recognized object being the target object; and determining the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold; or acquiring labeled information associated with the to-be-recognized image from a third-party platform and determining the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
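For concreteness, the following is a minimal client-side sketch of this first-aspect flow. The server URL, third-party platform URL, response fields and threshold value are all illustrative assumptions, not part of the disclosure.

```python
# Client-side sketch: send the image, read back identification information
# plus a confidence parameter, and fall back to third-party labeled
# information when the confidence is below the threshold.
import requests

SERVER_URL = "https://example-server/recognize"          # hypothetical
PLATFORM_URL = "https://example-platform/label-task"     # hypothetical
CONFIDENCE_THRESHOLD = 0.8                               # illustrative value

def recognize(image_bytes: bytes) -> str:
    reply = requests.post(SERVER_URL, files={"image": image_bytes},
                          timeout=30).json()
    if reply["confidence"] > CONFIDENCE_THRESHOLD:
        return reply["identification"]       # trust the server's recognition
    # Otherwise ask the third-party platform for human labeled information.
    task = requests.post(PLATFORM_URL, files={"image": image_bytes},
                         timeout=30).json()
    return task["labeled_information"]
```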
In a second aspect, the present disclosure provides a method for recognizing an image. The method comprises: receiving a to-be-recognized image containing a to-be-recognized object sent by a client; recognizing the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object; and sending the identification information of the target object and the confidence parameter to the client.
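A corresponding server-side sketch of this second aspect is shown below, written as a small HTTP endpoint. Flask and the route shape are illustrative choices, and run_model is a hypothetical stand-in for the recognition model described later.

```python
# Server-side sketch: receive the to-be-recognized image, recognize it, and
# return identification information plus a confidence parameter.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/recognize", methods=["POST"])
def recognize():
    image_bytes = request.files["image"].read()
    # run_model stands in for the CNN-based recognition described below; it
    # returns identification information and a confidence parameter.
    identification, confidence = run_model(image_bytes)
    return jsonify({"identification": identification,
                    "confidence": confidence})

def run_model(image_bytes: bytes):
    return "round table and chairs", 0.91    # placeholder result
```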
In a third aspect, the present disclosure provides an apparatus for recognizing an image. The apparatus comprises: an acquiring unit, configured to acquire a to-be-recognized image containing a to-be-recognized object; an interacting unit, configured to send the to-be-recognized image to a server, and receive identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server, the confidence parameter representing a probability of the to-be-recognized object being the target object; and a determining unit, configured to determine the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold, or acquire labeled information associated with the to-be-recognized image from a third-party platform and determine the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
In a fourth aspect, the present disclosure provides an apparatus for recognizing an image. The apparatus comprises: a receiving unit, configured to receive a to-be-recognized image containing a to-be-recognized object sent by a client; a recognizing unit, configured to recognize the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object; and a sending unit, configured to send the identification information of the target object and the confidence parameter to the client.
According to the method and apparatus for recognizing an image provided by the present disclosure, a to-be-recognized image containing a to-be-recognized object is acquired; the to-be-recognized image is sent to a server; identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server are received; and the identification information of the target object is determined as a recognition result when the confidence parameter is greater than a confidence threshold; or labeled information associated with the to-be-recognized image is acquired from a third-party platform and the labeled information is determined as the recognition result when the confidence parameter is smaller than the confidence threshold. Automatic recognition by the server is thus combined with third-party labeled information, which enhances the recognition accuracy; moreover, the recognition model corresponding to the machine learning recognition pattern used by the server may be trained with the third-party labeled information to improve the training result, thereby further enhancing the recognition accuracy.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features, objectives and advantages of the present disclosure will become more apparent upon reading the detailed description to non-limiting embodiments with reference to the accompanying drawings, wherein:
FIG. 1 is an exemplary architecture diagram of a system to which the present disclosure may be applied;
FIG. 2 illustrates a flowchart of a method for recognizing an image according to an embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of a method for recognizing an image according to another embodiment of the present disclosure;
FIG. 4 illustrates a schematic structural diagram of an apparatus for recognizing an image according to an embodiment of the present disclosure;
FIG. 5 illustrates a schematic structural diagram of an apparatus for recognizing an image according to another embodiment of the present disclosure; and
FIG. 6 illustrates a structural schematic diagram of a computer system adapted to implement a terminal device or a server of the embodiments of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
The present disclosure will be further described below in detail in combination with the accompanying drawings and the embodiments. It should be appreciated that the specific embodiments described herein are merely used for explaining the relevant invention, rather than limiting the invention. In addition, it should be noted that, for the ease of description, only the parts related to the relevant invention are shown in the accompanying drawings.
It should also be noted that the embodiments in the present disclosure and the features in the embodiments may be combined with each other on a non-conflict basis. The present disclosure will be described below in detail with reference to the accompanying drawings and in combination with the embodiments.
FIG. 1 shows an exemplary architecture of a system 100 which may be used by a method and apparatus for recognizing an image according to an embodiment of the present application.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium providing a communication link between the terminal devices 101, 102 and 103 and the server 105. The network 104 may include various types of connections, such as wired or wireless transmission links, or optical fibers.
The user may use the terminal devices 101, 102 and 103 to interact with the server 105 through the network 104, in order to transmit or receive messages, etc. Various communication client applications, such as image recognition applications and instant messaging tools, may be installed on the terminal devices 101, 102 and 103.
The terminal devices 101, 102 and 103 may be various electronic devices having display screens and supporting network communication, including but not limited to, smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
The server 105 may be a server providing various services, for example, a backend server providing support for image recognition applications on the terminal devices 101, 102 or 103. The backend server may perform analyzing processes on the image to be recognized, and return a processing result (target object) to the terminal devices.
It should be noted that, according to the present disclosure, the terminal acquiring the image to be recognized is called a client. A client may be the terminal devices 101, 102, 103 or the server 105, rather than being a particular type of terminal.
It should be appreciated that the numbers of the terminal devices, networks and servers in FIG. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided based on the actual requirements.
Referring to FIG. 2, a flow 200 of a method for recognizing an image according to an embodiment of the present disclosure is illustrated. It is to be noted that the method for recognizing an image provided by the embodiments of the present disclosure is generally performed by the terminal devices 101, 102 and 103. Correspondingly, the apparatus for recognizing an image is generally arranged in the terminal devices 101, 102 and 103. The method comprises the following steps.
Step 201: acquiring a to-be-recognized image containing a to-be-recognized object.
In this embodiment, the to-be-recognized image may be captured by a camera. The camera may be arranged on the terminal device. The terminal device may comprise, but is not limited to, a mobile terminal or a wearable device (for example, smart glasses). Taking a camera arranged on smart glasses as an example, when the user wears the smart glasses, the camera may be utilized to capture an image within its viewing angle range to serve as the to-be-recognized image. In this embodiment, the camera may be started for image capture in response to an inputted image capture instruction. For example, voice information inputted by the user may be received via a microphone, the voice information is resolved to obtain the image capture instruction, and the camera is triggered for image capture. In this embodiment, the to-be-recognized image comprises a to-be-recognized object. For example, when the user enters a conference site, the camera on the wearable device worn by the user may be utilized to capture an image associated with the scene of the conference site, and the captured to-be-recognized image may comprise a to-be-recognized object such as a table or chair in the conference site.
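As a rough illustration of this voice-triggered capture, the sketch below resolves a voice command into an image capture instruction and reads one frame with OpenCV. recognize_speech is a hypothetical stand-in for the microphone and voice-resolution step the text describes.

```python
# Sketch of voice-triggered capture. OpenCV's VideoCapture is one way to
# read a camera frame; the speech step is stubbed out as an assumption.
import cv2

def recognize_speech() -> str:
    return "take a picture"                  # placeholder for a real ASR call

def capture_if_instructed():
    if "picture" in recognize_speech():      # resolve the capture instruction
        camera = cv2.VideoCapture(0)         # default camera, e.g. on glasses
        ok, frame = camera.read()            # frame is the to-be-recognized image
        camera.release()
        return frame if ok else None
    return None
```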
In some optional implementations of this embodiment, the to-be-recognized object comprises at least one of: a body object, a scene object and a color object.
Step 202: sending the to-be-recognized image to the server, and receiving identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server.
In this embodiment, the confidence parameter represents a probability of the to-be-recognized object being the target object.
After acquiring the to-be-recognized image, the to-be-recognized image may be sent to the server to recognize the to-be-recognized object in the image, and then the confidence parameter and the target object corresponding to the to-be-recognized object, obtained by recognizing the to-be-recognized image by the server may be received. An optional recognition pattern for recognizing the to-be-recognized image by the server is a machine learning recognition pattern. In this embodiment, the confidence parameter may be used for representing a probability of the to-be-recognized object being the target object (namely, the similarity between the to-be-recognized object and sample data of the target object) when recognizing the to-be-recognized image. The higher the value of the confidence parameter is, the larger the probability of the to-be-recognized object being the target object is.
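As a numeric illustration of the confidence parameter as a similarity to sample data, the sketch below scores hypothetical feature vectors of the to-be-recognized object against per-class sample features using cosine similarity; all vectors and numbers are made up for illustration.

```python
# Confidence as similarity between the to-be-recognized object's features
# and the sample data of each target object. Feature values are invented.
import numpy as np

object_features = np.array([0.9, 0.1, 0.4])
sample_features = {"table": np.array([0.8, 0.2, 0.5]),
                   "chair": np.array([0.1, 0.9, 0.3])}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {name: cosine(object_features, f) for name, f in sample_features.items()}
target, confidence = max(scores.items(), key=lambda kv: kv[1])
# -> target "table" with confidence ~0.98, to be compared with the threshold
```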
Step 203: determining the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold; or acquiring labeled information associated with the to-be-recognized image from a third-party platform and determining the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
In this embodiment, optional labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, wherein the information is released by a registered user of the third-party platform. In this embodiment, after the confidence parameter returned by the server is obtained, the recognition result of the to-be-recognized image may be further determined. When the confidence parameter is greater than the confidence threshold, the identification information of the target object may be determined as the recognition result. When the confidence parameter is smaller than the confidence threshold, the labeled information associated with the to-be-recognized image may be acquired from the third-party platform. Taking the to-be-recognized image including one round table and three chairs as an example, when the to-be-recognized image is recognized by means of the server in a machine learning recognition pattern, for example, when the to-be-recognized object is matched with the target object (namely, sample data of the round table object and the chair object), in the event that the confidence parameter is greater than the confidence threshold, the identification information (namely, the table and the chairs) of the target objects (namely, the table object and the chair object) may be determined as the recognition result. In the event that the confidence parameter is smaller than the confidence threshold, the to-be-recognized image may be sent to the third-party platform, the labeled information returned by the third-party platform may be received, and the labeled information may be determined as the recognition result.
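The branching of step 203 can be summarized by the small sketch below; acquire_labeled_information is a hypothetical stand-in for the third-party-platform call, not a name from the disclosure.

def determine_recognition_result(identification, confidence, threshold,
                                 acquire_labeled_information, image):
    if confidence > threshold:
        # machine recognition is trusted: use the server's identification
        return identification
    # otherwise fall back to manual labeling by a registered user
    return acquire_labeled_information(image)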
In this embodiment, the labeled information of the to-be-recognized object may be acquired in the following way: the to-be-recognized image may be sent to the third-party platform associated with the server. The third-party platform may provide question answering services, wherein the question answering services may be used for issuing questions asked by users in the form of tasks, and answers to the questions are published on the third-party platform by registered users of the third-party platform. After the to-be-recognized image is sent to the third-party platform, a task for recognizing the to-be-recognized image may be generated using the question answering services of the third-party platform, and then the task is issued to a registered user of the third-party platform. When the registered user receives the task for recognizing the to-be-recognized image, an information input region may be provided when the to-be-recognized image is shown to the registered user. The registered user may determine which target objects are included in the to-be-recognized image, and then fill information such as the names and number of the target objects into the information input region. In this way, the labeled information is generated. For example, when the to-be-recognized image in the task received by the registered user comprises one round table and three chairs, the registered user may fill information into the information input region using a format such as: round table, one, chairs, three. Next, the labeled information may be generated based on the information filled in by the registered user. The labeled information includes identification information of the target objects corresponding to the to-be-recognized objects (namely, "round table and chairs"), and may further include information representing the number of the target objects (namely, "one and three").
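A possible way to turn such filled-in information into structured labeled information is sketched below; the alternating "name, count" format is an assumption based on the example above, not a format fixed by the disclosure.

def parse_labeled_information(raw):
    # e.g. raw = "round table, one, chairs, three"
    tokens = [t.strip() for t in raw.split(",")]
    # pair each target object name with the count that follows it
    return {name: count for name, count in zip(tokens[0::2], tokens[1::2])}

# parse_labeled_information("round table, one, chairs, three")
# -> {"round table": "one", "chairs": "three"}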
In some optional implementations of this embodiment, the method further comprises: converting the recognition result into voice information and playing the voice information. In this implementation, after obtaining the final recognition result, the recognition result may be converted into voice information, and then the voice information is played for the user.
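One way to realize this conversion in Python is shown below, using the pyttsx3 text-to-speech library; the library choice is ours, as the disclosure does not name a speech synthesis engine.

import pyttsx3

def play_recognition_result(recognition_result):
    engine = pyttsx3.init()
    engine.say(recognition_result)  # queue the converted voice information
    engine.runAndWait()             # block until playback finishes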
In some optional implementations of this embodiment, the method further comprises: sending the labeled information to the server to serve as a training sample for training a recognition model corresponding to a machine learning recognition pattern used by the server when the confidence parameter is smaller than the confidence threshold.
The application scene of this embodiment may be as below: a user (for example, a blind user) uses a camera on a wearable device to capture a to-be-recognized image associated with the current scene (for example, an image including to-be-recognized objects such as tables and chairs in a conference site). Next, the to-be-recognized image may be sent to the server to be recognized, and the identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, which are returned by the server, are received. When the confidence parameter is greater than a confidence threshold (for example, the table and chair objects are accurately recognized), the identification information of the target object corresponding to the to-be-recognized object may be determined as the recognition result. When the confidence parameter is smaller than the confidence threshold, the to-be-recognized image may be sent to the third-party platform, so that a registered user of the third-party platform determines the target object corresponding to the to-be-recognized object (for example, the registered user determines that the to-be-recognized image includes target objects such as tables and chairs); then the labeled information, containing the identification information of the target object corresponding to the to-be-recognized object and returned by the third-party platform, may be received, and the labeled information is determined as the recognition result. After the recognition result is determined, it may be converted into voice information for playing. In this way, the user may learn the situation of the current scene (for example, which objects are included in the scene) relatively accurately based on the captured image. Further, when the confidence parameter is smaller than the confidence threshold, the labeled information may be sent to the server to serve as a training sample for training the recognition model corresponding to the machine learning recognition pattern used by the server, which enhances the training effect of the recognition model, so that the recognition accuracy may be further enhanced in subsequent image recognition.
Referring to FIG. 3, a flow 300 of a method for recognizing an image according to an embodiment of the present disclosure is illustrated. It is to be noted that this method for recognizing an image is generally performed by the server 105. Correspondingly, the apparatus for recognizing an image is generally arranged in the server 105. The method comprises the following steps.
Step 301: receiving a to-be-recognized image containing a to-be-recognized object sent by a client.
In this embodiment, the to-be-recognized image comprises a to-be-recognized object. For example, when the user enters a conference site, the camera on the smart glasses may be utilized to capture an image, and the captured image may comprise the to-be-recognized object such as a table or chair.
Step 302: recognizing the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter.
In this embodiment, the confidence parameter represents a probability of the to-be-recognized object being the target object. An optional implementation for recognizing the to-be-recognized object is a machine learning pattern. The machine learning pattern may include, but is not limited to, an autoencoder, sparse coding and a deep belief network. The machine learning pattern may also be referred to as deep learning.
In some optional implementations of this embodiment, recognizing the to-be-recognized image comprises: recognizing the to-be-recognized image using a convolutional neural network model.
In this embodiment, a recognition model corresponding to the machine learning recognition pattern used for recognizing the to-be-recognized image may first be established, and then the to-be-recognized image is recognized using the recognition model. The principle of recognizing the to-be-recognized image using a recognition model corresponding to the machine learning pattern is outlined as follows: when the to-be-recognized image is recognized using the recognition model (for example, a convolutional neural network model), the to-be-recognized object in the to-be-recognized image may be represented by some features (for example, scale-invariant feature transform feature points) to generate an input vector. After the to-be-recognized image is recognized using the recognition model, an output vector representing the target object corresponding to the to-be-recognized object may be obtained. The recognition model may be used for indicating a mapping relation from the input vector to the output vector, and the to-be-recognized image may then be recognized based on this mapping relation.
In this embodiment, when the to-be-recognized image is recognized using the recognition model, the to-be-recognized object in the to-be-recognized image may be represented by some features (for example, scale-invariant feature transform feature points), and the features of the to-be-recognized object (for example, a table object) in the to-be-recognized image may be matched against the target object (for example, sample data of the table object) to obtain the confidence parameter representing the probability of the to-be-recognized object being the target object.
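To make the mapping from input vector to output vector and the derivation of the confidence parameter concrete, the following is a hedged PyTorch sketch (the framework and the tiny architecture are assumptions of ours; the disclosure prescribes neither): the softmax over the output vector yields a probability per target object, and the maximum serves as the confidence parameter.

import torch
import torch.nn as nn

class TinyRecognizer(nn.Module):
    def __init__(self, num_target_objects):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_target_objects)

    def forward(self, x):              # x: (batch, 3, H, W) image tensor
        h = self.features(x).flatten(1)
        return self.classifier(h)      # output vector (one logit per target object)

def recognize(model, image_tensor):
    with torch.no_grad():
        probs = torch.softmax(model(image_tensor), dim=1)
        confidence, index = probs.max(dim=1)
    return index.item(), confidence.item()  # target object id, confidence parameter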
In some optional implementations of this embodiment, the method further comprises: receiving a recognition result for training sent by the client, wherein the recognition result for training comprises labeled information associated with the to-be-recognized image and acquired from a third-party platform, and the labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, the information being released by a registered user of the third-party platform; and training, by using the recognition result for training, a recognition model corresponding to a machine learning pattern.
In this embodiment, the recognition result for training sent by the client may be labeled information associated with the to-be-recognized image acquired by the client from a third-party platform when the confidence parameter obtained by recognizing the to-be-recognized image using the machine learning pattern is smaller than the confidence threshold. The labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, wherein the information is released by a registered user of the third-party platform. Taking the to-be-recognized image including a round table and three chairs as an example, when the confidence parameter obtained by recognizing the to-be-recognized image using the machine learning pattern is smaller than the confidence threshold, that is, when round table or chair objects are not recognized accurately, the client may be triggered to send the to-be-recognized image to the third-party platform (for example, the third-party platform providing question answering services) to obtain labeled information of the image. The labeled information may be information containing the identification information of the target object corresponding to the to-be-recognized object, wherein the information is released by the registered user of the third-party platform. For example, the labeled information comprises “round table, one, chairs, and three”.
In this embodiment, the recognition model may be trained using the labeled information. Taking the recognition model being a convolutional neural network as an example, a feature (for example, a scale-invariant feature transform feature point) of the to-be-recognized image may serve as an input vector of the convolutional neural network, the labeled information may serve as an ideal output vector of the convolutional neural network, and the convolutional neural network may be trained on the vector pair constituted by the input vector and the output vector. In this way, the recognition model is trained using a correct recognition result (namely, labeled information obtained by a registered user of the third-party platform manually recognizing the to-be-recognized image), the training effect of the recognition model is enhanced, and the recognition accuracy is further enhanced in subsequent recognition of to-be-recognized images.
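A minimal sketch of such a training step, continuing the PyTorch assumption above, is given below; the cross-entropy loss pulls the model's output vector toward the ideal output derived from the labeled information, and all names are illustrative.

import torch
import torch.nn as nn

def train_on_labeled_sample(model, image_tensor, label_index, lr=1e-4):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    target = torch.tensor([label_index])   # ideal output from the labeled information
    optimizer.zero_grad()
    loss = criterion(model(image_tensor), target)
    loss.backward()
    optimizer.step()                       # parameters nudged toward the label
    return loss.item()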
In this embodiment, sample data corresponding to a type of the to-be-recognized object may be preset according to the type of the to-be-recognized object, and then the recognition model is trained using the sample data. For example, images of some common application scenes and labeled information of the images may be acquired in advance to serve as training data.
Step 303: sending the identification information of the target object and the confidence parameter to the client.
In this embodiment, after recognizing the to-be-recognized image, the identification information of the target object corresponding to the to-be-recognized object in the to-be-recognized image and the obtained confidence parameter may be sent to the client.
Referring to FIG. 4, a schematic structural diagram of an apparatus for recognizing an image according to an embodiment of the present disclosure is illustrated. The apparatus 400 comprises: an acquiring unit 401, an interacting unit 402 and a determining unit 403. The acquiring unit 401 is configured to acquire a to-be-recognized image containing a to-be-recognized object. The interacting unit 402 is configured to send the to-be-recognized image to a server, and receive identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server, the confidence parameter representing a probability of the to-be-recognized object being the target object. The determining unit 403 is configured to determine the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold, or acquire labeled information associated with the to-be-recognized image from a third-party platform and determine the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold.
In some optional implementations of this embodiment, the labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, the information being released by a registered user of the third-party platform.
In some optional implementations of this embodiment, the apparatus 400 further comprises: a playing unit (not shown), configured to convert the recognition result into voice information and play the voice information.
In some optional implementations of this embodiment, the apparatus 400 further comprises: a labeled information sending unit (not shown), configured to send the labeled information to the server to serve as a training sample for training a recognition model corresponding to a machine learning recognition pattern used by the server when the confidence parameter is smaller than the confidence threshold.
In some optional implementations of this embodiment, the to-be-recognized object comprises at least one of: a body object, a scene object and a color object.
Referring to FIG. 5, a schematic structural diagram of an apparatus for recognizing an image according to another embodiment of the present disclosure is illustrated. The apparatus 500 comprises: a receiving unit 501, a recognizing unit 502 and a sending unit 503. The receiving unit 501 is configured to receive a to-be-recognized image containing a to-be-recognized object sent by a client. The recognizing unit 502 is configured to recognize the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object. The sending unit 503 is configured to send the identification information of the target object and the confidence parameter to the client.
In some optional implementations of this embodiment, the recognizing unit 502 comprises: a neural network subunit (not shown), configured to recognize the to-be-recognized image using a convolutional neural network model.
In some optional implementations of this embodiment, the apparatus 500 further comprises: a recognition result receiving unit (not shown), configured to receive a recognition result for training sent by the client, wherein the recognition result for training comprises labeled information associated with the to-be-recognized image and acquired from a third-party platform, and the labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object, the information being released by a registered user of the third-party platform; and a training unit (not shown), configured to train, using the recognition result for training, a recognition model corresponding to a machine learning recognition pattern.
Referring to FIG. 6, a schematic structural diagram of a computer system 600 adapted to implement a terminal apparatus or a server of the embodiments of the present disclosure is shown.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which may execute various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded into a random access memory (RAM) 603 from a storage portion 608. The RAM 603 also stores various programs and data required by operations of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse etc.; an output portion 607 comprising a cathode ray tube (CRT), a liquid crystal display device (LCD), a speaker etc.; a storage portion 608 including a hard disk and the like; and a communication portion 609 comprising a network interface card, such as a LAN card and a modem. The communication portion 609 performs communication processes via a network, such as the Internet. A driver 610 is also connected to the I/O interface 605 as required. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory, may be installed on the driver 610, to facilitate the retrieval of a computer program from the removable medium 611, and the installation thereof on the storage portion 608 as needed.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flow chart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which comprises a computer program that is tangibly embedded in a machine-readable medium. The computer program comprises program codes for executing the method as illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or may be installed from the removable medium 611.
The flow charts and block diagrams in the figures illustrate architectures, functions and operations that may be implemented according to the system, the method and the computer program product of the various embodiments of the present disclosure. In this regard, each block in the flow charts and block diagrams may represent a module, a program segment, or a code portion, comprising one or more executable instructions for implementing the specified logical function. It should be noted that, in some alternative implementations, the functions denoted by the blocks may occur in a sequence different from the sequences shown in the figures. For example, two blocks shown in succession may, in practice, be executed substantially in parallel or in the reverse sequence, depending on the functionalities involved. It should also be noted that each block in the block diagrams and/or flow charts, and any combination of such blocks, may be implemented by a dedicated hardware-based system executing the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules involved in the embodiments of the present disclosure may be implemented by way of software or hardware. The described units or modules may also be provided in a processor, for example, described as: a processor, comprising an acquiring unit, a receiving unit and a processing unit, where the names of these units are not considered as a limitation to the units. For example, the acquiring unit may also be described as “a unit for acquiring a to-be-recognized image containing a to-be-recognized object”.
In another aspect, the present disclosure further provides a non-volatile computer storage medium. The non-volatile computer storage medium may be the non-volatile computer storage medium included in the apparatus in the above embodiments, or a stand-alone non-volatile computer storage medium which has not been assembled into the apparatus. The non-volatile computer storage medium stores one or more programs. The one or more programs, when executed by a device, cause the device to: acquire a to-be-recognized image containing a to-be-recognized object; send the to-be-recognized image to a server, and receive identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image, and a confidence parameter returned by the server, wherein the confidence parameter represents a probability of the to-be-recognized object being the target object; and determine the identification information of the target object as a recognition result when the confidence parameter is greater than a confidence threshold; or acquire labeled information associated with the to-be-recognized image from a third-party platform and determine the labeled information as the recognition result when the confidence parameter is smaller than the confidence threshold. The non-volatile computer storage medium stores one or more programs. The one or more programs, when executed by a device, cause the device to: receive a to-be-recognized image containing a to-be-recognized object sent by a client; recognize the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, wherein the confidence parameter represents a probability of the to-be-recognized object being the target object; and send the identification information of the target object and the confidence parameter to the client.
The foregoing is only a description of the preferred embodiments of the present disclosure and the applied technical principles. It should be appreciated by those skilled in the art that the inventive scope of the present disclosure is not limited to the technical solutions formed by the particular combinations of the above technical features. The inventive scope should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the concept of the invention, for example, technical solutions formed by replacing the features disclosed in the present disclosure with (but not limited to) technical features having similar functions.

Claims (17)

What is claimed is:
1. A method for recognizing an image applied to a user terminal comprising:
acquiring a to-be-recognized image containing a to-be-recognized object;
sending the to-be-recognized image from the user terminal to a server, and receiving identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image using a convolutional neural network model, and a confidence parameter returned by the server, the confidence parameter representing a probability of the to-be-recognized object being the target object; and
in response to the confidence parameter returned by the server being smaller than the confidence threshold, sending the to-be-recognized image from the user terminal to a third-party platform providing question answering services and associated with the server, acquiring manually labeled information associated with the to-be-recognized image from the third-party platform, and determining the manually labeled information as the recognition result, wherein the manually labeled information includes information released by a registered user of the third-party platform.
2. The method according to claim 1, wherein the manually labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object.
3. The method according to claim 2, further comprising: converting the recognition result into voice information and playing the voice information.
4. The method according to claim 1, further comprising: sending the manually labeled information to the server to serve as a training sample for training a recognition model corresponding to a machine learning recognition pattern used by the server when the confidence parameter is smaller than the confidence threshold.
5. The method according to claim 1, wherein the to-be-recognized object comprises at least one of: a body object, a scene object and a color object.
6. A method for recognizing an image, applied to a server, comprising:
receiving a to-be-recognized image containing a to-be-recognized object sent by a client;
recognizing the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object; and
sending the identification information of the target object and the confidence parameter to the client, for the client to send the to-be-recognized image from the client to a third-party platform providing question answering services and associated with the server in response to the confidence parameter returned by the server being smaller than the confidence threshold, and determine, based on the confidence parameter, whether to use manually labeled information associated with the to-be-recognized image acquired from the third-party platform as the recognition result, wherein the manually labeled information includes information released by a registered user of the third-party platform,
wherein the recognizing the to-be-recognized image comprises recognizing the to-be-recognized image using a convolutional neural network model.
7. The method according to claim 6, further comprising:
receiving a recognition result for training sent by the client, the recognition result for training comprising manually labeled information associated with the to-be-recognized image and acquired from a third-party platform, the manually labeled information comprising information containing the identification information of the target object corresponding to the to-be-recognized object; and
training, by using the recognition result for training, a recognition model corresponding to a machine learning recognition pattern used for recognizing the to-be-recognized image.
8. A device, comprising:
a processor; and
a memory,
the memory storing a computer-readable instruction executable by the processor, wherein, when the computer-readable instruction is executed, the processor performs operations, the operations comprising:
acquiring a to-be-recognized image containing a to-be-recognized object;
sending the to-be-recognized image from the user terminal to a server, and receiving identification information of a target object corresponding to the to-be-recognized object returned by the server, obtained by recognizing the to-be-recognized image using a convolutional neural network model and a confidence parameter returned by the server, the confidence parameter representing a probability of the to-be-recognized object being the target object; and
in response to the confidence parameter returned by the server being smaller than the confidence threshold, sending the to-be-recognized image from the user terminal to a third-party platform providing question answering services and associated with the server, acquiring manually labeled information associated with the to-be-recognized image from the third-party platform, and determining the manually labeled information as the recognition result.
9. A non-volatile computer storage medium, storing computer-readable instructions that can be executed by a processor, and when the computer-readable instructions are executed by the processor, the processor performs the method according to claim 1.
10. The device according to claim 8, wherein the manually labeled information comprises information containing the identification information of the target object corresponding to the to-be-recognized object.
11. The device according to claim 10, the operations further comprising: converting the recognition result into voice information and playing the voice information.
12. The device according to claim 8, the operations further comprising: sending the manually labeled information to the server to serve as a training sample for training a recognition model corresponding to a machine learning recognition pattern used by the server when the confidence parameter is smaller than the confidence threshold.
13. The device according to claim 8, wherein the to-be-recognized object comprises at least one of: a body object, a scene object and a color object.
14. A device, comprising:
a processor; and
a memory,
the memory storing a computer-readable instruction executable by the processor, wherein, when the computer-readable instruction is executed, the processor performs operations, the operations comprising:
receiving a to-be-recognized image containing a to-be-recognized object sent by a client;
recognizing the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object; and
sending the identification information of the target object and the confidence parameter to the client, for the client to send the to-be-recognized image from the client to a third-party platform providing question answering services and associated with the server in response to the confidence parameter returned by the server being smaller than the confidence threshold, and determine, based on the confidence parameter, whether to use manually labeled information associated with the to-be-recognized image acquired from the third-party platform as the recognition result, wherein the manually labeled information includes information released by a registered user of the third-party platform,
wherein the recognizing the to-be-recognized image comprises recognizing the to-be-recognized image using a convolutional neural network model.
15. The device according to claim 14, the operations further comprising:
receiving a recognition result for training sent by the client, the recognition result for training comprising manually labeled information associated with the to-be-recognized image and acquired from a third-party platform associated with the server, the manually labeled information comprising information containing the identification information of the target object corresponding to the to-be-recognized object; and
training, by using the recognition result for training, a recognition model corresponding to a machine learning recognition pattern used for recognizing the to-be-recognized image.
16. A non-volatile computer storage medium, storing computer-readable instructions executable by a processor, wherein, when the computer-readable instructions are executed by the processor, the processor performs operations, the operations comprising:
receiving a to-be-recognized image containing a to-be-recognized object sent by a client;
recognizing the to-be-recognized image to obtain identification information of a target object corresponding to the to-be-recognized object and a confidence parameter, the confidence parameter representing a probability of the to-be-recognized object being the target object; and
sending the identification information of the target object and the confidence parameter to the client, for the client to: send the to-be-recognized image from the client to a third-party platform providing question answering services and associated with the server in response to the confidence parameter returned by the server being smaller than the confidence threshold, and determine, based on the confidence parameter, whether to use manually labeled information associated with the to-be-recognized image acquired from a third-party platform providing question answering services and associated with the server as the recognition result, wherein the manually labeled information includes information released by a registered user of the third-party platform,
wherein the recognizing the to-be-recognized image comprises recognizing the to-be-recognized image using a convolutional neural network model.
17. The non-volatile computer storage medium according to claim 16, the operations further comprising:
receiving a recognition result for training sent by the client, the recognition result for training comprising the manually labeled information associated with the to-be-recognized image and acquired from the third-party platform associated with the server, the manually labeled information comprising information containing the identification information of the target object corresponding to the to-be-recognized object, the information being released by a registered user of the third-party platform; and
training, by using the recognition result for training, a recognition model corresponding to a machine learning recognition pattern used for recognizing the to-be-recognized image.
US15/535,006 2015-09-08 2015-12-01 Method and device for image recognition Active US10796685B2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201510567452.2 2015-09-08
CN201510567452 2015-09-08
CN201510567452.2A CN105095919A (en) 2015-09-08 2015-09-08 Image recognition method and image recognition device
PCT/CN2015/096132 WO2017041366A1 (en) 2015-09-08 2015-12-01 Method and device for image recognition

Publications (2)

Publication Number Publication Date
US20180204562A1 US20180204562A1 (en) 2018-07-19
US10796685B2 true US10796685B2 (en) 2020-10-06

Family

ID=54576304

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/535,006 Active US10796685B2 (en) 2015-09-08 2015-12-01 Method and device for image recognition

Country Status (3)

Country Link
US (1) US10796685B2 (en)
CN (1) CN105095919A (en)
WO (1) WO2017041366A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170221110A1 (en) * 2016-02-01 2017-08-03 Mitchell International, Inc. Methods for improving automated damage appraisal and devices thereof

Families Citing this family (65)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095919A (en) * 2015-09-08 2015-11-25 北京百度网讯科技有限公司 Image recognition method and image recognition device
CN107291737B (en) * 2016-04-01 2019-05-14 腾讯科技(深圳)有限公司 Nude picture detection method and device
CN105821538B (en) * 2016-04-20 2018-07-17 广州视源电子科技股份有限公司 The detection method and system of spun yarn fracture
CN106022249A (en) * 2016-05-16 2016-10-12 乐视控股(北京)有限公司 Dynamic object identification method, device and system
CN107625527B (en) * 2016-07-19 2021-04-20 杭州海康威视数字技术股份有限公司 Lie detection method and device
CN107786867A (en) * 2016-08-26 2018-03-09 原相科技股份有限公司 Image identification method and system based on deep learning architecture
US10726573B2 (en) 2016-08-26 2020-07-28 Pixart Imaging Inc. Object detection method and system based on machine learning
CN106682590B (en) * 2016-12-07 2023-08-22 浙江宇视科技有限公司 Processing method of monitoring service and server
JP6744679B2 (en) * 2016-12-07 2020-08-19 深▲セン▼前▲海▼▲達▼▲闥▼▲雲▼端智能科技有限公司Cloudminds (Shenzhen) Robotics Systems Co., Ltd. Human-machine hybrid decision making method and apparatus
CN106874845B (en) * 2016-12-30 2021-03-26 东软集团股份有限公司 Image recognition method and device
US10275687B2 (en) * 2017-02-16 2019-04-30 International Business Machines Corporation Image recognition with filtering of image classification output distribution
CN107145816A (en) * 2017-02-24 2017-09-08 北京悉见科技有限公司 Object identifying tracking and device
CN108573268A (en) * 2017-03-10 2018-09-25 北京旷视科技有限公司 Image-recognizing method and device, image processing method and device and storage medium
CN107276974B (en) * 2017-03-10 2020-11-03 创新先进技术有限公司 Information processing method and device
JP6893606B2 (en) 2017-03-20 2021-06-23 達闥机器人有限公司 Image tagging methods, devices and electronics
CN107832662B (en) * 2017-09-27 2022-05-27 百度在线网络技术(北京)有限公司 Method and system for acquiring image annotation data
CN107908641B (en) * 2017-09-27 2021-03-19 百度在线网络技术(北京)有限公司 Method and system for acquiring image annotation data
CN107758761B (en) * 2017-09-28 2019-10-22 珠海格力电器股份有限公司 Purifier and its control method, device, storage medium and processor
US11222627B1 (en) * 2017-11-22 2022-01-11 Educational Testing Service Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system
CN108172213B (en) * 2017-12-26 2022-09-30 北京百度网讯科技有限公司 Surge audio identification method, surge audio identification device, surge audio identification equipment and computer readable medium
CN108171274B (en) * 2018-01-17 2019-08-09 百度在线网络技术(北京)有限公司 The method and apparatus of animal for identification
CN108389316B (en) 2018-03-02 2021-07-13 北京京东尚科信息技术有限公司 Automatic vending method, apparatus and computer-readable storage medium
CN110245668B (en) * 2018-03-09 2023-06-27 腾讯科技(深圳)有限公司 Terminal information acquisition method, acquisition device and storage medium based on image recognition
CN108664897A (en) * 2018-04-18 2018-10-16 平安科技(深圳)有限公司 Bank slip recognition method, apparatus and storage medium
US20210157331A1 (en) * 2018-04-19 2021-05-27 Positec Power Tools (Suzhou) Co., Ltd Self-moving device, server, and automatic working system thereof
CN108664997A (en) * 2018-04-20 2018-10-16 西南交通大学 High iron catenary equipotential line defective mode detection method based on cascade Faster R-CNN
CN108734718B (en) * 2018-05-16 2021-04-06 北京市商汤科技开发有限公司 Processing method, device, storage medium and equipment for image segmentation
CN108897786B (en) * 2018-06-08 2021-06-08 Oppo广东移动通信有限公司 Recommendation method and device of application program, storage medium and mobile terminal
CN109035558B (en) * 2018-06-12 2020-08-25 武汉市哈哈便利科技有限公司 Commodity recognition algorithm online learning system for unmanned sales counter
CN109035579A (en) * 2018-06-29 2018-12-18 深圳和而泰数据资源与云技术有限公司 A kind of commodity recognition method, self-service machine and computer readable storage medium
CN110750667A (en) * 2018-07-05 2020-02-04 第四范式(北京)技术有限公司 Auxiliary labeling method, device, equipment and storage medium
CN109255325A (en) * 2018-09-05 2019-01-22 百度在线网络技术(北京)有限公司 Image-recognizing method and device for wearable device
CN109409423A (en) * 2018-10-15 2019-03-01 珠海格力电器股份有限公司 A kind of image-recognizing method, device, terminal and readable storage medium storing program for executing
CN111089388A (en) * 2018-10-18 2020-05-01 珠海格力电器股份有限公司 Method and system for controlling air conditioner, air conditioner and household appliance
CN109522947B (en) * 2018-10-31 2022-03-25 联想(北京)有限公司 Identification method and device
CN109409325B (en) * 2018-11-09 2022-05-31 联想(北京)有限公司 Identification method and electronic equipment
US20220004777A1 (en) * 2018-11-15 2022-01-06 Sony Group Corporation Information processing apparatus, information processing system, information processing method, and program
CN109583499B (en) * 2018-11-30 2021-04-16 河海大学常州校区 Power transmission line background object classification system based on unsupervised SDAE network
CN109783674A (en) * 2018-12-13 2019-05-21 平安普惠企业管理有限公司 Image identification method, device, system, computer equipment and storage medium
CN111444746B (en) * 2019-01-16 2024-01-30 北京亮亮视野科技有限公司 Information labeling method based on neural network model
CN109886338A (en) * 2019-02-25 2019-06-14 苏州清研精准汽车科技有限公司 A kind of intelligent automobile test image mask method, device, system and storage medium
CN111611828A (en) * 2019-02-26 2020-09-01 北京嘀嘀无限科技发展有限公司 Abnormal image recognition method and device, electronic equipment and storage medium
CN109981755A (en) * 2019-03-12 2019-07-05 深圳灵图慧视科技有限公司 Image-recognizing method, device and electronic equipment
CN109817201B (en) * 2019-03-29 2021-03-26 北京金山安全软件有限公司 Language learning method and device, electronic equipment and readable storage medium
CN110309735A (en) * 2019-06-14 2019-10-08 平安科技(深圳)有限公司 Exception detecting method, device, server and storage medium
CN112949667A (en) * 2019-12-09 2021-06-11 北京金山云网络技术有限公司 Image recognition method, system, electronic device and storage medium
CN111414946B (en) * 2020-03-12 2022-09-23 腾讯科技(深圳)有限公司 Artificial intelligence-based medical image noise data identification method and related device
CN111429512B (en) * 2020-04-22 2023-08-25 北京小马慧行科技有限公司 Image processing method and device, storage medium and processor
CN111611871B (en) * 2020-04-26 2023-11-28 深圳奇迹智慧网络有限公司 Image recognition method, apparatus, computer device, and computer-readable storage medium
US11295167B2 (en) * 2020-04-27 2022-04-05 Toshiba Global Commerce Solutions Holdings Corporation Automated image curation for machine learning deployments
CN111783775A (en) * 2020-06-30 2020-10-16 京东数字科技控股有限公司 Image acquisition method, device, equipment and computer readable storage medium
CN112584213A (en) * 2020-12-11 2021-03-30 海信视像科技股份有限公司 Display device and display method of image recognition result
CN112288883B (en) * 2020-10-30 2023-04-18 北京市商汤科技开发有限公司 Method and device for prompting operation guide information, electronic equipment and storage medium
CN112507605A (en) * 2020-11-04 2021-03-16 清华大学 Power distribution network anomaly detection method based on AnoGAN
CN112613553B (en) * 2020-12-18 2022-03-08 中电金信软件有限公司 Picture sample set generation method and device, computer equipment and storage medium
CN112597895B (en) * 2020-12-22 2024-04-26 阿波罗智联(北京)科技有限公司 Confidence determining method based on offset detection, road side equipment and cloud control platform
CN112580745A (en) * 2020-12-29 2021-03-30 北京五八信息技术有限公司 Image recognition method and device, electronic equipment and computer readable medium
US11869319B2 (en) * 2020-12-31 2024-01-09 Datalogic Usa, Inc. Fixed retail scanner with annotated video and related methods
CN113239804B (en) * 2021-05-13 2023-06-02 杭州睿胜软件有限公司 Image recognition method, readable storage medium, and image recognition system
CN113128247B (en) * 2021-05-17 2024-04-12 阳光电源股份有限公司 Image positioning identification verification method and server
CN113344055B (en) * 2021-05-28 2023-08-22 北京百度网讯科技有限公司 Image recognition method, device, electronic equipment and medium
CN113378836A (en) * 2021-06-28 2021-09-10 北京百度网讯科技有限公司 Image recognition method, apparatus, device, medium, and program product
US11960569B2 (en) * 2021-06-29 2024-04-16 7-Eleven, Inc. System and method for refining an item identification model based on feedback
CN114363206A (en) * 2021-12-28 2022-04-15 奇安信科技集团股份有限公司 Terminal asset identification method and device, computing equipment and computer storage medium
CN114998665B (en) * 2022-08-04 2022-11-01 创新奇智(广州)科技有限公司 Image category identification method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103026368A (en) 2010-07-30 2013-04-03 高通股份有限公司 Object recognition using incremental feature extraction
CN104281833A (en) 2013-07-08 2015-01-14 深圳市腾讯计算机系统有限公司 Method and device for recognizing pornographic images
US20150016679A1 (en) * 2012-01-12 2015-01-15 Panasonic Corporation Feature extraction device, feature extraction method, and feature extraction program
US20150294503A1 (en) 2014-04-14 2015-10-15 Baidu Online Network Technology (Beijing) Co., Ltd. Reality augmenting method, client device and server
US9773209B1 (en) * 2014-07-01 2017-09-26 Google Inc. Determining supervised training data including features pertaining to a class/type of physical location and time location was visited
US10242036B2 (en) * 2013-08-14 2019-03-26 Ricoh Co., Ltd. Hybrid detection recognition system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103064981A (en) * 2013-01-18 2013-04-24 浪潮电子信息产业股份有限公司 Method for searching images on basis of cloud computing
CN104679863B (en) * 2015-02-28 2018-05-04 武汉烽火众智数字技术有限责任公司 It is a kind of based on deep learning to scheme to search drawing method and system
CN105095919A (en) * 2015-09-08 2015-11-25 北京百度网讯科技有限公司 Image recognition method and image recognition device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103026368A (en) 2010-07-30 2013-04-03 高通股份有限公司 Object recognition using incremental feature extraction
US20150016679A1 (en) * 2012-01-12 2015-01-15 Panasonic Corporation Feature extraction device, feature extraction method, and feature extraction program
CN104281833A (en) 2013-07-08 2015-01-14 深圳市腾讯计算机系统有限公司 Method and device for recognizing pornographic images
WO2015003606A1 (en) * 2013-07-08 2015-01-15 Tencent Technology (Shenzhen) Company Limited Method and apparatus for recognizing pornographic image
US10242036B2 (en) * 2013-08-14 2019-03-26 Ricoh Co., Ltd. Hybrid detection recognition system
US20150294503A1 (en) 2014-04-14 2015-10-15 Baidu Online Network Technology (Beijing) Co., Ltd. Reality augmenting method, client device and server
US9773209B1 (en) * 2014-07-01 2017-09-26 Google Inc. Determining supervised training data including features pertaining to a class/type of physical location and time location was visited

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170221110A1 (en) * 2016-02-01 2017-08-03 Mitchell International, Inc. Methods for improving automated damage appraisal and devices thereof

Also Published As

Publication number Publication date
US20180204562A1 (en) 2018-07-19
WO2017041366A1 (en) 2017-03-16
CN105095919A (en) 2015-11-25

Similar Documents

Publication Publication Date Title
US10796685B2 (en) Method and device for image recognition
US11978245B2 (en) Method and apparatus for generating image
US10650492B2 (en) Method and apparatus for generating image
US11163991B2 (en) Method and apparatus for detecting body
CN108830235B (en) Method and apparatus for generating information
CN110503703B (en) Method and apparatus for generating image
CN109993150B (en) Method and device for identifying age
US11436863B2 (en) Method and apparatus for outputting data
US11270099B2 (en) Method and apparatus for generating facial feature
CN107393541B (en) Information verification method and device
US11670015B2 (en) Method and apparatus for generating video
CN107609506B (en) Method and apparatus for generating image
CN109829432B (en) Method and apparatus for generating information
CN109784304B (en) Method and apparatus for labeling dental images
CN109214501B (en) Method and apparatus for identifying information
CN110059623B (en) Method and apparatus for generating information
CN109583389B (en) Drawing recognition method and device
US11210563B2 (en) Method and apparatus for processing image
CN110472558B (en) Image processing method and device
US11659181B2 (en) Method and apparatus for determining region of interest
CN110555334B (en) Face feature determination method and device, storage medium and electronic equipment
CN110008926B (en) Method and device for identifying age
CN110046571B (en) Method and device for identifying age
CN109977905B (en) Method and apparatus for processing fundus images
CN110335237B (en) Method and device for generating model and method and device for recognizing image

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

AS Assignment

Owner name: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GONG, LONG;ZHANG, YANFU;GU, JIAWEI;SIGNING DATES FROM 20200707 TO 20200731;REEL/FRAME:053678/0215

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4