CN111222397A - Picture book identification method and device, and robot - Google Patents

Picture book identification method and device, and robot

Info

Publication number
CN111222397A
CN111222397A (application number CN201911026013.5A)
Authority
CN
China
Prior art keywords
image
feature point
sample
cover
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911026013.5A
Other languages
Chinese (zh)
Other versions
CN111222397B (en)
Inventor
顾景
李扬
王玥
刘傲
程骏
庞建新
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN201911026013.5A priority Critical patent/CN111222397B/en
Publication of CN111222397A publication Critical patent/CN111222397A/en
Application granted granted Critical
Publication of CN111222397B publication Critical patent/CN111222397B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The present application provides a picture book identification method, a picture book identification device, and a robot, applicable to the technical field of data processing. The method includes: acquiring a cover image of a picture book and sending the cover image to a server; if an inner page feature data set and an audio set sent by the server are received, acquiring an inner page image of the picture book, where the inner page feature data set and the audio set are obtained by the server, after receiving the cover image, performing picture book recognition on the cover image to obtain a picture book identifier and then retrieving the inner page feature data set and the audio set from local storage according to the obtained identifier; performing page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image; and searching the audio set for the audio corresponding to the page number and outputting that audio. Because the inner page identification process requires no further network interaction and is therefore unaffected by network delay, picture book identification is faster and more reliable.

Description

Picture book identification method and device, and robot
Technical Field
The application belongs to the technical field of image recognition, and particularly relates to a picture book recognition method and a robot.
Background
A picture book is a book composed mainly of pictures with a small amount of accompanying text. Picture books can not only be used to tell stories and impart knowledge, but can also comprehensively help children develop their minds and cultivate multiple intelligences.
Picture book identification is an image retrieval technology that accurately identifies which picture book a user is currently reading and which page of that book is open, which ensures the accuracy and reliability of the accompanying narration of the book's content. Because a robot is limited by the storage, memory, and computing capacity of its local embedded hardware, the related art performs identification of both the cover and the inner pages on a cloud server, and the robot plays audio only after receiving, from the server, the audio corresponding to the current inner page; however, the identification efficiency, accuracy, and reliability of this approach are low.
Disclosure of Invention
In view of this, embodiments of the present application provide a picture book identification method and a robot, which can solve the problem of low efficiency and low reliability in picture book identification.
A first aspect of an embodiment of the present application provides a picture book identification method, including:
acquiring a cover image of the picture book and sending the cover image to a server;
if an inner page feature data set and an audio set sent by the server are received, acquiring an inner page image of the picture book, where the inner page feature data set and the audio set are obtained by the server, after receiving the cover image, performing picture book recognition on the cover image to obtain a picture book identifier and then retrieving the inner page feature data set and the audio set from local storage according to the obtained identifier;
performing page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image;
and searching the audio corresponding to the page number from the audio set, and outputting the audio.
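The four steps of the first aspect can be sketched as a single client-side routine. The following Python sketch is purely illustrative: the capture, server-lookup, page-identification, and playback callables are hypothetical stand-ins, not interfaces defined in this application.

```python
def recognize_picture_book(capture, identify_on_server, identify_page, play):
    """One pass of the flow: cover -> server lookup -> local page id -> audio."""
    cover = capture()                        # step 1: acquire the cover image
    reply = identify_on_server(cover)        # server recognizes the picture book
    if reply is None:
        return None                          # book not recognized by the server
    features, audio_set = reply              # inner page feature data + audio set
    page_no = identify_page(capture(), features)  # step 3: local page lookup
    audio = audio_set.get(page_no)           # step 4: find the page's audio
    if audio is not None:
        play(audio)
    return page_no
```

Note that after the single server round trip, every later step runs locally, which is the source of the efficiency claim in the abstract.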
In a first possible implementation manner of the first aspect, the acquiring the inner page image of the picture book includes:
performing page-turning recognition on the picture book;
and if a page of the picture book has been turned, acquiring an inner page image of the picture book after the page turn.
Based on the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the performing page-turning recognition on the picture book includes:
capturing real-time video of the picture book, and comparing consecutive frames of the captured video;
if the degree of difference between two consecutive frames is greater than a first difference threshold, setting the state of the picture book to a first state;
when the state of the picture book is the first state, continuing to compare consecutive frames of the video, and detecting the image quality of the latest frame in the video;
if the degrees of difference between adjacent frames within n consecutive frames are all smaller than a second difference threshold and the image quality is greater than a quality threshold, updating the state of the picture book to a second state and determining that a page has been turned, where n is a positive integer greater than 2 and the first difference threshold is greater than or equal to the second difference threshold.
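The two-state page-turn logic above can be sketched as a small state machine. This is an illustrative Python sketch, assuming the frame-difference and image-quality values are scalar metrics computed elsewhere; the class and parameter names are hypothetical, and the thresholds are not values from this application.

```python
IDLE, TURNING = 0, 1   # the "second state" / "first state" in the claims

class PageTurnDetector:
    def __init__(self, diff_high, diff_low, quality_min, n_stable):
        assert diff_high >= diff_low and n_stable > 2
        self.diff_high = diff_high      # first difference threshold
        self.diff_low = diff_low        # second difference threshold
        self.quality_min = quality_min  # minimum image-quality score
        self.n_stable = n_stable        # n consecutive stable frames required
        self.state = IDLE
        self.stable = 0                 # count of consecutive low-diff frame pairs

    def feed(self, frame_diff, quality):
        """Returns True exactly when a completed page turn is detected."""
        if self.state == IDLE:
            if frame_diff > self.diff_high:   # large motion: a turn began
                self.state = TURNING
                self.stable = 0
            return False
        # state == TURNING: wait until the page settles again
        if frame_diff < self.diff_low:
            self.stable += 1
        else:
            self.stable = 0
        # n stable frames means n-1 consecutive low adjacent-frame differences
        if self.stable >= self.n_stable - 1 and quality > self.quality_min:
            self.state = IDLE
            return True
        return False
```

Feeding one (difference, quality) pair per captured frame reproduces the claimed behavior: a spike above the first threshold arms the detector, and n settled, sharp frames confirm the turn.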
Based on the first and second possible implementation manners of the first aspect, in a third possible implementation manner of the first aspect, the inner page feature data set includes multiple groups of sample feature point data, obtained by performing perspective transformation on the sample image of each inner page of the picture book and extracting feature points both from each inner page sample image and from the images obtained by the perspective transformation. Each group of sample feature point data corresponds to the page number of an inner page, each group contains multiple sample feature points, and each inner page corresponds to multiple groups of sample feature point data.
the page number recognition of the inner page image based on the inner page feature data set comprises:
extracting feature points from the inner page image to obtain a group of inner page feature point data corresponding to the inner page image, and matching the inner page feature point data against each group of sample feature point data;
eliminating abnormal matching points from the successfully matched sample feature points in each group of sample feature point data, and counting the number of successfully matched sample feature points remaining in each group after the elimination;
and screening out the best-matched group of sample feature point data according to those counts, and taking the page number corresponding to that group as the page number of the inner page image.
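The page-number lookup above can be sketched as follows. This is a deliberately simplified Python illustration in which feature points are abstracted as hashable tokens and "matching" is set intersection; a real implementation would match ORB or SIFT descriptors and eliminate abnormal matches with, for example, a RANSAC homography check. The function and parameter names are assumptions.

```python
def identify_page(page_features, sample_sets, min_matches=1):
    """Return the page number whose sample set keeps the most matches.

    sample_sets: list of (page_number, iterable_of_sample_features),
    with possibly several entries per page (one per perspective view).
    """
    best_page, best_count = None, 0
    query = set(page_features)
    for page_no, samples in sample_sets:
        # feature point matching; the abnormal-match elimination step of
        # the claims is stood in for by simple intersection here
        count = len(query & set(samples))
        if count > best_count:
            best_page, best_count = page_no, count
    return best_page if best_count >= min_matches else None
```

Because each inner page contributes several sample groups (one per perspective transform), a tilted camera view can still win the count for the correct page.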
A second aspect of the embodiments of the present application provides a picture book identification method, including:
the robot acquires a cover image of the picture book and sends the cover image to the server;
the server performs picture book recognition on the received cover image to obtain a picture book identifier corresponding to the cover image;
the server calls an inner page feature data set and an audio set corresponding to the picture book identification from a local storage and sends the called inner page feature data set and the called audio set to the robot;
if the inner page feature data set and the audio set are received, the robot acquires an inner page image of the picture book;
the robot carries out page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image;
and the robot searches the audio corresponding to the page number from the audio set and outputs the audio.
In a first possible implementation manner of the second aspect, the server performing picture book recognition on the received cover image to obtain a picture book identifier corresponding to the cover image includes:
extracting feature points of the cover image to obtain a group of cover feature point data corresponding to the cover image;
matching the cover feature point data against each group of sample feature point data in a cover feature point data set, where the cover feature point data set includes multiple groups of sample feature point data, each group corresponds to the picture book identifier of one cover, and the groups are obtained by the server performing perspective transformation on the first sample images of multiple picture book covers and extracting feature points both from the first sample images of the covers and from the second sample images obtained through the perspective transformation;
and if the successfully matched cover feature point data exists, taking the picture book identifier corresponding to the successfully matched cover feature point data as the picture book identifier of the cover image.
Based on the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the matching of the cover feature point data against each group of data in the cover feature point data set includes:
acquiring the first sample images and the second sample images of multiple picture book covers, performing image retrieval over the first and second sample images using the cover image, and taking the retrieved first and second sample images as target images;
extracting the sample feature point data of each target image from the cover feature point data set, and taking the extracted sample feature point data as target feature point data;
matching the cover feature point data against each group of target feature point data;
eliminating abnormal matching points from the successfully matched feature points in each group of target feature point data, and counting the number of successfully matched feature points in each group after the elimination;
and if the largest count of successfully matched feature points is greater than a first number threshold, taking the target feature point data with that largest count as the cover feature point data successfully matched with the cover image.
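The two-stage cover lookup above, retrieval followed by thresholded matching, might be sketched like this. The `count_matches` callable and the candidate structure are assumptions for illustration; in practice the count would be the number of matches surviving abnormal-point elimination.

```python
def match_cover(cover_features, candidates, count_matches, threshold):
    """Pick the best candidate only if it clears the first number threshold.

    candidates: list of (book_id, sample_features) that survived the
    cheap image-retrieval stage.
    """
    best_id, best_count = None, 0
    for book_id, samples in candidates:
        count = count_matches(cover_features, samples)
        if count > best_count:
            best_id, best_count = book_id, count
    # accept only when the largest match count exceeds the threshold;
    # otherwise the cover is treated as unrecognized
    return best_id if best_count > threshold else None
```

The threshold guards against a weak best match being reported as a recognized picture book.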
Based on the first and second possible implementation manners of the second aspect, in a third possible implementation manner of the second aspect, the process of constructing the cover feature point data set by the server includes:
acquiring the first sample images of multiple picture book covers and the picture book identifier corresponding to each first sample image;
performing perspective transformation on each first sample image to obtain the second sample images corresponding to it, where each first sample image corresponds to multiple second sample images at different perspective transformation angles;
mapping the picture book identifier of each first sample image onto the second sample images obtained from it;
extracting feature points from each first sample image and each second sample image to obtain multiple groups of cover feature point data in one-to-one correspondence with the sample images;
and storing each group of cover feature point data in association with the picture book identifier of the corresponding first or second sample image, to obtain the cover feature point data set.
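The construction steps above might be sketched as follows, assuming placeholder `warps` and `extract_features` callables in place of a real perspective transform (such as OpenCV's warpPerspective) and a real feature detector; all names here are illustrative.

```python
def build_cover_dataset(covers, warps, extract_features):
    """Build (features, book_id) groups from covers and their warped views.

    covers: list of (book_id, image); warps: list of perspective-transform
    callables, each producing one warped copy of an image.
    """
    dataset = []
    for book_id, image in covers:
        # the original cover plus one warped view per transform angle
        views = [image] + [warp(image) for warp in warps]
        for view in views:
            # each view yields one group of sample feature points, and the
            # warped views inherit the original cover's book identifier
            dataset.append((extract_features(view), book_id))
    return dataset
```

Storing features of warped copies alongside the original is what makes later cover matching tolerant to the camera viewing the book at an angle.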
In a fourth possible implementation manner of the second aspect, the robot acquiring the inner page image of the picture book includes:
the robot performing page-turning recognition on the picture book;
and if a page of the picture book has been turned, the robot acquiring an inner page image of the picture book after the page turn.
Based on the fourth possible implementation manner of the second aspect, in a fifth possible implementation manner of the second aspect, the robot performing page-turning recognition on the picture book includes:
the robot capturing real-time video of the picture book and comparing consecutive frames of the captured video;
if the degree of difference between two consecutive frames is greater than a first difference threshold, the robot setting the state of the picture book to a first state;
when the state of the picture book is the first state, the robot continuing to compare consecutive frames of the video and detecting the image quality of the latest frame in the video;
if the degrees of difference between adjacent frames within n consecutive frames are all smaller than a second difference threshold and the image quality is greater than a quality threshold, the robot updating the state of the picture book to a second state and determining that a page has been turned, where n is a positive integer greater than 2 and the first difference threshold is greater than or equal to the second difference threshold.
Based on the fourth and fifth possible implementation manners of the second aspect, in a sixth possible implementation manner of the second aspect, the inner page feature data set includes multiple groups of sample feature point data, obtained by performing perspective transformation on the sample image of each inner page of the picture book and extracting feature points both from each inner page sample image and from the images obtained by the perspective transformation. Each group of sample feature point data corresponds to the page number of an inner page, each group contains multiple sample feature points, and each inner page corresponds to multiple groups of sample feature point data.
The robot performing page number identification on the inner page image based on the inner page feature data set includes:
the robot extracting feature points from the inner page image to obtain a group of inner page feature point data corresponding to the inner page image, and matching the inner page feature point data against each group of sample feature point data;
the robot eliminating abnormal matching points from the successfully matched sample feature points in each group, and counting the number of successfully matched sample feature points remaining in each group after the elimination;
and the robot screening out the best-matched group of sample feature point data according to those counts, and taking the page number corresponding to that group as the page number of the inner page image.
A third aspect of the embodiments of the present application provides a picture book recognition apparatus, including:
the cover transmission module is used for acquiring a cover image of the picture book and sending the cover image to the server;
the inner page obtaining module is used for obtaining an inner page image of the picture book if an inner page feature data set and an audio set sent by the server are received, where the inner page feature data set and the audio set are obtained by the server, after receiving the cover image, performing picture book recognition on the cover image to obtain a picture book identifier and retrieving the inner page feature data set and the audio set from local storage according to the obtained identifier;
the page number identification module is used for carrying out page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image;
and the audio searching module is used for searching the audio corresponding to the page number from the audio set and outputting the audio.
A fourth aspect of the embodiments of the present application provides a robot, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor, when executing the computer program, implements the steps of the picture book identification method according to any one of the first aspect.
A fifth aspect of an embodiment of the present application provides a picture book recognition system including: a robot and a server;
the robot is used for acquiring a cover image of the picture book and sending the cover image to the server;
the server is used for carrying out picture book recognition on the received cover image to obtain a picture book identification corresponding to the cover image;
the server is further used for calling the inner page feature data set and the audio set corresponding to the picture book identification in a local storage, and sending the called inner page feature data set and the called audio set to the robot;
the robot is further used for acquiring the inner page image of the picture book if the inner page feature data set and the audio set are received;
the robot is further used for carrying out page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image;
the robot is further used for searching the audio corresponding to the page number from the audio set and outputting the audio.
A sixth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the picture book identification method according to any one of the first aspect.
A seventh aspect of the embodiments of the present application provides a computer program product which, when run on a robot, causes the robot to execute the picture book identification method according to any one of the first aspect.
Compared with the prior art, the embodiments of the present application have the following advantages: the inner page feature data set and the audio set of each picture book are stored on the server, and the inner page feature data set and the audio set of the current picture book are sent down only after the picture book to be identified has been determined, so the local terminal device does not need to store large amounts of picture book data in advance, which relieves its storage pressure. Meanwhile, the local terminal device performs page number identification on the current inner page image using the inner page feature data set, then selects and outputs the audio corresponding to that page number; the inner page identification process therefore requires no further network interaction and is unaffected by network delay and the like, making picture book identification faster and more reliable.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a picture book identification method provided in the first embodiment of the present application;
Fig. 2 is a schematic flowchart of a picture book identification method provided in the second embodiment of the present application;
Fig. 3 is a schematic flowchart of a picture book identification method provided in the third embodiment of the present application;
Fig. 4 is a schematic flowchart of a picture book identification method provided in the fourth embodiment of the present application;
Fig. 5 is a schematic flowchart of a picture book identification method provided in the fifth embodiment of the present application;
Fig. 6 is a schematic flowchart of a picture book identification method provided in the sixth embodiment of the present application;
Fig. 7 is a schematic flowchart of a picture book identification method provided in the seventh embodiment of the present application;
Fig. 8 is a schematic flowchart of a picture book identification method provided in the eighth embodiment of the present application;
Fig. 9 is a schematic structural diagram of a picture book identification apparatus provided in the twelfth embodiment of the present application;
Fig. 10 is a system interaction diagram of a picture book identification system provided in the thirteenth embodiment of the present application;
Fig. 11 is a schematic diagram of a robot provided in the fourteenth embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
To facilitate understanding of the present application, the embodiments are briefly introduced here. In the related art, both the cover and the content pages of the picture book are identified at a cloud server, which then sends down the audio corresponding to the inner page; the local terminal device only collects images and receives and plays audio. Although this enables picture book identification, on the one hand the cloud concurrency is very large, so the deployment cost of the cloud server is extremely high; on the other hand, because cloud identification involves a large amount of image and audio transmission, it is highly susceptible to interference from the network environment, making picture book identification slow and unreliable.
To improve the speed and reliability of picture book identification, in the embodiments of the present application the inner page feature data set and the audio set of each picture book are stored on the server in advance, and the inner page feature data set and the audio set of the current picture book are sent down only after the picture book to be identified has been determined. The local terminal device therefore does not need to store large amounts of picture book data in advance, which relieves its storage pressure. Meanwhile, the local terminal device performs page number identification on the current inner page image using the inner page feature data set and selects and outputs the audio corresponding to that page number; the inner page identification process requires no further network interaction and is unaffected by network delay and the like, so picture book identification is faster and more reliable.
Meanwhile, in the first to fourth embodiments of the present application, the execution subject of the picture book identification method is a local terminal device. The specific type or hardware form of the terminal device is not limited here and can be selected or designed by a technician according to actual application requirements, including but not limited to robots and mobile terminals. In the following description of the first to fourth embodiments, the execution subject is taken to be a robot as an example.
The embodiments of the present application are detailed as follows:
fig. 1 shows a flowchart of an implementation of the picture book identification method according to an embodiment of the present application, which is detailed as follows:
s101, acquiring a cover image of the picture book and sending the cover image to a server.
In the embodiment of the present application, the robot captures the cover of the picture book to obtain the required cover image. The specific capture method is not limited here; it may be direct photographing, or video capture followed by selecting the cover image from the video frames. It should be understood that, because the environment of actual picture book identification is hard to predict, the actual quality of the captured cover image cannot be guaranteed. An image may contain only the picture book cover, or it may contain the cover together with non-cover environmental noise: for example, when the cover of a picture book lying on a desktop is captured, the resulting cover image contains both the cover and the desktop area. Therefore, in the embodiments of the present application, a cover image refers to an image that contains the picture book cover, not an image that contains only the cover.
To accurately obtain the cover image of the picture book, as an optional embodiment of the present application, before S101 the robot may send specific prompt information to the user, prompting the user to place the cover of the picture book within the robot's image capture area. The type of prompt information is not limited here and can be set by a technician according to actual needs; for example, it may be a voice message such as "please place the cover of the picture book in the robot's visible area".
In the embodiment of the present application, the cover image is used to identify the picture book. Specifically, each picture book is assigned a unique identifier in advance, and the collected cover image of each picture book is uniquely associated with that picture book identifier, realizing a unique mapping from cover image to picture book. Therefore, after acquiring the required cover image, the robot in the embodiment of the present application sends the cover image to the server so that the server can perform subsequent operations such as picture book recognition. The data format of the picture book identifier is not limited here and can be set by a technician; for example, it may be a string of digits or a character string.
S102, if the inner page feature data set and the audio set sent by the server are received, obtaining an inner page image of the picture book, where the inner page feature data set and the audio set are obtained by the server, after receiving the cover image, performing picture book recognition on the cover image to obtain a picture book identifier and retrieving the inner page feature data set and the audio set from local storage according to the obtained identifier.
In the embodiment of the present application, all inner page feature data corresponding to a single picture book are stored in advance in the server's local storage as a single data set; that is, the inner page feature data set of each picture book is pre-stored on the server. Similarly, each audio set contains the narration audio of every inner page of a single picture book, and the audio set of each picture book is likewise pre-stored on the server. The server also stores each entry of the inner page feature data set and each audio in the audio set in association with the corresponding inner page number, and stores each picture book's inner page feature data set and audio set in association with that picture book's identifier in local storage, so that subsequent data retrieval and inner-page number identification can be performed.
In the embodiment of the present application, after receiving the cover image sent by the robot, the server performs picture book recognition on the cover image to determine whether the picture book corresponding to the cover image is a recognizable one and to determine its picture book identifier. Based on the determined identifier, the server then selects the inner page feature data set and the audio set of the current picture book and sends them to the robot.
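The server-side storage association described in the two paragraphs above might look like the following sketch, in which a picture book identifier keys both the inner page feature groups and the per-page audio, each keyed in turn by page number. The class and field names are illustrative only, not structures defined in this application.

```python
class BookStore:
    """Illustrative server-side store: book id -> (page features, page audio)."""

    def __init__(self):
        self._books = {}   # book_id -> {"features": ..., "audio": ...}

    def add(self, book_id, page_features, page_audio):
        """page_features / page_audio: dicts keyed by inner page number."""
        self._books[book_id] = {"features": page_features, "audio": page_audio}

    def fetch(self, book_id):
        """Return (inner page feature set, audio set) or None if unknown."""
        entry = self._books.get(book_id)
        if entry is None:
            return None    # the cover did not map to a stored picture book
        return entry["features"], entry["audio"]
```

A single `fetch` per recognized cover is all the server needs to serve; every later page lookup happens on the robot.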
As an embodiment of the present application, after sending the cover image to the server, the robot may actively determine in real time whether the inner page feature data set and the audio set have been received, in which case the robot further performs a detection and determination step before S102; alternatively, the robot may skip real-time determination and passively trigger the operation of S102 upon receiving the inner page feature data set and the audio set actively sent by the server. The specific manner may be selected and set by a technician according to actual needs, and is not limited herein.
Receiving the inner page feature data set and the audio set indicates that the picture book corresponding to the current cover image can be recognized and narrated, and therefore the inner page image of the picture book can be further collected. As with the above description of the cover image, the inner page image in the embodiment of the present application refers to an image containing the inner page of the picture book, not an image containing only the inner page. The collection manner of the inner page image likewise follows the above description of the collection manner of the cover image and is not repeated here; reference may also be made to the second to fourth embodiments of the present application. Meanwhile, in the embodiment of the present application, the robot actually needs to determine whether both data sets, i.e., the inner page feature data set and the audio set, have been received; there is no specific requirement on their receiving order. Whether one data set arrives before the other or both arrive simultaneously, the operation of S102 may be triggered as soon as both have been received.
S103, carrying out page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image.
After the inner page image is obtained, the embodiment of the present application further performs feature analysis on the inner page image, performs matching analysis against the inner page feature data in the inner page feature data set based on the feature data of the inner page image, and determines the page number corresponding to the successfully matched inner page feature data, thereby identifying the page number corresponding to the inner page image. The specific matching method is not limited herein and may be selected or set by a technician, including but not limited to comparing the feature data of the inner page image with each piece of inner page feature data in the inner page feature data set one by one and selecting the inner page feature data with the highest similarity as the successful match; reference may also be made to the fourth embodiment of the present application.
S104, searching the audio corresponding to the page number from the audio set, and outputting the audio.
After determining the page number of the current inner page, the embodiment of the present application can further search the audio set for the audio corresponding to the page number, thereby accurately locating the audio of the current inner page, and finally output the found audio, achieving real-time and accurate narration of the picture book. The output manner is not limited here and may be determined according to the requirements of the actual scene; for example, the robot may play the audio itself, or output the audio to an audio playing device connected to the robot, such as a speaker, for playback.
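The lookup in S104 reduces to indexing the received audio set by page number. A minimal sketch (the function name and the clip values are assumed for illustration) follows; returning `None` when no entry exists gives the caller a natural signal that nothing should be played.

```python
# S104 sketch: look up the narration audio for the identified page number in
# the audio set received from the server; None means no audio to output.
def find_audio(audio_set, page_number):
    return audio_set.get(page_number)

audio_set = {1: "page1.mp3", 2: "page2.mp3"}
clip = find_audio(audio_set, 2)
missing = find_audio(audio_set, 9)
```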
In order to improve the recognition speed and reliability for picture books, in the embodiment of the present application, the inner page feature data set and the audio set of each picture book are stored in the server in advance, and the inner page feature data set and the audio set of the current picture book are delivered only after the picture book to be recognized has been determined. The local terminal device therefore does not need to pre-store excessive picture book data, which relieves its storage pressure. Meanwhile, the local terminal device performs page number recognition on the current inner page image using the inner page feature data set and selects and outputs the audio corresponding to the page number, so the inner page recognition process requires no excessive network interaction and is not affected by network delay and the like, making picture book recognition more efficient and more reliable.
As a specific implementation manner of obtaining the inner page image in the first embodiment of the present application, considering that the computing resources of the robot are extremely limited while the processing and recognition workload for inner page images is large, especially when inner page recognition is performed on video frame images, in order to reduce the computational load of obtaining and recognizing the inner page image and improve recognition efficiency, as shown in fig. 2, the second embodiment of the present application includes:
S201, page turning recognition is carried out on the picture book.
In the embodiment of the present application, after the inner page feature data set and the audio set sent by the server are received, video collection and page turning recognition are performed on the picture book to judge whether the user turns a page. On the first page turn, the picture book switches from the cover to an inner page; on subsequent page turns, it switches from one inner page to another. The new inner page presented after a page turn is exactly the inner page that needs real-time recognition and narration in the embodiment of the present application. Through page turning recognition, the embodiment of the present application can therefore effectively and promptly judge whether the current image is an inner page image that needs to be recognized, rather than treating every sampled image as an inner page image.
The specific page turning recognition method is not limited herein and may be set by a technician according to actual requirements, including but not limited to recognizing the motion of the user's hand or recognizing the real-time page change state of the picture book; reference may also be made to the third embodiment of the present application.
S202, if the picture book is turned over, obtaining an inner page image of the turned picture book.
When the picture book is being turned, the user is switching to a new inner page, that is, new content needs to be prepared for recognition and narration; when the page turning is finished, the picture book has switched to the new inner page. At this moment, the embodiment of the present application can obtain the inner page image of the picture book after the page turning is completed, thereby ensuring the real-time reliability of the recognized inner page image.
In the embodiment of the present application, page turning recognition is performed on the picture book, and only the image collected after a page turn is completed is taken as the inner page image for subsequent processing, which avoids the huge processing load of performing inner page recognition on every collected image and makes the collection and recognition of inner page images more efficient.
As a specific implementation manner of page turning identification for the picture book in the second embodiment of the present application, as shown in fig. 3, the page turning identification includes:
S301, carrying out real-time video acquisition on the picture book, and carrying out image comparison of continuous frames on the acquired video.
S302, if the difference degree of the two continuous frames of images is greater than the first difference threshold value, the state of the picture book is set to be the first state.
Considering that the content of the real-time image of the picture book collected by the robot changes greatly during a page turn, the embodiment of the present application identifies whether the picture book is in the page turning state (i.e., the first state in the embodiment of the present application) by performing real-time video collection on the picture book and comparing consecutive frame images of the video in real time. When the difference between two consecutive frame images is large, it indicates that the picture book is not lying still, i.e., it is in the middle of a page turn, and at this time the embodiment of the present application may directly set the state of the picture book to the first state. The image comparison method and the specific size of the first difference threshold are not limited herein and may be set by a technician according to actual requirements; for example, pixel-wise image comparison may be adopted, and the first difference threshold may be set to 30%.
S303, when the state of the picture book is the first state, comparing the images of the continuous frames of the video, and detecting the image quality of the latest frame of image in the video.
S304, if the difference degrees of the adjacent frame images in the continuous n frame images are smaller than the second difference threshold value and the image quality is larger than the quality threshold value, updating the state of the picture book to be the second state and judging that the picture book is turned over, wherein n is a positive integer larger than 2, and the first difference threshold value is larger than or equal to the second difference threshold value.
Considering that page turning is a continuous action (even if it is brief, a large number of frame images are produced during that time at the video sampling frequency), the inner page image cannot be effectively collected merely because the picture book has been detected to be in the middle of a page turn. To ensure effective collection of the inner page image, the embodiment of the present application therefore also begins to identify whether the page turn has finished once the picture book is identified as being in the page turning process.
Considering that, in practice, when the page turning is finished and the picture book is in a static, non-turning state (i.e., the second state in the embodiment of the present application), the quality of the images the robot can collect improves greatly and consecutively collected images are substantially consistent, the embodiment of the present application compares the consecutively collected frame images in real time while the picture book is being turned and evaluates the image quality of the latest frame. The method for calculating image quality is not limited herein and may be set by a technician according to actual requirements, including but not limited to analyzing the sharpness of the frame image. The specific values of n, the second difference threshold, and the quality threshold may likewise be set by a technician according to actual requirements and are not limited herein; for example, n may be set to 5 or 10. However, to ensure accurate distinction between the page turning and static states, in the embodiment of the present application the first difference threshold should be greater than or equal to the second difference threshold.
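The two-state logic of S301–S304 can be sketched as a small state machine over per-frame measurements. The sketch below is a simplification under stated assumptions: it takes precomputed (difference, quality) pairs rather than raw frames, and all threshold values and names are illustrative, not from the disclosure.

```python
# Simplified page-turn state machine: a large inter-frame difference enters
# the first ("turning") state; n consecutive frames whose adjacent differences
# stay below the second threshold, with a sufficiently sharp latest frame,
# complete the turn (second, static state).
def detect_page_turn(frames, first_thresh=0.30, second_thresh=0.05,
                     n=5, quality_thresh=0.8):
    """frames: list of (diff_to_previous_frame, quality) pairs.
    Returns the index of the frame at which the page turn completes, else None."""
    state = "idle"
    stable = 0  # count of consecutive small adjacent differences
    for i, (diff, quality) in enumerate(frames):
        if state == "idle":
            if diff > first_thresh:          # large change: book is being turned
                state = "turning"
                stable = 0
        else:  # turning
            if diff < second_thresh:
                stable += 1
                # n consecutive frames => n-1 small adjacent differences
                if stable >= n - 1 and quality > quality_thresh:
                    return i
            else:
                stable = 0                   # motion resumed; restart the count
    return None

# Quiet start, a burst of motion, then settling into a sharp, stable view:
frames = [(0.01, 0.9), (0.50, 0.3), (0.40, 0.3),
          (0.02, 0.5), (0.01, 0.6), (0.02, 0.7), (0.01, 0.9)]
done_at = detect_page_turn(frames)
```

Resetting `stable` whenever motion resumes is what makes each page turn independently detectable, which matters for the multi-turn scenario discussed below.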
Meanwhile, considering that a user may turn pages more than once while reading the picture book, and that each page turn presents a new current inner page, the robot needs to recognize the latest inner page in time and respond; independent recognition of each page turn is therefore very important.
As a specific way of acquiring an inner page image in the second embodiment of the present application, on the basis of the third embodiment of the present application, the method includes:
and directly taking the latest frame image with the image quality larger than the quality threshold value as the inner page image of the picture book.
At this time, the required inner page image can be directly acquired.
As another specific way of acquiring an inner page image in the second embodiment of the present application, on the basis of the third embodiment of the present application, the method includes:
and shooting an image of the picture book after the page turn to obtain the inner page image, wherein the resolution of the shot image is higher than that of the collected video.
Because video collection and frame image comparison involve a large amount of data processing, while the actual comparison only needs the degree of difference between frame images and the quality of the frame images, in order to reduce the computational load of video collection and processing, the embodiment of the present application collects and processes the video at a lower resolution, and shoots a high-resolution inner page image only when it is determined that the current inner page needs to be captured, thereby ensuring the quality of the inner page image.
As a specific implementation manner of inner page image recognition in the first embodiment of the present application, it is considered that in practice the light, angle, distance, and the like with which the robot shoots the picture book cannot be predicted each time, that is, there may be a great difference between the actually collected inner page image and a standard inner page image. Although adjusting the brightness and size of the image can effectively compensate for shooting light and distance, such adjustment on one hand requires additional image processing and thus a large workload, and on the other hand cannot effectively achieve accurate recognition under different shooting angles.
In order to achieve accurate recognition of the inner page image under various shooting environments, in the embodiment of the present application the inner page feature data set includes multiple groups of sample feature point data, obtained by performing perspective transformation on a sample image of each inner page of the picture book and extracting feature points from each inner page sample image and from the images obtained by the perspective transformation. Each group of sample feature point data corresponds to the page number of an inner page, each group contains a plurality of sample feature points, and each inner page corresponds to multiple groups of sample feature point data.
Specifically, in the embodiment of the present application, a sample image of each inner page (which may be an image shot under a good shooting environment) is collected in advance, and perspective transformation is performed on the sample image to obtain the images that could result from shooting the inner page at various angles. Feature points are then extracted from the sample image and from its perspective-transformed counterparts. Since feature points are invariant to scale and rotation, no brightness or size adjustment of the image is required for matching. Finally, all the extracted feature point data are combined into the inner page feature data set of the embodiment of the present application, so that the robot can overcome the influence of environmental factors such as shooting light, angle, and distance on inner page image matching simply by using the inner page feature data set, thereby ensuring accurate and effective recognition of the inner page image. On the basis of the first to third embodiments of the present application, as shown in fig. 4, the fourth embodiment of the present application includes:
S401, extracting feature points from the inner page image to obtain a group of inner page feature point data corresponding to the inner page image, and performing feature point matching between the inner page feature point data and each group of sample feature point data.
As can be seen from the above description, in the embodiment of the present application, feature point data matching is performed based on the inner page feature data set, so accurate matching of the inner page image can be achieved directly. The feature point extraction method is not limited in the embodiment of the present application and may be selected or set by a technician according to actual needs, including but not limited to Harris, SIFT, SURF, and FAST.
Meanwhile, the embodiment of the present application also does not limit the specific feature point matching method, which may be selected or set by a technician according to actual requirements. For example, in order to save the internal computing resources of the robot, a nearest neighbor algorithm may be used for feature point matching. In this case, the groups of feature point data in the inner page feature point set are divided into K sets such that adjacent inner pages fall into different sets (for example, with K = 2, odd pages such as page 1, page 3, … are placed in one set and even pages such as page 2, page 4, … in the other), and a nearest neighbor search tree is trained and stored for each set. During matching, only the nearest matching point needs to be searched in the K nearest neighbor trees for each feature point.
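The odd/even grouping and per-group search can be sketched as follows. This is a brute-force stand-in, assuming one summary descriptor per page; a real system would build an actual search tree (e.g. a k-d tree) per group over many local feature descriptors, and the function names here are illustrative.

```python
import numpy as np

def split_pages_into_groups(pages, k):
    """Place adjacent page numbers in different groups (k=2: odd/even split)."""
    groups = [[] for _ in range(k)]
    for i, page in enumerate(sorted(pages)):
        groups[i % k].append(page)
    return groups

def nearest_page(query, page_descriptors, k=2):
    """Brute-force stand-in for the per-group nearest neighbor trees: search
    each group and keep the page whose descriptor is closest to the query."""
    groups = split_pages_into_groups(page_descriptors.keys(), k)
    best_page, best_dist = None, float("inf")
    for group in groups:
        for page in group:
            d = np.linalg.norm(page_descriptors[page] - query)
            if d < best_dist:
                best_page, best_dist = page, d
    return best_page

descs = {p: np.array([float(p), 0.0]) for p in [1, 2, 3, 4]}
page = nearest_page(np.array([2.1, 0.0]), descs)
```

Keeping adjacent pages in different trees means a query near page 2 is compared against page 2 in one tree and pages 1 and 3 in the other, so visually similar neighboring pages do not crowd a single search structure.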
S402, removing abnormal matching points from the successfully matched sample feature points in the sample feature point data, and counting the number of successfully matched sample feature points in each group of sample feature point data after the abnormal matching point removal operation.
Considering that different feature point extraction and matching methods may introduce certain matching errors, the embodiment of the present application may further perform anomaly identification on the successfully matched feature points, that is, find and delete the erroneous matching points among them, so as to ensure the accuracy and reliability of matching. The specific method for finding abnormal matching points is not limited here and may be selected or set by a technician according to actual needs, including but not limited to RANSAC.
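The RANSAC idea can be illustrated with a deliberately simplified model. The toy sketch below fits a translation-only motion (real pipelines fit a homography, e.g. via a RANSAC-based homography estimator); the threshold, iteration count, and names are assumptions for illustration.

```python
import random
import numpy as np

def remove_abnormal_matches(src, dst, thresh=2.0, iters=50, seed=0):
    """Toy RANSAC with a translation-only model: repeatedly hypothesize a
    displacement from one match, keep the largest set of matches consistent
    with it, and mark the rest as abnormal matching points."""
    rng = random.Random(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.randrange(len(src))
        shift = dst[i] - src[i]                      # hypothesis from one match
        err = np.linalg.norm(dst - (src + shift), axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

# Four consistent matches shifted by (10, 0) plus one gross mismatch:
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 2.]])
dst = src + np.array([10., 0.])
dst[4] = [999., 999.]
keep = remove_abnormal_matches(src, dst)
matched_count = int(keep.sum())  # count after abnormal-point removal, as in S402
```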
And S403, screening successfully matched sample feature point data according to the number of successfully matched sample feature points, and taking the page number corresponding to the successfully matched sample feature point data as the page number of the inner page image.
Considering that in practice what appears after a page turn may not be an inner page at all, for example because the user directly closes the picture book or takes it away, directly taking the page number corresponding to the sample feature point data with the largest number of successfully matched sample feature points as the current inner page number may cause erroneous recognition of the inner page. Therefore, the embodiment of the present application may further judge whether the largest number of successfully matched sample feature points reaches a certain number threshold, that is, whether the similarity with the inner page image is high enough, and only when the threshold is reached determine the corresponding sample feature point data as successfully matched and perform page number recognition. The specific value of the number threshold may be set by a technician according to requirements.
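The screening in S403 reduces to a thresholded argmax over per-page match counts. A minimal sketch (function name and the threshold value of 15 are assumptions) follows:

```python
# S403 sketch: accept the best-matching page only if its count of successfully
# matched sample feature points reaches the number threshold; otherwise return
# None (e.g. the book was closed or taken away after the page turn).
def identify_page(match_counts, count_threshold=15):
    """match_counts: {page_number: successfully matched feature points}."""
    if not match_counts:
        return None
    page = max(match_counts, key=match_counts.get)
    return page if match_counts[page] >= count_threshold else None

confident = identify_page({3: 40, 4: 7})
rejected = identify_page({3: 5, 4: 2})
```

A `None` result is precisely the case handled by the optional embodiment below, where the robot prompts the user to reposition the book.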
As an optional embodiment of the present application, on the basis of the fourth embodiment of the present application, if the largest number of successfully matched sample feature points does not reach the number threshold, the robot sends a prompt asking the user to place the inner page of the picture book within the visible area of the robot.
Fig. 5 shows a flowchart of an implementation of the picture book identification method according to an embodiment of the present application, which is detailed as follows:
S501, the robot acquires a cover image of the picture book and sends the cover image to a server.
S502, the server performs picture book recognition on the received cover image to obtain the picture book identifier corresponding to the cover image.
S503, the server retrieves the inner page feature data set and the audio set corresponding to the picture book identifier from local storage, and sends the retrieved inner page feature data set and audio set to the robot.
S504, if the inner page feature data set and the audio set are received, the robot acquires an inner page image of the picture book.
And S505, the robot identifies the page number of the inner page image based on the inner page feature data set to obtain the page number of the inner page image.
S506, the robot searches the audio corresponding to the page number from the audio set and outputs the audio.
In the embodiment of the present application, the operation and principle executed by the robot end are the same as those in the first embodiment of the present application, and therefore, the description of the principle of the robot operation is omitted here for brevity, and reference may be made to the description of the first embodiment of the present application.
In the embodiment of the present application, the server is responsible for recognizing the cover image from the robot to obtain the corresponding picture book identifier, and for finding and sending to the robot the inner page feature data set and audio set corresponding to the picture book identifier. The principles of operations such as the search, and related concepts such as the picture book identifier, the inner page feature data set, and the audio set, are the same as those of S102 in the first embodiment of the present application, and are therefore not repeated here; reference may be made to the description of the first embodiment of the present application.
In order to improve the recognition speed and reliability for picture books, in the embodiment of the present application, the inner page feature data set and the audio set of each picture book are stored in the server in advance, and the inner page feature data set and the audio set of the current picture book are delivered to the robot only after the picture book to be recognized has been determined. The robot therefore does not need to pre-store excessive picture book data, which relieves its storage pressure. Meanwhile, the robot performs page number recognition on the current inner page image using the inner page feature data set and selects and outputs the audio corresponding to the page number, so the inner page recognition process requires no excessive network interaction and is not affected by network delay and the like, making picture book recognition more efficient and more reliable.
As a specific implementation manner of performing picture book recognition on the cover image in the fifth embodiment of the present application, as shown in fig. 6, the sixth embodiment of the present application includes:
S601, extracting feature points from the cover image to obtain a group of cover feature point data corresponding to the cover image.
S602, performing feature point matching between the cover feature point data and each group of sample feature point data in a cover feature point data set, wherein the cover feature point data set includes multiple groups of sample feature point data, each group of sample feature point data corresponds to the picture book identifier of a cover, and the multiple groups of sample feature point data are obtained by performing perspective transformation on the first sample images of a plurality of picture book covers and extracting feature points from the first sample images of the respective covers and from the second sample images obtained through the perspective transformation.
And S603, if successfully matched cover feature point data exists, taking the picture book identifier corresponding to the successfully matched cover feature point data as the picture book identifier of the cover image.
The principle of processing and matching the cover image in the sixth embodiment of the present application is basically the same as that of processing and matching the inner page image in the fourth embodiment of the present application, and is therefore not repeated here; reference may be made to the related description of the fourth embodiment of the present application. (That is, replacing the inner page image in the fourth embodiment with the cover image realizes the related operations of the sixth embodiment.)
As a specific implementation manner of performing feature point matching on each group of cover feature point data in the cover feature point data set in the sixth embodiment of the present application, it is considered that the number of recognizable picture books may be very large, which places great pressure on cover image matching: if the cover feature point data were matched directly one group at a time, the amount of computation would be very large and the computation efficiency very low.
Therefore, in order to improve the efficiency of matching the cover feature point data, as shown in fig. 7, a seventh embodiment of the present application includes:
S701, acquiring the first sample images and second sample images of a plurality of picture book covers, performing image retrieval on the first sample images and the second sample images by using the cover image, and taking the retrieved first sample images and second sample images as target images.
In the embodiment of the present application, in order to improve matching efficiency, image retrieval and screening are performed before feature point data matching, so as to screen out the n sample images with the highest similarity to the cover image and realize coarse recognition of the picture book; the feature point data of these sample images are then compared and analyzed one by one for an accurate secondary screening. The specific image retrieval method is not limited herein and may be set by a technician according to actual requirements, including but not limited to VLAD and BoF. The value of n may also be set by a technician and is not limited herein; for example, it may be set to a value between 10 and 20.
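The coarse retrieval step can be sketched as ranking global cover descriptors by similarity and keeping the top n candidates. The sketch below uses cosine similarity over placeholder vectors; in practice the descriptors would be VLAD or BoF encodings of each sample image, and all names here are assumptions.

```python
import numpy as np

def retrieve_top_n(query_desc, sample_descs, n=10):
    """Coarse recognition: rank sample cover descriptors by cosine similarity
    to the query cover descriptor and keep the top n as target images for the
    subsequent one-by-one feature point comparison."""
    q = query_desc / np.linalg.norm(query_desc)
    sims = {}
    for name, d in sample_descs.items():
        sims[name] = float(q @ (d / np.linalg.norm(d)))
    return sorted(sims, key=sims.get, reverse=True)[:n]

samples = {"cover-a": np.array([1.0, 0.1]),
           "cover-b": np.array([0.0, 1.0]),
           "cover-c": np.array([1.0, 0.0])}
targets = retrieve_top_n(np.array([1.0, 0.0]), samples, n=2)
```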
S702, extracting sample feature point data of each target image from the front cover feature point data set, and taking the extracted sample feature point data as target feature point data.
And S703, performing feature point matching on each target feature point data by using the cover feature point data.
After the target image is screened out, the characteristic point data can be compared one by one, so that the accurate matching of the cover image is realized, and the accurate identification of the picture book is realized.
S704, performing abnormal matching point elimination on the successfully matched feature points in the target feature point data, and counting the number of the successfully matched feature points in each target feature point data after the abnormal matching point elimination operation.
Considering that different feature point extraction and matching methods may introduce certain matching errors, the embodiment of the present application may further perform anomaly identification on the successfully matched feature points, that is, find and delete the erroneous matching points among them, so as to ensure the accuracy and reliability of matching. The specific method for finding abnormal matching points is not limited here and may be selected or set by a technician according to actual needs, including but not limited to RANSAC.
S705, if the largest number of successfully matched feature points is greater than the first number threshold, taking the target feature point data corresponding to that largest number as the successfully matched cover feature point data.
Considering that in practice, even if the user is prompted to shoot the cover of the picture book, the actually obtained image is not necessarily a cover image of a picture book, or may be the cover of an unsupported picture book, the picture book corresponding to the sample feature point data with the largest number of successfully matched sample feature points cannot, for such images, be directly taken as the current cover's picture book. To prevent erroneous recognition and narration, the embodiment of the present application may therefore further judge whether the largest number of successfully matched sample feature points reaches a certain number threshold, that is, whether the similarity with the retrieved cover image is high enough, and only when the threshold is reached determine the corresponding sample feature point data as successfully matched and perform picture book identification. The specific value of the number threshold may be set by a technician according to requirements.
In the embodiment of the present application, through the dual recognition mode of coarse recognition followed by accurate recognition, recognition efficiency is improved while recognition accuracy is ensured, making cover image recognition both accurate and efficient.
As an eighth embodiment of the present application, as shown in fig. 8, it is considered that in practice the light, angle, distance, and the like with which the robot shoots an image cannot be predicted each time, that is, there may be a great difference between the actually collected cover image and a standard cover image. Although adjusting the brightness and size of the image can effectively compensate for shooting light and distance, such adjustment on one hand requires additional image processing and thus a large workload, and on the other hand cannot effectively achieve accurate recognition under different shooting angles.
In order to realize effective construction of the feature point data set of the cover, the method comprises the following steps:
S801, obtaining the first sample images of a plurality of picture book covers and the picture book identifiers corresponding to the first sample images.
And S802, respectively carrying out perspective transformation on each first sample image to obtain second sample images respectively corresponding to each first sample image, wherein each first sample image corresponds to a plurality of second sample images with different perspective transformation angles.
S803, mapping the picture book identifier corresponding to the first sample image to the second sample images corresponding to that first sample image.
S804, respectively extracting feature points from each first sample image and each second sample image to obtain a plurality of groups of cover feature point data in one-to-one correspondence.
And S805, storing each group of cover feature point data in association with the picture book identifier of the corresponding first sample image or second sample image to obtain the cover feature point data set.
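Steps S801 to S805 hinge on generating angle-varied second sample images from each first sample image (S802). The following NumPy sketch shows one way to build such perspective transformations; the names are hypothetical, and a real pipeline would pass each homography to an image-warping routine (for example OpenCV's `warpPerspective`) to render the second sample images:

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve the 3x3 homography H with H @ [x, y, 1]^T ~ [u, v, 1]^T
    from four point correspondences (standard DLT linear system)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(a, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def tilted_variants(w, h, tilts=(0.05, 0.10, 0.15)):
    """For one cover of size w x h, build homographies that pinch the
    right edge inward by a fraction of the height, imitating covers
    photographed from different horizontal angles."""
    src = [(0, 0), (w, 0), (w, h), (0, h)]
    variants = []
    for t in tilts:
        dst = [(0, 0), (w, h * t), (w, h * (1 - t)), (0, h)]
        variants.append(homography_from_points(src, dst))
    return variants
```

Each first sample image thus yields several second sample images with different perspective transformation angles, as step S802 describes.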
After the feature point data of an image is extracted, the feature points can be aggregate-encoded and then stored together with the picture book identifier. The embodiment of the present application does not limit the specific aggregate encoding method, which can be selected by a technician according to actual needs, including but not limited to the VLAD/BoF bag-of-features model.
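As a hedged illustration of one such aggregate encoding, the following NumPy sketch implements the residual-accumulation core of VLAD; in a full system the cluster centers would come from k-means over many sample descriptors, and all names here are hypothetical:

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Assign each local descriptor to its nearest cluster center,
    accumulate residuals (descriptor - center) per cluster, then
    apply the usual power and L2 normalization."""
    k, d = centers.shape
    assignments = np.argmin(
        ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1),
        axis=1)
    vlad = np.zeros((k, d), dtype=np.float64)
    for i, c in enumerate(assignments):
        vlad[c] += descriptors[i] - centers[c]
    vlad = vlad.flatten()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))  # power normalization
    n = np.linalg.norm(vlad)
    return vlad / n if n > 0 else vlad
```

The resulting fixed-length vector can be stored with the picture book identifier and compared cheaply during coarse retrieval.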
In the embodiment of the present application, a first sample image of each cover (which may be an image captured in a favorable imaging environment) is captured in advance and subjected to perspective transformation, thereby obtaining second sample images equivalent to shooting the cover from various angles. Feature points are then extracted from each first sample image and its corresponding second sample images; because these feature points are invariant to scale and rotation, image matching does not require adjusting the brightness and size of the image. Finally, all the extracted feature point data are combined into the cover feature point data set of the embodiment of the present application. Therefore, the robot only needs to use the cover feature point data set to overcome the influence of environmental factors such as shooting light, angle, and distance on cover image matching, which further ensures the accuracy and effectiveness of cover image recognition.
As a ninth embodiment of the present application, on the basis of the fifth embodiment of the present application, in order to realize accurate recognition of the cover image, in S502, in the process of performing picture book recognition on the received cover image, the server first performs cover region detection and extraction on the cover image, so as to obtain a cover image containing less noise.
The robot can collect sample cover images of picture books in advance, mark the cover region in each sample image, and train a neural network model with them, thereby obtaining a model for cover region detection and extraction.
As a specific implementation manner of acquiring an inner page image in the fifth embodiment of the present application, a tenth embodiment of the present application includes:
and the robot carries out page turning recognition on the picture book.
If the picture book is turned, the robot acquires an inner page image of the picture book after turning the page.
The principle of the tenth embodiment of the present application is the same as that of the second embodiment of the present application, and specific reference may be made to the description of the second embodiment of the present application and other related embodiments, which are not repeated herein.
As a specific implementation manner of performing page turning recognition on the picture book in the fifth embodiment of the present application, the eleventh embodiment of the present application includes:
the robot carries out real-time video acquisition on the picture book and carries out image comparison of continuous frames on the acquired video.
And if the difference degree of the two continuous frames of images is greater than the first difference threshold value, the robot sets the state of the picture book to be the first state.
When the state of the picture book is the first state, the robot carries out image comparison of continuous frames on the video and detects the image quality of the latest frame of image in the video.
If the difference degree of the adjacent frame images in the continuous n frame images is smaller than the second difference threshold value and the image quality is larger than the quality threshold value, the robot updates the state of the picture book to the second state and judges that the picture book is turned over, wherein n is a positive integer larger than 2, and the first difference threshold value is larger than or equal to the second difference threshold value.
The principle of the eleventh embodiment of the present application is the same as that of the third embodiment of the present application, and specific reference may be made to the description of the third embodiment of the present application and other related embodiments, which are not repeated herein.
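The page-turn logic above (a large inter-frame difference enters the first state; n consecutive stable, good-quality frames confirm the turn) can be sketched as a small state machine. State names, thresholds, and the quality metric are illustrative assumptions, not terms from the claims:

```python
IDLE, TURNING, TURNED = 0, 1, 2  # hypothetical state names

class PageTurnDetector:
    """Two-threshold page-turn detector: diff_hi starts a turn,
    diff_lo and quality_min over n consecutive frames end it.
    Per the method, diff_hi >= diff_lo and n > 2."""

    def __init__(self, diff_hi, diff_lo, quality_min, n):
        assert diff_hi >= diff_lo and n > 2
        self.diff_hi, self.diff_lo = diff_hi, diff_lo
        self.quality_min, self.n = quality_min, n
        self.state = IDLE
        self.stable = 0

    def feed(self, frame_diff, frame_quality):
        """Feed one frame's difference-to-previous and quality score;
        return True when a completed page turn is detected."""
        if self.state == IDLE:
            if frame_diff > self.diff_hi:
                self.state = TURNING  # first state: turn in progress
                self.stable = 0
        elif self.state == TURNING:
            if frame_diff < self.diff_lo and frame_quality > self.quality_min:
                self.stable += 1
                # n frames pairwise stable means n - 1 small diffs
                if self.stable >= self.n - 1:
                    self.state = TURNED  # second state: turn finished
                    return True
            else:
                self.stable = 0
        return False
```

Only when `feed` returns True would the robot capture the post-turn inner page image for page number recognition.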
As a specific implementation manner of performing the inner page image recognition in the fifth embodiment of the present application, a twelfth embodiment of the present application includes:
the robot extracts the feature points of the internal page image to obtain a group of internal page feature point data corresponding to the internal page image, and performs feature point matching on each group of sample feature point data by using the internal page feature point data.
And the robot carries out abnormal matching point elimination on the successfully matched sample feature points in the sample feature point data, and counts the number of the successfully matched sample feature points in each group of sample feature point data after the abnormal matching point elimination operation.
And the robot screens out successfully matched sample feature point data according to the number of successfully matched sample feature points, and takes the page number corresponding to the successfully matched sample feature point data as the page number of the inner page image.
The principle of the twelfth embodiment of the present application is the same as that of the fourth embodiment of the present application, and specific reference may be made to the description of the fourth embodiment of the present application and other related embodiments, which are not repeated herein.
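As an illustrative sketch of the matching-and-screening step: the abnormal-match elimination described above would typically use a geometric check such as RANSAC over a homography; here a simpler Lowe-style ratio test stands in for it, and all names are hypothetical:

```python
import numpy as np

def match_descriptors(query, samples, ratio=0.75):
    """Ratio-test matching between one set of query descriptors and
    one group of sample descriptors (float vectors here; real systems
    often use binary descriptors with Hamming distance). Returns the
    indices of sample feature points matched successfully."""
    matched = []
    for q in query:
        d = np.linalg.norm(samples - q, axis=1)
        i1, i2 = np.argsort(d)[:2]
        if d[i1] < ratio * d[i2]:  # unambiguous nearest neighbor
            matched.append(int(i1))
    return matched

def best_page(query, groups, count_threshold):
    """Match against every group of sample feature point data (one
    group per stored inner-page variant) and return the page number
    with the highest match count, if it reaches the threshold."""
    best, best_count = None, 0
    for page_no, samples in groups.items():
        count = len(set(match_descriptors(query, samples)))
        if count > best_count:
            best, best_count = page_no, count
    return best if best_count >= count_threshold else None
```

The page number returned here corresponds to the audio to be looked up and broadcast in the following step.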
Fig. 9 shows a structural block diagram of the picture book recognition apparatus provided in the embodiment of the present application, which corresponds to the method of the above embodiment; for convenience of description, only the parts related to the embodiment of the present application are shown. The picture book recognition apparatus illustrated in fig. 9 may be the execution subject of the picture book recognition method provided in the first embodiment.
Referring to fig. 9, the picture book recognition apparatus includes:
and the cover transmission module 91 is used for acquiring a cover image of the drawing book and sending the cover image to the server.
The inner page obtaining module 92 is configured to obtain the inner page image of the picture book if the inner page feature data set and the audio set sent by the server are received, where the inner page feature data set and the audio set are obtained by the server, after receiving the cover image, performing picture book recognition on the cover image to obtain a picture book identifier and then retrieving the inner page feature data set and the audio set from a local storage according to the obtained picture book identifier.
And a page number identification module 93, configured to perform page number identification on the inner page image based on the inner page feature data set, to obtain a page number of the inner page image.
And the audio searching module 94 is configured to search for the audio corresponding to the page number from the audio set, and output the audio.
Further, the inner page obtaining module 92 includes:
and the page turning identification module is used for carrying out page turning identification on the picture book.
And the image acquisition module is used for acquiring the inner page image of the drawn book after page turning if the drawn book is turned.
Further, the page turning identification module comprises:
and carrying out real-time video acquisition on the picture book, and carrying out image comparison of continuous frames on the acquired video.
And if the difference degree of the two continuous frames of images is greater than a first difference threshold value, setting the state of the picture book to be a first state.
And when the state of the picture book is a first state, carrying out image comparison of continuous frames on the video and detecting the image quality of the latest frame of image in the video.
If the difference degree of the adjacent frame images in the continuous n frame images is smaller than a second difference threshold value and the image quality is larger than a quality threshold value, updating the state of the picture book to be in a second state and judging that the picture book is turned over, wherein n is a positive integer larger than 2, and the first difference threshold value is larger than or equal to the second difference threshold value.
Further, the page number recognition module 93 includes:
and extracting feature points of the inner page image to obtain a group of inner page feature point data corresponding to the inner page image, and respectively performing feature point matching on each group of sample feature point data by using the inner page feature point data.
And carrying out abnormal matching point elimination on the successfully matched sample feature points in the sample feature point data, and counting the number of the successfully matched sample feature points in each group of sample feature point data after the abnormal matching point elimination operation.
And screening the successfully matched sample feature point data according to the number of the successfully matched sample feature points, and taking the page number corresponding to the successfully matched sample feature point data as the page number of the inner page image.
The process of implementing each function by each module in the picture book identification apparatus provided in the embodiment of the present application may specifically refer to the description of the first embodiment shown in fig. 1, and is not described herein again.
Corresponding to the method of the foregoing embodiment, fig. 10 shows a system interaction diagram of the picture book recognition system provided by the embodiment of the present application; for convenience of explanation, only the parts related to the embodiment of the present application are shown. The robot and the server in the picture book recognition system illustrated in fig. 10 may be the execution subjects of the picture book recognition method provided in the fifth embodiment.
The picture book recognition system includes: a robot 1001 and a server 1002.
The robot 1001 is configured to acquire a cover image of the picture book and send the cover image to the server 1002.
The server 1002 is configured to perform picture book recognition on the received cover image, so as to obtain a picture book identifier corresponding to the cover image.
The server 1002 is further configured to retrieve the inner page feature data set and the audio set corresponding to the picture book identifier in a local storage, and send the retrieved inner page feature data set and the retrieved audio set to the robot 1001.
The robot 1001 is further configured to obtain the inner page image of the picture book if the inner page feature data set and the audio set are received.
The robot 1001 is further configured to perform page number recognition on the inner page image based on the inner page feature data set, so as to obtain a page number of the inner page image.
The robot 1001 is further configured to search for an audio corresponding to the page number from the audio set, and output the audio.
Meanwhile, as an alternative implementation of the present application, the sixth to twelfth embodiments of the present application may be combined with the present embodiment; that is, the robot and the server in the picture book recognition system shown in fig. 10 may be the execution subjects of the picture book recognition methods provided in the sixth to twelfth embodiments.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance. It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements in some embodiments of the application, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first table may be named a second table, and similarly, a second table may be named a first table, without departing from the scope of various described embodiments. The first table and the second table are both tables, but they are not the same table.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
Fig. 11 is a schematic structural diagram of a robot according to an embodiment of the present application. As shown in fig. 11, the robot 11 of this embodiment includes: at least one processor 110 (only one shown in fig. 11) and a memory 111, the memory 111 having stored therein a computer program 112 executable on the processor 110. The processor 110, when executing the computer program 112, implements the steps in the above-described embodiments of the picture book recognition method, such as steps 101 to 104 shown in fig. 1. Alternatively, the processor 110, when executing the computer program 112, implements the functions of each module/unit in the above-mentioned apparatus embodiments, for example, the functions of the modules 61 to 64 shown in fig. 6.
The robot may include, but is not limited to, the processor 110 and the memory 111. Those skilled in the art will appreciate that fig. 11 is merely an example of the robot 11 and does not constitute a limitation of the robot 11, which may include more or fewer components than shown, combine some components, or use different components; for example, the robot may also include an input and output device, a network access device, a bus, and the like.
The processor 110 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 111 may in some embodiments be an internal storage unit of the robot 11, such as a hard disk or a memory of the robot 11. The memory 111 may also be an external storage device of the robot 11, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the robot 11. Further, the memory 111 may also include both an internal storage unit and an external storage device of the robot 11. The memory 111 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer programs. The memory 111 may also be used to temporarily store data that has been transmitted or is to be transmitted.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments when executed.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, can realize the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (11)

1. A picture book recognition method is characterized by comprising the following steps:
acquiring a cover image of the picture book and sending the cover image to a server;
if an inner page feature data set and an audio set sent by the server are received, acquiring an inner page image of the picture book, wherein the inner page feature data set and the audio set are obtained by carrying out picture book recognition on the front cover image to obtain a picture book identifier after the server receives the front cover image, and calling the inner page feature data set and the audio set in a local storage according to the obtained picture book identifier;
performing page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image;
and searching the audio corresponding to the page number from the audio set, and outputting the audio.
2. The picture book recognition method as claimed in claim 1, wherein the obtaining of the inner page image of the picture book comprises:
performing page turning identification on the picture book;
and if the picture book is turned, acquiring an inner page image of the picture book after turning the page.
3. The picture book recognition method as claimed in claim 2, wherein the performing page turning recognition on the picture book comprises:
carrying out real-time video acquisition on the picture book, and carrying out image comparison of continuous frames on the acquired video;
if the difference degree of two continuous frames of images is greater than a first difference threshold value, setting the state of the picture book to be a first state;
when the state of the picture book is a first state, carrying out image comparison of continuous frames on the video, and detecting the image quality of the latest frame of image in the video;
if the difference degree of the adjacent frame images in the continuous n frame images is smaller than a second difference threshold value and the image quality is larger than a quality threshold value, updating the state of the picture book to be in a second state and judging that the picture book is turned over, wherein n is a positive integer larger than 2, and the first difference threshold value is larger than or equal to the second difference threshold value.
4. The picture book identification method according to any one of claims 1 to 3, wherein the inner page feature data set includes a plurality of sets of sample feature point data obtained by respectively subjecting the sample images of the respective inner pages in the picture book to perspective transformation and performing feature point extraction on the sample images of the respective inner pages and the images obtained by the perspective transformation, and a page number of the inner page corresponding to each set of the sample feature point data, wherein each set of the sample feature point data includes a plurality of sample feature points, and each inner page corresponds to a plurality of sets of the sample feature point data,
the page number recognition of the inner page image based on the inner page feature data set comprises:
extracting feature points of the inner page image to obtain a group of inner page feature point data corresponding to the inner page image, and respectively performing feature point matching on each group of sample feature point data by using the inner page feature point data;
carrying out abnormal matching point elimination on successfully matched sample feature points in the sample feature point data, and counting the number of successfully matched sample feature points in each group of sample feature point data after the abnormal matching point elimination operation is carried out;
and screening the successfully matched sample feature point data according to the number of the successfully matched sample feature points, and taking the page number corresponding to the successfully matched sample feature point data as the page number of the inner page image.
5. A picture book recognition method is characterized by comprising the following steps:
the robot acquires a cover image of the picture book and sends the cover image to the server;
the server performs picture book recognition on the received cover image to obtain a picture book identification corresponding to the cover image;
the server calls an inner page feature data set and an audio set corresponding to the picture book identification from a local storage and sends the called inner page feature data set and the called audio set to the robot;
if the inner page feature data set and the audio set are received, the robot acquires an inner page image of the picture book;
the robot carries out page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image;
and the robot searches the audio corresponding to the page number from the audio set and outputs the audio.
6. The picture book recognition method as claimed in claim 5, wherein the server performing picture book recognition on the received cover image to obtain a picture book identifier corresponding to the cover image comprises:
extracting feature points of the cover image to obtain a group of cover feature point data corresponding to the cover image;
performing feature point matching on each group of cover feature point data in a cover feature point data set by using the cover feature point data, wherein the cover feature point data set comprises a plurality of groups of sample feature point data, each group of sample feature point data corresponds to a picture book identifier of a cover, and the plurality of groups of sample feature point data are obtained by the server respectively performing perspective transformation on first sample images of a plurality of picture book covers and performing feature point extraction on the first sample images of the covers and second sample images obtained through the perspective transformation;
and if the successfully matched cover feature point data exists, taking the picture book identifier corresponding to the successfully matched cover feature point data as the picture book identifier of the cover image.
7. The picture book recognition method of claim 6, wherein the using the cover feature point data to perform feature point matching on each group of cover feature point data in the cover feature point data set comprises:
acquiring the first sample image and the second sample image of a plurality of picture book covers, performing image retrieval on the first sample image and the second sample image by using the cover image, and taking the retrieved first sample image and second sample image as target images;
extracting sample feature point data of each target image from the cover feature point data set, and taking the extracted sample feature point data as target feature point data;
performing feature point matching on each target feature point data by using the cover feature point data;
carrying out abnormal matching point elimination on successfully matched feature points in the target feature point data, and counting the number of feature points successfully matched with each target feature point data after the abnormal matching point elimination operation;
and if the number of the feature points successfully matched with the maximum is larger than a first number threshold, taking the target feature point data corresponding to the number of the feature points successfully matched with the maximum as cover feature point data successfully matched with the cover feature point data.
8. The picture book recognition method of claim 6 or 7, wherein the process of constructing the cover feature point data set by the server comprises:
acquiring the first sample images of a plurality of picture book covers and picture book identifiers corresponding to the first sample images;
respectively carrying out perspective transformation on each first sample image to obtain second sample images respectively corresponding to each first sample image, wherein each first sample image corresponds to a plurality of second sample images with different perspective transformation angles;
mapping the picture book identifier corresponding to the first sample image to the picture book identifier of the second sample image corresponding to the first sample image;
respectively extracting feature points of each first sample image and each second sample image to obtain a plurality of groups of cover feature point data which correspond to each other one by one;
and performing associated storage on each group of the cover feature point data and the picture book identifier of the corresponding first sample image or second sample image to obtain the cover feature point data set.
9. A picture book recognition apparatus, comprising:
the cover transmission module is used for acquiring a cover image of the picture book and sending the cover image to the server;
the inner page obtaining module is used for obtaining an inner page image of the picture book if an inner page feature data set and an audio set sent by the server are received, wherein the inner page feature data set and the audio set are obtained by the server, after receiving the cover image, performing picture book recognition on the cover image to obtain a picture book identifier and retrieving the inner page feature data set and the audio set from a local storage according to the obtained picture book identifier;
the page number identification module is used for carrying out page number identification on the inner page image based on the inner page feature data set to obtain the page number of the inner page image;
and the audio searching module is used for searching the audio corresponding to the page number from the audio set and outputting the audio.
10. A robot, characterized in that the robot comprises a memory, a processor, a computer program being stored on the memory and being executable on the processor, the processor realizing the steps of the method according to any of the claims 1 to 4 when executing the computer program.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201911026013.5A 2019-10-25 2019-10-25 Drawing book identification method and device and robot Active CN111222397B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911026013.5A CN111222397B (en) Drawing book identification method and device and robot

Publications (2)

Publication Number Publication Date
CN111222397A true CN111222397A (en) 2020-06-02
CN111222397B CN111222397B (en) 2023-10-13

Family

ID=70827558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911026013.5A Active CN111222397B (en) 2019-10-25 2019-10-25 Drawing recognition method and device and robot

Country Status (1)

Country Link
CN (1) CN111222397B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447499A (en) * 2015-10-23 2016-03-30 北京爱乐宝机器人科技有限公司 Book interaction method, apparatus, and equipment
CN107977391A (en) * 2017-03-09 2018-05-01 北京物灵智能科技有限公司 Paint this recognition methods, device, system and electronic equipment
CN110263187A (en) * 2019-06-19 2019-09-20 深圳市沃特沃德股份有限公司 Draw this recognition methods, device, storage medium and computer equipment
WO2019201008A1 (en) * 2018-04-20 2019-10-24 华为技术有限公司 Live video review method and apparatus

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111998259A (en) * 2020-09-08 2020-11-27 安徽声讯信息技术有限公司 Intelligent learning auxiliary system based on desk lamp
CN112201116A (en) * 2020-09-29 2021-01-08 深圳市优必选科技股份有限公司 Logic board identification method and device and terminal equipment
CN112200230A (en) * 2020-09-29 2021-01-08 深圳市优必选科技股份有限公司 Training board identification method and device and robot
CN112201117A (en) * 2020-09-29 2021-01-08 深圳市优必选科技股份有限公司 Logic board identification method and device and terminal equipment
CN112200230B (en) * 2020-09-29 2023-10-13 深圳市优必选科技股份有限公司 Training board identification method and device and robot
CN112487929A (en) * 2020-11-25 2021-03-12 深圳市云希谷科技有限公司 Image recognition method, device and equipment of children picture book and storage medium
CN117668273A (en) * 2024-02-01 2024-03-08 山东省国土测绘院 Mapping result management method
CN117668273B (en) * 2024-02-01 2024-04-19 山东省国土测绘院 Mapping result management method

Also Published As

Publication number Publication date
CN111222397B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN111222397B (en) Drawing book identification method and device and robot
CN110135411B (en) Business card recognition method and device
US11302315B2 (en) Digital video fingerprinting using motion segmentation
CN110705405B (en) Target labeling method and device
CN107885430B (en) Audio playing method and device, storage medium and electronic equipment
KR102087882B1 (en) Device and method for media stream recognition based on visual image matching
CN109727275B (en) Object detection method, device, system and computer readable storage medium
CN110858394A (en) Image quality evaluation method and device, electronic equipment and computer readable storage medium
CN111694978B (en) Image similarity detection method and device, storage medium and electronic equipment
CN110460838B (en) Lens switching detection method and device and computer equipment
CN105451029A (en) Video image processing method and device
CN111836118B (en) Video processing method, device, server and storage medium
KR20200069911A (en) Method and apparatus for identifying object and object location equality between images
CN103763480A (en) Method and equipment for obtaining video dubbing
US9113002B2 (en) Method and system for automatically capturing an object using a mobile terminal
CN110084187B (en) Position identification method, device, equipment and storage medium based on computer vision
CN107071553B (en) Method, device and computer readable storage medium for modifying video and voice
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN115393616A (en) Target tracking method, device, equipment and storage medium
CN115004245A (en) Target detection method, target detection device, electronic equipment and computer storage medium
KR102178172B1 (en) Terminal and service providing device, control method thereof, computer readable medium having computer program recorded therefor and image searching system
CN114429628A (en) Image processing method and device, readable storage medium and electronic equipment
CN109040774B (en) Program information extraction method, terminal equipment, server and storage medium
CN114827702A (en) Video pushing method, video playing method, device, equipment and medium
CN112214639A (en) Video screening method, video screening device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant