CN113709364B - Camera identifying equipment and object identifying method - Google Patents

Camera identifying equipment and object identifying method

Info

Publication number
CN113709364B
Authority
CN
China
Prior art keywords
voice
image
module
control module
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110959641.XA
Other languages
Chinese (zh)
Other versions
CN113709364A (en)
Inventor
黄一清
Current Assignee
Unisound Shanghai Intelligent Technology Co Ltd
Original Assignee
Unisound Shanghai Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Unisound Shanghai Intelligent Technology Co Ltd filed Critical Unisound Shanghai Intelligent Technology Co Ltd
Priority to CN202110959641.XA priority Critical patent/CN113709364B/en
Publication of CN113709364A publication Critical patent/CN113709364A/en
Application granted granted Critical
Publication of CN113709364B publication Critical patent/CN113709364B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/62Control of parameters via user interfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00Electrically-operated educational appliances
    • G09B5/06Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a camera recognition device and an object recognition method. The device comprises: a touch screen camera, comprising a camera body, a microphone, a voice player and a controller, wherein the controller, the microphone and the voice player are respectively arranged on the camera body, and the camera body, the microphone and the voice player are respectively connected with the controller; and a server, comprising a control module connected with the controller through wireless signals, a storage module, an image recognition module, a matching module, an intention analysis module and a voice synthesis module, wherein the image recognition module, the matching module and the storage module are respectively connected with the control module. The invention solves the problem that children cannot obtain timely and accurate answers while learning about new things.

Description

Camera identifying equipment and object identifying method
Technical Field
The invention relates to the technical field of intelligent electrical appliances, in particular to a camera identification device and a camera identification method.
Background
While children are learning about new things, parents cannot always be at their side, or may not understand the new things themselves, so timely and accurate answers cannot be given. Over time, the questions that go unanswered accumulate and are even forgotten, and the children's ability to recognize new things cannot be effectively improved.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a camera recognition device and an object recognition method, so as to solve the problem that a child cannot obtain an accurate answer in time in the process of recognizing a new thing.
To achieve the above object, there is provided a camera recognition apparatus including:
the touch screen camera comprises a camera body, a microphone, a voice player and a controller, wherein the camera body is used for collecting images of new things and playing images and videos, the controller, the microphone and the voice player are respectively arranged on the camera body, and the camera body, the microphone and the voice player are respectively connected with the controller; and
the server comprises a control module, a storage module, an image recognition module, a matching module, an intention analysis module and a voice synthesis module, wherein the control module is connected with the controller through wireless signals, the storage module stores an image library and a plurality of information libraries, the image recognition module is used for recognizing an intention object marked in an image of the new object and generating an intention object image, the matching module is used for matching the intention object image to an image of one thing in the image library to determine the information library of the intention object, the image recognition module, the matching module, the intention analysis module and the storage module are respectively connected with the control module, the image library comprises images of a plurality of things, each information library comprises introduction information of one thing, and the image of each thing is correspondingly connected with one information library.
Further, the server further comprises a video synthesis module, and the video synthesis module is connected with the control module.
Further, the server is a cloud server.
Further, the intent analysis module comprises a voice recognition unit and a semantic understanding unit, wherein the voice recognition unit is connected with the control module, and the semantic understanding unit is connected with the voice recognition unit and the control module.
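The two-stage structure just described (a voice recognition unit feeding a semantic understanding unit) can be sketched as below. This is a minimal illustrative stub, not the patent's implementation: the class and method names, the fake "transcript" field, and the keyword-based intent matching are all assumptions made for demonstration.

```python
class SpeechRecognitionUnit:
    """Stub ASR unit: pretends to transcribe query audio to text."""
    def recognize(self, audio):
        # A real unit would run a speech recognition model; here the
        # audio object is assumed to carry its transcript for illustration.
        return audio["transcript"]

class SemanticUnderstandingUnit:
    """Stub NLU unit: maps the recognized text to a coarse query intent."""
    def understand(self, text):
        # Hypothetical intent keywords; the patent does not enumerate them.
        for intent in ("variety", "origin", "introduction"):
            if intent in text:
                return intent
        return "introduction"  # default intent when nothing matches

class IntentParsingModule:
    """ASR unit chained to NLU unit, as in the two-stage design above."""
    def __init__(self):
        self.asr = SpeechRecognitionUnit()
        self.nlu = SemanticUnderstandingUnit()

    def parse(self, audio):
        text = self.asr.recognize(audio)   # stage 1: speech -> text
        return self.nlu.understand(text)   # stage 2: text -> semantics
```

In use, `IntentParsingModule().parse({"transcript": "what variety is this"})` would yield the intent `"variety"`, which the control module then uses to pick information from the matched library.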
The invention further provides an object recognition method for the camera recognition device, which comprises the following steps:
the camera body collects the new object image;
the microphone collects inquiry voices;
marking an intention object in the new object image through a touch screen of the camera body;
the controller simultaneously and externally transmits the new object image marked with the intention object and the query voice;
constructing an image library and a plurality of information libraries in a storage module, wherein the image library comprises images of a plurality of things, each information library comprises introduction information of one thing, and the images of each thing are correspondingly connected with one information library;
the control module receives the new object image and the query voice at the same time;
the image recognition module recognizes the intention object marked in the new object image and generates an intention object image;
a matching module matches the image of the intended object to an image of the object in the image library to determine an information library of the intended object;
the intention analysis module analyzes the query voice to obtain the semantics of the query voice;
extracting answer information related to the semantics from the determined information base of the intention object by a control module;
the voice synthesis module synthesizes the answer information into answer voice;
the control module sends the answer voice outwards;
the controller receives the answer speech;
the voice player outputs the answer voice.
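The server-side portion of the steps above (receive, match, extract, synthesize, reply) can be sketched as one function. This is a hedged sketch under stated assumptions: the matcher compares a pre-assigned label instead of running real image recognition, the "synthesis" step returns text rather than audio, and every identifier is illustrative rather than taken from the patent.

```python
def match_to_library(intent_image, image_library):
    """Stand-in for the matching module: find the library image for the
    circled intention object. A real module would compare image features;
    here the image is assumed to carry a label for illustration."""
    for thing in image_library:
        if intent_image["label"] == thing:
            return thing
    return None

def answer_query(intent_image, semantics, image_library, info_libraries):
    """Control-module flow: match -> pick information library -> extract
    the answer matching the query semantics -> hand off for synthesis."""
    thing = match_to_library(intent_image, image_library)
    if thing is None:
        return None  # no library image matched the intention object
    info = info_libraries[thing]
    # Fall back to the general introduction when no entry matches.
    text = info.get(semantics, info.get("introduction", ""))
    # Stand-in for the voice synthesis module (text -> answer payload).
    return {"thing": thing, "answer_text": text}

# Hypothetical one-entry libraries, mirroring the watermelon example.
image_library = {"watermelon": "img_watermelon.jpg"}
info_libraries = {"watermelon": {"variety": "Seedless and yellow-flesh ..."}}
result = answer_query({"label": "watermelon"}, "variety",
                      image_library, info_libraries)
```

The controller would then receive `result` over the wireless link and hand the answer to the voice player.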
Further, the method further comprises the following steps:
after the control module simultaneously receives the new object image and the query voice, the new object image is classified and stored in the storage module;
after the new object image and the query voice have been simultaneously received a plurality of times, the microphone collects review voice;
the controller sends the review voice to the outside;
the control module receives the review voice;
the intention analysis module analyzes the review voice to obtain the semantics of the review voice;
in the storage module, the control module extracts the classified and saved new object images matched with the semantics of the review voice based on the semantics of the review voice and sends the new object images to the outside;
the controller receives the new object image sent by the control module;
the camera body plays the new object image sent by the control module.
Further, the method further comprises the following steps:
after the new object image and the query voice have been simultaneously received a plurality of times, the microphone collects recall voice;
the controller sends the recall voice to the outside;
the control module receives the recall voice;
the intention analysis module analyzes the recall voice to obtain the semantics of the recall voice;
in the storage module, the control module extracts the classified stored new object images matched with the semantics of the recall voice;
the video synthesis module synthesizes the extracted new object images into video information;
the control module sends the video information outwards;
the controller receives the video information;
the camera body plays the video information.
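The recall steps above (extract classified images matching the recall semantics, then splice them into a video) can be sketched as follows. The storage layout, the category labels and the frame-list "video" are illustrative assumptions; a real video synthesis module would encode an actual video stream.

```python
# Hypothetical classified storage: each saved new-object image carries
# a category and a capture date, as implied by the classified storage step.
stored_images = [
    {"file": "rose.jpg",  "category": "plants",  "taken": "2021-03-01"},
    {"file": "ant.jpg",   "category": "animals", "taken": "2021-03-02"},
    {"file": "tulip.jpg", "category": "plants",  "taken": "2021-04-10"},
]

def extract_by_semantics(images, category):
    """Control-module step: pick the classified, stored new-object images
    matching the semantics of the recall voice."""
    return [img for img in images if img["category"] == category]

def synthesize_video(images):
    """Video-synthesis-module stand-in: order the frames by capture time
    and return them as a simple frame list."""
    frames = sorted(images, key=lambda img: img["taken"])
    return {"frames": [img["file"] for img in frames]}

video = synthesize_video(extract_by_semantics(stored_images, "plants"))
```

The control module would then send `video` to the controller, which plays it on the camera body.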
Further, the step of the intention parsing module parsing the query speech to obtain semantics of the query speech includes:
a voice recognition unit recognizes the query voice to obtain a text of the query voice;
the semantic understanding unit is used for understanding the text of the query voice to obtain the semantic of the query voice.
The beneficial effect of the camera recognition device of the invention is that, based on artificial intelligence capabilities such as image recognition, voice recognition, semantic understanding and voice synthesis, it gives an ordinary touch screen camera a "circle to recognize" function, helping children and other users recognize and learn new things more quickly and efficiently.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:
fig. 1 is a schematic block diagram of an identification camera device according to an embodiment of the present invention.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Referring to fig. 1, the present invention provides a camera device comprising: a touch screen camera 1 and a server 2.
The touch screen camera 1 is designed to be carried about by the user. In this embodiment, the user includes, but is not limited to, children. The user sends instructions to the server through the touch screen camera, and the server provides storage and computing services for the user based on those instructions.
Specifically, the touch screen camera 1 includes a camera body 11, a microphone 12, a voice player 13, and a controller 14.
The camera body 11 has a touch screen, through which the user inputs various instructions to control the camera body to take pictures and process them. The camera body is used for collecting the new object image and playing images and videos. The camera body can take photos and videos and store them in a memory card inside the camera body, and the stored images and videos can be viewed through a review instruction.
The microphone 12, the voice player 13, and the controller 14 are integrally mounted inside the camera body 11. The camera body 11, the microphone 12, and the voice player 13 are connected to the controller 14, respectively.
In this embodiment, the user inputs an instruction to the controller through the touch screen, reviews an image stored in the camera body, and marks the image through image processing software. Specifically, the user can use the screen brush to circle the object to be queried (i.e. the thing the user wants to understand) on the photographed image, and the controller sends the marked image to the server for corresponding processing.
Specifically, the server 2 includes a control module 22, a storage module 21, an image recognition module 23, a matching module 24, an intention analysis module 25 and a speech synthesis module 26.
The storage module 21 is pre-constructed with an image library and a plurality of information libraries. The image library includes images of a plurality of things. Each information library includes the related introduction information (text information) of one thing. The image of each thing is correspondingly connected with an information library; that is, the image and the information library corresponding to the thing in the image are bound to each other to form a mapping relation. After a certain image in the image library is determined, the thing name corresponding to the image and the related introduction information of that thing can be obtained. For example, if an image in the image library is "chrysanthemum", the corresponding information library is the information library of "chrysanthemum", and the information in that library includes, but is not limited to: an introduction to the chrysanthemum, the origin of the chrysanthemum, ancient poetry related to the chrysanthemum, quotations related to the chrysanthemum, and the like.
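The image-to-information-library binding described above can be sketched with plain dictionaries. All names here (the "chrysanthemum" and "watermelon" entries, the field keys) are illustrative assumptions following the chrysanthemum example, not the patent's storage format.

```python
# Image library: thing name -> reference image, as bound in the mapping.
image_library = {
    "chrysanthemum": "img_chrysanthemum.jpg",
    "watermelon": "img_watermelon.jpg",
}

# One information library per thing, holding its introduction information.
information_libraries = {
    "chrysanthemum": {
        "introduction": "The chrysanthemum is a flowering plant ...",
        "origin": "Cultivated in China for centuries ...",
        "poetry": "Ancient poems referencing the chrysanthemum ...",
    },
    "watermelon": {
        "introduction": "The watermelon is a vine-like fruit plant ...",
        "variety": "Common varieties include seedless and yellow-flesh ...",
    },
}

def info_library_for(thing_name):
    """Follow the image -> information-library binding for a matched image."""
    assert thing_name in image_library  # the binding must exist
    return information_libraries[thing_name]
```

Once the matching module resolves a circled object to `"chrysanthemum"`, `info_library_for("chrysanthemum")` yields the question-answer material for that thing.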
Meanwhile, the storage module is used for storing the image data and storing the image data based on the corresponding classification type. Specifically, the classification type is not limited herein, such as classification by animals, plants, microorganisms, etc.
The control module 22 is wirelessly connected to the controller 14. Specifically, the controller is connected with a wireless communication module, the control module is connected with another wireless communication module, the wireless communication module of the controller is connected with the wireless communication module of the control module through wireless signals, and wireless signal transmission is carried out to transmit information such as images, voice and video.
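The wireless exchange just described, where the marked image and the query voice must arrive at the control module simultaneously, could be realized by packing both into one message. The JSON message format below is purely an assumption for illustration; the patent does not specify the transmission protocol.

```python
import json

def pack_upload(marked_image_ref, query_voice_ref):
    """Controller side: one message carrying both payload references, so
    the control module receives image and voice together."""
    return json.dumps({"image": marked_image_ref, "voice": query_voice_ref})

def receive_upload(message):
    """Control-module side: unpack both payloads from the same message."""
    payload = json.loads(message)
    return payload["image"], payload["voice"]

# Hypothetical file references, for demonstration only.
img, voice = receive_upload(pack_upload("marked_watermelon.jpg", "query.wav"))
```

Bundling the two payloads in a single message is one simple way to satisfy the "received at the same time" condition of steps S4 and S6.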
The image recognition module 23 is used for recognizing the intention object marked in the new object image and generating an intention object image.
The matching module 24 is used to match the image of the intended object to an image of a thing in the image library to determine an information library of the intended object.
The intention parsing module 25 parses, through voice recognition, the semantics of the various voice instructions sent out by the controller. After the intention parsing module performs voice recognition and parses the semantics of a voice instruction, the control module performs the corresponding processing action based on that semantics. Specifically, the intent parsing module 25 includes a voice recognition unit connected to the control module and a semantic understanding unit connected to both the voice recognition unit and the control module.
The voice synthesis module 26 is used for synthesizing the information output by the control module into answer voice. The speech synthesis module 26 converts the text information (information output by the control module) into audio (answer speech). After synthesizing the answer speech, the control module sends the answer speech to the controller, and the controller plays the answer speech through the speech player to inform the user.
To clearly explain the working principle of the camera recognition device, the invention is explained by taking a watermelon as the intended object as an example:
the user uses the camera body to shoot a photo containing the image of the intended object (watermelon), circles the intended object (watermelon) in the photo on the touch screen of the camera body, and clicks through the touch screen to record the query voice. After the microphone collects the query voice of the user, the uploading instruction on the touch screen is clicked, and the controller simultaneously sends the query voice and the picture of the circled intended object to the control module.
After receiving the query voice and the photo at the same time, the control module sends the query voice to the intention analysis module and the photo to the image recognition module. The intention analysis module performs voice recognition on the query voice, parses its semantics and sends the semantics to the control module. Meanwhile, the image recognition module performs image recognition processing on the photo, generates a picture of the intended object and sends the picture to the control module. After the control module obtains the semantics of the query voice and the picture of the intended object, the picture is matched with the image of the watermelon in the image library, and the watermelon information library is determined. Based on the semantics of the query voice, the control module extracts the watermelon information matching those semantics and sends it to the voice synthesis module. For example, if the semantics of the query voice are "watermelon variety", the control module extracts the introduction information (text information) of watermelon varieties from the watermelon information library. The voice synthesis module synthesizes the introduction information of the watermelon varieties sent by the control module into answer voice and sends the answer voice to the control module. The control module receives the answer voice and sends it to the outside.
The controller receives the answer voice of the control module and sends the answer voice to the voice player, and the voice player plays the answer voice for the user to refer to learning.
The camera recognition equipment provided by the invention has the advantages that the camera recognition equipment endows the common touch screen camera with the function of 'circle image recognition' based on the artificial intelligent capabilities of image recognition, voice recognition, semantic understanding, voice synthesis and the like, and helps children and other people to recognize and learn new things more quickly and efficiently.
As a preferred embodiment, the server further comprises a video synthesis module 27. The video synthesis module 27 is connected to the control module 22. After the controller sends a new object image to the control module and the image recognition module recognizes it, the control module classifies and stores the new object image in the storage module based on its classification type. After the user taps to watch a video through the touch screen (e.g. "new things I learned in the spring of 2021"), the controller generates a video synthesis instruction and sends it to the outside. After receiving the video synthesis instruction, the control module extracts a plurality of new object images from the storage module by time or by type. The video synthesis module synthesizes the extracted new object images into video information. The control module sends the video information to the controller, and the controller plays the video information through the camera body.
After the user taps to view pictures through the touch screen (e.g. "flowers I have seen"), the controller generates a picture collection instruction and sends it to the outside. After receiving the picture collection instruction, the control module extracts a plurality of new object images from the storage module and sends them to the controller, and the controller plays them through the camera body.
In addition, the user can share, forward and the like the photographed new object image through the server.
In this embodiment, the server is a cloud server.
With continued reference to fig. 1, the present invention provides an object recognition method for the camera recognition device, including the following steps:
S1: the camera body 11 captures the new object image.
S2: the microphone 12 collects the query speech.
S3: the object is noted in the new object image by the touch screen of the camera body 11.
S4: the controller 14 simultaneously transmits the new image of the object with the query voice.
S5: an image library and a plurality of information libraries are constructed in the storage module 21, wherein the image library comprises images of a plurality of things, each information library comprises introduction information of one thing, and the images of each thing are correspondingly connected with one information library.
S6: the control module 22 receives both the novelty image and the query speech.
S7: the image recognition module 23 recognizes the intended object noted in the new object image and generates an intended object image.
S8: the matching module 24 matches the image of the intended object to an image of a thing in the image library to determine an information library of the intended object.
S9: the intention parsing module 25 recognizes and parses the query voice through voice to obtain the semantics of the query voice;
s10: in the information base of the determined intent object, the control module 22 extracts answer information related to the semantics.
S11: the speech synthesis module 26 synthesizes the answer information into answer speech.
S12: the control module 22 sends the answer speech to the outside.
S13: the controller 14 receives the answer speech.
S14: the voice player 13 outputs answer voices.
Wherein, step S9 includes:
a voice recognition unit recognizes the query voice to obtain a text of the query voice;
the semantic understanding unit is used for understanding the text of the query voice to obtain the semantic of the query voice.
As a preferred embodiment, the object identifying method of the object identifying camera device of the present invention further includes the following steps:
a. after the control module receives the new object image and the query speech at the same time, the storage module 21 sorts and saves the new object image.
b. After the new object image and the query voice have been simultaneously received a plurality of times, the user utters the review voice, and the microphone 12 collects the review voice.
c. The controller 14 sends the review voice to the outside.
d. The control module 22 receives the back-looking voice;
e. the intention parsing module 25 parses the review voice through voice recognition to obtain the semantics of the review voice;
f. in the storage module 21, the control module 22 extracts classified and stored new object images matched with the semantics of the review voice based on the semantics of the review voice and sends the new object images to the outside;
g. the controller 14 receives the new object images sent by the control module 22;
h. the camera body 11 plays the new object image transmitted from the control module 22.
Wherein, step e comprises:
a voice recognition unit recognizes the review voice to obtain a text of the review voice;
the semantic understanding unit is used for understanding the text of the review voice to obtain the semantic of the review voice.
As a preferred embodiment, the object identifying method of the object identifying camera device of the present invention further includes the following steps:
A. After the new object image and the query voice have been simultaneously received a plurality of times, the user utters the recall voice, and the microphone 12 collects the recall voice.
B. The controller 14 sends recall speech externally.
C. The control module 22 receives recall speech.
D. The intention parsing module 25 parses the recall voice through voice recognition to obtain the semantics of the recall voice.
E. In the storage module 21, the control module 22 extracts the classified stored new object image matching the semantics of the recall speech.
F. The video synthesis module 27 synthesizes the extracted new object images into video information.
G. The control module 22 sends the video information out.
H. The controller 14 receives video information.
I. The camera body 11 plays video information.
Wherein, step D includes:
a voice recognition unit recognizes the recall voice to obtain a text of the recall voice;
the semantic understanding unit is used for understanding the text of the recall voice to obtain the semantic of the recall voice.
In the object recognition method of the camera recognition device, the modules play the following roles. Image recognition: identifies the circled area; the default is full-screen selection. Voice recognition, semantic understanding and voice synthesis: used for voice interaction. Image library: based on a large intelligent image recognition model, the picture content can be recognized and converted into the corresponding thing name. Information library: contains a large number of intelligent question-answer pairs, and can intelligently match questions and give corresponding answers.
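The "circled area, default full-screen" behavior of the image recognition module can be sketched as a small region selector. The rectangle representation and function name are assumptions for illustration only.

```python
def region_to_recognize(image_size, circled_region=None):
    """Return the region the image recognition module should process:
    the user's circled region if one was marked, otherwise the full
    frame (the default full-screen selection)."""
    width, height = image_size
    if circled_region is None:
        return (0, 0, width, height)  # default: whole image
    return circled_region             # user-circled rectangle

full = region_to_recognize((640, 480))
part = region_to_recognize((640, 480), (100, 120, 200, 220))
```

With no circle marked, recognition falls back to the entire 640x480 frame; with a circle, only the marked rectangle is recognized.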
The foregoing description is only of the preferred embodiments of the present application and is an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of the features described above, and is intended to cover other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the invention, for example technical solutions formed by replacing the above features with technical features of similar function disclosed in the present application (but not limited thereto).

Claims (7)

1. A camera recognition apparatus, comprising:
the touch screen camera comprises a camera body, a microphone, a voice player and a controller, wherein the camera body is used for collecting images of new things and playing images and videos, the controller, the microphone and the voice player are respectively arranged on the camera body, and the camera body, the microphone and the voice player are respectively connected with the controller; and
the server comprises a control module, a storage module, an image recognition module, a matching module, an intention analysis module and a voice synthesis module, wherein the control module is connected with the controller through wireless signals to transmit images, voices and video information, the storage module is used for storing an image library and a plurality of information libraries and storing image data based on corresponding classification types, the image recognition module is used for recognizing intention objects marked in images of the new objects and generating an intention object image, the matching module is used for matching the intention object image to an image of one object in the image library to determine the information library of the intention object, the intention analysis module and the voice synthesis module are respectively connected with the control module, the image library comprises a plurality of images of the objects, each information library comprises introduction information of one object, and the image of each object is correspondingly connected with one information library; a user inputs an instruction to the controller through a touch screen of the touch screen camera, looks back at an image stored in the camera body, marks the image through image processing software, and looks back at the image and video stored in a memory card of the touch screen camera through the looking back instruction;
the intent analysis module comprises a voice recognition unit and a semantic understanding unit, the voice recognition unit is connected with the control module, and the semantic understanding unit is connected with the voice recognition unit and the control module; after the intent analysis module performs voice recognition and parses the semantics of a voice instruction, the control module performs the corresponding processing action based on the semantics of the voice instruction;
the voice synthesis module is used for synthesizing the information output by the control module into an answer voice, converting the text information into audio; after the answer voice is synthesized, the control module sends the answer voice to the controller, and the controller plays the answer voice through the voice player so as to inform the user.
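The data model of claim 1 (an image library in which each object's image is linked to an information library of introduction text) can be sketched as follows. All class and function names here are illustrative assumptions, and the nearest-neighbour lookup merely stands in for whatever image-matching technique the matching module actually uses:

```python
# Illustrative sketch of the claimed server-side data model: an image
# library of object images, each linked to an information library holding
# that object's introduction information. The Server class and the
# feature tuples are hypothetical, not the patent's implementation.

class Server:
    def __init__(self):
        # image library: object name -> image (here a stand-in feature tuple)
        self.image_library = {}
        # information libraries: object name -> introduction information
        self.information_libraries = {}

    def register_object(self, name, image_features, introduction):
        """Store an object's image and link it to its information library."""
        self.image_library[name] = image_features
        self.information_libraries[name] = introduction

    def match(self, query_features):
        """Matching module: find the library whose image is closest to the
        query image (nearest neighbour over the stand-in feature tuples)."""
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        best = min(self.image_library,
                   key=lambda n: dist(self.image_library[n], query_features))
        return best, self.information_libraries[best]

server = Server()
server.register_object("rose", (1.0, 0.2), "A rose is a woody perennial flowering plant.")
server.register_object("tulip", (0.1, 0.9), "A tulip is a spring-blooming bulbous herb.")
name, info = server.match((0.9, 0.3))   # closest to the rose image
```

Because every object image is keyed to exactly one information library, determining the matched image immediately determines where answer information is extracted from.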
2. The camera identification device of claim 1, wherein the server further comprises a video synthesis module, and the video synthesis module is connected with the control module.
3. The camera identification device of claim 1, wherein the server is a cloud server.
4. A method of identifying objects using the camera identification device according to any one of claims 1 to 3, comprising the steps of:
the camera body collects the new object image;
the microphone collects a query voice;
marking the intention object in the new object image through the touch screen of the camera body;
the controller simultaneously transmits outward the new object image marked with the intention object and the query voice;
constructing an image library and a plurality of information libraries in the storage module, wherein the image library comprises images of a plurality of objects, each information library comprises introduction information of one object, and the image of each object is correspondingly linked to one information library;
the control module receives the new object image and the query voice at the same time;
the image recognition module recognizes the intention object marked in the new object image and generates an intention object image;
the matching module matches the intention object image to the image of one object in the image library to determine the information library of the intention object;
the intention analysis module analyzes the query voice to obtain the semantics of the query voice;
the control module extracts answer information related to the semantics from the determined information library of the intention object;
the voice synthesis module synthesizes the answer information into answer voice;
the control module sends the answer voice outwards;
the controller receives the answer speech;
the voice player outputs the answer voice.
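The steps of claim 4 can be condensed into a minimal end-to-end sketch. Every function body below is a toy stand-in (real image recognition, voice recognition and voice synthesis are far more involved); only the order of operations between the claimed modules is meant to be faithful:

```python
# Hypothetical sketch of the claim 4 query pipeline: recognize the marked
# intention object, match it to an image library entry, analyze the query
# voice, extract the related answer, and synthesize an answer voice.

IMAGE_LIBRARY = {"ladybug": "spots", "ant": "segments"}   # name -> crude "feature"
INFO_LIBRARIES = {
    "ladybug": {"diet": "Ladybugs eat aphids.", "name": "This is a ladybug."},
    "ant": {"diet": "Ants eat seeds and insects.", "name": "This is an ant."},
}

def recognize_intention_object(new_object_image, mark):
    # image recognition module: crop the region the user marked
    return new_object_image[mark]

def match_object(intention_image):
    # matching module: find the image library entry with the same "feature"
    for name, feature in IMAGE_LIBRARY.items():
        if feature == intention_image:
            return name

def parse_intent(query_voice):
    # intention analysis module: map the query voice to a semantic slot
    return "diet" if "eat" in query_voice else "name"

def answer(new_object_image, mark, query_voice):
    obj = match_object(recognize_intention_object(new_object_image, mark))
    semantics = parse_intent(query_voice)
    info = INFO_LIBRARIES[obj][semantics]   # control module extracts the answer
    return f"[voice] {info}"                # voice synthesis module

reply = answer({"region1": "spots", "region2": "leaf"}, "region1", "what does it eat")
```

Here `reply` is the answer voice the control module would send back to the controller for playback.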
5. The method of identifying objects of claim 4, further comprising:
after receiving the new object image and the query voice at the same time, the control module classifies and stores the new object image in the storage module;
after the new object image and the query voice have been received simultaneously a plurality of times, the microphone collects a review voice;
the controller sends the review voice to the outside;
the control module receives the review voice;
the intention analysis module analyzes the review voice to obtain the semantics of the review voice;
in the storage module, the control module extracts the classified and saved new object images matched with the semantics of the review voice and sends them to the outside;
the controller receives the new object image sent by the control module;
the camera body plays the new object image sent by the control module.
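The review flow of claim 5 amounts to classified storage at query time and semantic retrieval later. A minimal sketch, assuming a keyword-style match between classification labels and the review voice (the labels and matching rule are illustrative, not claimed):

```python
# Hypothetical sketch of claim 5: new object images are saved under a
# classification type when received, then retrieved by the semantics of
# a later review voice.

from collections import defaultdict

storage_module = defaultdict(list)          # classification type -> saved images

def classify_and_save(classification, image):
    storage_module[classification].append(image)

def review(review_voice):
    # control module: extract the saved images whose classification
    # matches the semantics of the review voice
    for classification, images in storage_module.items():
        if classification in review_voice:
            return images
    return []

classify_and_save("insect", "ladybug.jpg")
classify_and_save("insect", "ant.jpg")
classify_and_save("flower", "rose.jpg")
images = review("show me the insects I photographed")
```

The extracted images are what the control module sends outward for the camera body to play back.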
6. The method of identifying objects of claim 4, further comprising:
after the new object image and the query voice have been received simultaneously a plurality of times, the microphone collects a recall voice;
the controller sends the recall voice to the outside;
the control module receives the recall voice;
the intention analysis module analyzes the recall voice to obtain the semantics of the recall voice;
in the storage module, the control module extracts the classified and stored new object images matched with the semantics of the recall voice;
the video synthesis module synthesizes the extracted new object images into video information;
the control module sends the video information outwards;
the controller receives the video information;
the camera body plays the video information.
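Claim 6 differs from claim 5 only in the final stage: the extracted images pass through the video synthesis module before being sent out. A sketch, in which the per-image display duration is an assumed parameter rather than anything the claim specifies:

```python
# Hypothetical sketch of the claim 6 video synthesis step: the extracted
# new object images are strung together into one piece of video
# information, here represented as (frame, start_time) pairs.

def synthesize_video(images, seconds_per_image=2):
    """Video synthesis module: turn a list of still images into an
    ordered sequence of frames with start times, forming one video."""
    return [(img, i * seconds_per_image) for i, img in enumerate(images)]

extracted = ["ladybug.jpg", "ant.jpg", "rose.jpg"]
video_information = synthesize_video(extracted)
```

The resulting `video_information` is what the control module sends outward for the camera body to play.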
7. The method of claim 4, wherein the step of the intention analysis module analyzing the query voice to obtain the semantics of the query voice comprises:
the voice recognition unit recognizes the query voice to obtain the text of the query voice;
the semantic understanding unit understands the text of the query voice to obtain the semantics of the query voice.
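The two-stage analysis of claim 7 can be sketched as a pipeline of a voice recognition unit followed by a semantic understanding unit. Both stages below are toy stand-ins for real ASR and NLU models, and the keyword rules are assumptions for illustration only:

```python
# Hypothetical sketch of claim 7: voice -> text -> semantics.

def voice_recognition_unit(query_voice):
    # stand-in ASR: treat the "voice" as already being its transcript
    return query_voice.lower().strip()

def semantic_understanding_unit(text):
    # stand-in NLU: keyword rules mapping the text to a semantic intent
    if "what is" in text or "what's" in text:
        return "identify"
    if "eat" in text:
        return "diet"
    return "unknown"

def analyze(query_voice):
    text = voice_recognition_unit(query_voice)
    return semantic_understanding_unit(text)

semantics = analyze("What is this bug?")
```

Splitting recognition from understanding lets the control module act on a normalized semantic label rather than on raw transcripts.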
CN202110959641.XA 2021-08-20 2021-08-20 Camera identifying equipment and object identifying method Active CN113709364B (en)

Publications (2)

Publication Number Publication Date
CN113709364A CN113709364A (en) 2021-11-26
CN113709364B true CN113709364B (en) 2024-02-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant