CN108334498A - Method and apparatus for handling voice request - Google Patents
- Publication number
- CN108334498A (application CN201810124300.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- specified object
- voice request
- voice
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S11/00—Systems for determining distance or velocity not using reflection or reradiation
- G01S11/14—Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3343—Query execution using phonetics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The embodiments of the present application disclose a method and apparatus for handling a voice request. One specific implementation of the method includes: in response to receiving a voice request, parsing the voice request; in response to the parsing result of the voice request including an intention to obtain relevant information of a specified object, obtaining an image containing the specified object; determining the relevant information of the specified object based on the image; generating voice response information based on the parsing result of the voice request and the relevant information of the specified object; and outputting the voice response information. By obtaining an image containing the specified object, the embodiments can determine the relevant information of the specified object and thus output more accurate voice response information.
Description
Technical field
The embodiments of the present application relate to the field of computer technology, and more particularly to a method and apparatus for handling a voice request.
Background
Intelligent electronic devices such as smart speakers typically interact with the user only by voice in order to meet the user's needs. For example, when the user asks "What color is the sky?", the device can answer "Blue". If the user's description of an object is not specific enough, however, the device cannot obtain enough information to judge what the user wants and usually cannot give an accurate response.
Summary of the invention
The embodiments of the present application propose a method and apparatus for handling a voice request.
In a first aspect, an embodiment of the present application provides a method for handling a voice request, including: in response to receiving a voice request, parsing the voice request; in response to the parsing result of the voice request including an intention to obtain relevant information of a specified object, obtaining an image containing the specified object; determining the relevant information of the specified object based on the image; generating voice response information based on the parsing result of the voice request and the relevant information of the specified object; and outputting the voice response information.
In some embodiments, obtaining an image containing the specified object in response to the intent parsing result of the voice request including an intention to obtain relevant information of the specified object includes: in response to the intent parsing result of the voice request including the intention to obtain relevant information of the specified object, sending an instruction to capture an image containing the specified object; and obtaining the captured image containing the specified object.
In some embodiments, obtaining the captured image containing the specified object includes: determining location information of the user who issued the voice request; and obtaining an image containing the specified object that was captured based on the user's location information.
In some embodiments, determining the location information of the user who issued the voice request includes: determining the location information of the user who issued the voice request by sound source localization.
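The application does not say which sound source localization technique is used. As one illustrative possibility only, a far-field estimate from the time difference of arrival between two microphones can be sketched as follows. The function name, the microphone spacing parameter, and the speed-of-sound constant are assumptions introduced for this sketch and do not come from the application.

```python
import math

# Approximate speed of sound in air at about 20 degrees Celsius (assumed constant).
SPEED_OF_SOUND_M_S = 343.0


def estimate_bearing_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the direction of a sound source from a two-microphone delay.

    delay_s is the arrival-time difference between the two microphones and
    mic_spacing_m is the distance between them. The result is the angle in
    degrees measured from the broadside direction (perpendicular to the line
    joining the microphones), under a far-field (plane wave) assumption.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A bearing estimated this way could be used to point a camera toward the speaker before an image is captured, which is the role the user's location information plays in the embodiment above.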
In some embodiments, performing intent parsing on the voice request includes: converting the voice request into a text sentence; determining whether the text sentence includes a preset keyword; and in response to determining that the text sentence includes the preset keyword, determining that the intent parsing result includes the intention to obtain relevant information of the specified object.
In some embodiments, determining the relevant information of the specified object based on the image includes: in response to determining that the parsing result of the voice request includes a type keyword indicating the type of the specified object, determining whether an image matching the image of the specified object exists in the image database of the type indicated by the type keyword; if such a matching image exists in the image database of the indicated type, determining the information corresponding to the matching image as the relevant information of the specified object; if no matching image exists in the image database of the indicated type, searching the overall image database for an image matching the image of the specified object and determining the information corresponding to the matching image as the relevant information of the specified object; and in response to determining that the parsing result of the voice request does not include a type keyword indicating the type of the specified object, searching the overall image database for an image matching the image of the specified object and determining the information corresponding to the matching image as the relevant information of the specified object.
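The typed lookup with fallback described in the preceding embodiment can be sketched roughly as follows. This is a minimal illustration, not the application's implementation: the databases are plain dictionaries, and `exact_match` is a hypothetical stand-in for a real image matcher.

```python
from typing import Callable, Dict, Optional

# Each database maps an image key to the information stored for that image.
ImageDb = Dict[str, str]
Matcher = Callable[[str, ImageDb], Optional[str]]


def exact_match(query_image: str, db: ImageDb) -> Optional[str]:
    """Toy matcher (assumption): an image matches when its key is in the database."""
    return query_image if query_image in db else None


def lookup_relevant_info(
    query_image: str,
    type_keyword: Optional[str],
    typed_dbs: Dict[str, ImageDb],
    overall_db: ImageDb,
    match: Matcher = exact_match,
) -> Optional[str]:
    """Search the type-specific database first, then fall back to the overall one."""
    if type_keyword is not None and type_keyword in typed_dbs:
        typed_db = typed_dbs[type_keyword]
        hit = match(query_image, typed_db)
        if hit is not None:
            return typed_db[hit]
    # No type keyword, or no match in the typed database: use the overall database.
    hit = match(query_image, overall_db)
    return overall_db[hit] if hit is not None else None
```

Searching the smaller type-specific database first narrows the candidates when the user names a type (for example "flower"), while the fallback keeps the lookup from failing when that database has no match.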
In a second aspect, an embodiment of the present application provides an apparatus for handling a voice request, including: a parsing unit configured to parse a voice request in response to receiving the voice request; an acquiring unit configured to obtain an image containing a specified object in response to the parsing result of the voice request including an intention to obtain relevant information of the specified object; a determination unit configured to determine the relevant information of the specified object based on the image; a generation unit configured to generate voice response information based on the parsing result of the voice request and the relevant information of the specified object; and an output unit configured to output the voice response information.
In some embodiments, the acquiring unit is further configured to obtain the image containing the specified object as follows: in response to the intent parsing result of the voice request including the intention to obtain relevant information of the specified object, sending an instruction to capture an image containing the specified object; and obtaining the captured image containing the specified object.
In some embodiments, the acquiring unit is further configured to obtain the captured image containing the specified object as follows: determining location information of the user who issued the voice request; and obtaining an image containing the specified object that was captured based on the user's location information.
In some embodiments, the acquiring unit is further configured to determine the location information of the user who issued the voice request as follows: determining the location information of the user who issued the voice request by sound source localization.
In some embodiments, the parsing unit is further configured to perform intent parsing on the voice request as follows: converting the voice request into a text sentence; determining whether the text sentence includes a preset keyword; and in response to determining that the text sentence includes the preset keyword, determining that the intent parsing result includes the intention to obtain relevant information of the specified object.
In some embodiments, the determination unit is further configured to determine the relevant information of the specified object as follows: in response to determining that the parsing result of the voice request includes a type keyword indicating the type of the specified object, determining whether an image matching the image of the specified object exists in the image database of the type indicated by the type keyword; if such a matching image exists in the image database of the indicated type, determining the information corresponding to the matching image as the relevant information of the specified object; if no matching image exists in the image database of the indicated type, searching the overall image database for an image matching the image of the specified object and determining the information corresponding to the matching image as the relevant information of the specified object; and in response to determining that the parsing result of the voice request does not include a type keyword indicating the type of the specified object, searching the overall image database for an image matching the image of the specified object and determining the information corresponding to the matching image as the relevant information of the specified object.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the method for handling a voice request.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any embodiment of the method for handling a voice request.
The method and apparatus for handling a voice request provided by the embodiments of the present application first parse a voice request in response to receiving it. Then, in response to the parsing result of the voice request including an intention to obtain relevant information of a specified object, an image containing the specified object is obtained. Next, the relevant information of the specified object is determined based on the image. Voice response information is then generated based on the parsing result of the voice request and the relevant information of the specified object. Finally, the voice response information is output. By obtaining an image containing the specified object, the method can determine the relevant information of the specified object, judge the user's intention more accurately, and provide voice response information that better meets the user's needs.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the method for handling a voice request according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for handling a voice request according to the present application;
Fig. 4 is a flowchart of another embodiment of the method for handling a voice request according to the present application;
Fig. 5 is a flowchart of another embodiment of the method for handling a voice request according to the present application;
Fig. 6 is a flowchart of another embodiment of the method for handling a voice request according to the present application;
Fig. 7 is a schematic structural diagram of one embodiment of the apparatus for handling a voice request according to the present application;
Fig. 8 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method or apparatus for handling a voice request of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as voice processing applications, image processing applications, shopping applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a camera and support voice interaction, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a background server that supports the retrieval of relevant information of specified objects for the terminal devices 101, 102, 103. The background server may analyze and otherwise process received data such as voice requests, and feed the processing result (for example, voice response information) back to the terminal device.
It should be noted that the method for handling a voice request provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the apparatus for handling a voice request may be provided in the server 105 or in the terminal devices 101, 102, 103.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for handling a voice request according to the present application is shown. The method for handling a voice request includes the following steps:
Step 201, in response to receiving a voice request, parse the voice request.
In this embodiment, the electronic device on which the method for handling a voice request runs (for example, the server shown in Fig. 1) may respond after receiving a voice request by parsing it. A voice request is a request sent in the form of speech.
In practice, the electronic device may be a terminal device, in which case it can receive a voice request that the user issues by speaking or in a similar form. The electronic device may also be a server, in which case it can receive the voice request that a terminal device forwards after obtaining it from the user.
In a specific application scenario, the user may point at a nearby flower and ask "What flower is this?". The voice request then includes an audio signal whose content is "What flower is this?".
Techniques such as speech recognition may be used to parse the voice request and obtain the specific content it indicates, that is, the information the user wants to express. Speech recognition can convert the voice request into text, which is then recognized.
In some optional implementations of this embodiment, the voice request is first converted into a text sentence, and the text sentence is then recognized using natural language recognition technology.
In this embodiment, besides the recognized text, the parsing result of the voice request may also include an intent parsing result for the user who issued the voice request. The user's intention can be parsed from the text recognition result, for example with an intent parsing model trained using a machine learning method. Here, the intent parsing model may be trained on a set of sample text sentences labeled with user intentions, and it characterizes the correspondence between text sentences and user intentions.
Step 202, in response to the parsing result of the voice request including an intention to obtain relevant information of a specified object, obtain an image containing the specified object.
In this embodiment, when the parsing result of the voice request includes an intention to obtain relevant information of a specified object, the electronic device obtains an image containing the specified object. The specified object may be any designated living being or article. The relevant information of the specified object is any information related to it, such as its name, category, related knowledge, or related reviews. A parsing result that includes the intention to obtain relevant information of the specified object indicates that the user issued the voice request so that the interacting terminal device would obtain that information and make a voice response based on it. The parsing result includes the user's intention, which may include obtaining the relevant information of the specified object and may also include other intentions.
To determine the relevant information of the specified object, the electronic device obtains an image containing the specified object. If the electronic device is a server, it can obtain the image of the specified object from a terminal device. If the electronic device is a terminal device, the image may be one captured by the terminal device, one uploaded by the user, or one captured by and received from another electronic device.
Step 203, determine the relevant information of the specified object based on the image.
In this embodiment, the electronic device determines the relevant information of the specified object based on the obtained image. Because the image contains the specified object, the relevant information of the specified object can be determined from it.
The relevant information of the specified object may be determined from the image in various ways. The image may be compared with images in an existing image database to find an image whose similarity to it is higher than a threshold. The specified object can then be determined from the image found, for example because the found image corresponds to an identifier of the specified object. Alternatively, the image of the specified object may be input into a predetermined image recognition model that recognizes the specified object contained in the image. The image recognition model may output an identifier or name of the specified object. The electronic device may then use the information output by the model as the relevant information, or use that information to obtain the relevant information of the specified object. The image recognition model here may be a neural network model, a specified algorithm, or the like.
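The application does not specify how image similarity is computed. As a minimal sketch of the database comparison described above, assume each image has already been reduced to a numeric feature vector and that cosine similarity is the measure; the function names, the feature vectors, and the threshold value of 0.9 are all assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_best_match(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the identifier of the most similar database image above the threshold."""
    best_id, best_score = None, threshold
    for object_id, features in database.items():
        score = cosine_similarity(query, features)
        if score > best_score:  # strictly higher than the threshold, as in the text
            best_id, best_score = object_id, score
    return best_id
```

The returned identifier can then serve as the key for looking up the relevant information of the specified object; a return value of `None` means no database image was similar enough.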
In practice, the relevant information of the specified object may be looked up in a pre-established database of relevant information. The relevant information in that database may correspond to at least one of the image, the identifier, and the name of the specified object. Alternatively, the relevant information may be obtained by searching the Internet based on the identifier or name of the specified object.
Step 204, generate voice response information based on the parsing result of the voice request and the relevant information of the specified object.
In this embodiment, the electronic device generates voice response information based on the parsing result of the voice request and the relevant information of the specified object. The voice response information is the information that responds to the user's speech, and it may ultimately be output in audio form by the terminal device that interacts with the user by voice.
The voice response information may be generated in various ways. A mapping table of parsing results, relevant information, and voice response information may be obtained in advance; once the parsing result and relevant information are available, the voice response information corresponding to them is looked up in the mapping table. Alternatively, the parsing result and relevant information may be input into a previously obtained voice response information model, and the voice response information output by the model is obtained. The voice response model characterizes the correspondence between the parsing result, the relevant information, and the voice response information. The voice response model here may be a neural network model, a specified algorithm, or the like.
Step 205, output the voice response information.
In this embodiment, the electronic device outputs the voice response information after generating it. Specifically, if the electronic device is a terminal device, it can output the information by playing it as speech. If the electronic device is a server, it can send the voice response information to the terminal device to complete the output.
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for handling a voice request according to this embodiment. In the application scenario of Fig. 3, the electronic device 301, in response to receiving a voice request 303 sent by a user or another electronic device 302, parses the voice request; in response to the parsing result 304 of the voice request including an intention to obtain relevant information of a specified object, obtains an image 305 containing the specified object; determines the relevant information 306 of the specified object based on the image; generates voice response information 307 based on the parsing result 304 of the voice request and the relevant information 306 of the specified object; and outputs the voice response information 307.
By obtaining an image containing the specified object, the method provided by the above embodiment of the present application can determine the relevant information of the specified object and thus output more accurate voice response information when interacting with the user.
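Steps 201 to 205 can be summarized in a short sketch. This is an illustration under stated assumptions rather than the application's implementation: every component (speech parsing, image capture, object identification, response generation, output) is passed in as a hypothetical callable, and the `ParseResult` type and all names are invented for the example. The application does not describe what happens when the intention is absent; here the response generator simply receives `None`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParseResult:
    text: str                # recognized text of the voice request
    wants_object_info: bool  # True if the intent is to obtain object information


def handle_voice_request(
    audio: bytes,
    parse: Callable[[bytes], ParseResult],
    capture_image: Callable[[], bytes],
    identify: Callable[[bytes], str],
    respond: Callable[[ParseResult, Optional[str]], str],
    output: Callable[[str], None],
) -> str:
    """Run the five steps of flow 200 with injected, hypothetical components."""
    result = parse(audio)                       # step 201: parse the request
    relevant_info: Optional[str] = None
    if result.wants_object_info:                # step 202: obtain the image
        image = capture_image()
        relevant_info = identify(image)         # step 203: determine the information
    response = respond(result, relevant_info)   # step 204: generate the response
    output(response)                            # step 205: output the response
    return response
```

Passing the components in as callables keeps the sketch independent of whether the method runs on a terminal device or on a server, which the embodiment leaves open.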
With further reference to Fig. 4, a flow 400 of another embodiment of the method for handling a voice request is shown. The flow 400 of the method for handling a voice request includes the following steps:
Step 401, in response to receiving a voice request, convert the voice request into a text sentence.
In this embodiment, the electronic device on which the method for handling a voice request runs (for example, the server shown in Fig. 1) may respond after receiving a voice request by converting it into a text sentence.
Step 402, determine whether the text sentence includes a preset keyword.
In this embodiment, the electronic device determines whether the text sentence includes a preset keyword. The keyword is used to determine whether the user intends to obtain relevant information of a specified object, that is, whether the parsing result includes the intention to obtain relevant information of the specified object. For example, the preset keywords may be "identify", "identify by photo", and so on.
Keyword matching may be used to search the text sentence converted from the voice request for a preset keyword. Optionally, fuzzy matching may be used: the device judges whether a word in the text sentence is semantically close to a preset keyword and, if so, treats the match as successful, that is, determines that the text sentence includes the preset keyword.
Step 403, in response to determining that the text sentence includes the preset keyword, determine that the parsing result includes the intention to obtain relevant information of the specified object.
In this embodiment, in response to determining that the text sentence includes the preset keyword, the electronic device may determine that the parsing result includes the intention to obtain relevant information of the specified object.
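Steps 402 and 403 can be sketched as a keyword check with an optional fuzzy fallback. Note the substitution: the embodiment speaks of semantic closeness, whereas this sketch uses character-level similarity from Python's standard `difflib` module as a simple stand-in. The keyword list, the threshold of 0.8, and the function name are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable

# Hypothetical English stand-ins for the preset keywords.
PRESET_KEYWORDS = ("identify", "recognize")


def has_object_info_intent(
    sentence: str,
    keywords: Iterable[str] = PRESET_KEYWORDS,
    fuzzy_threshold: float = 0.8,
) -> bool:
    """Return True if the sentence contains, or nearly contains, a preset keyword."""
    lowered = sentence.lower()
    keyword_list = list(keywords)
    # Exact keyword matching first.
    if any(keyword in lowered for keyword in keyword_list):
        return True
    # Fuzzy fallback: compare each word with each keyword by string similarity.
    for word in lowered.split():
        for keyword in keyword_list:
            if SequenceMatcher(None, word, keyword).ratio() >= fuzzy_threshold:
                return True
    return False
```

A production system aiming at true semantic matching would more likely compare word embeddings or use a synonym dictionary; string similarity only catches misspellings and recognition errors that stay close to the keyword.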
Step 404, in response to the parsing result of the voice request including the intention to obtain relevant information of the specified object, send an instruction to capture an image containing the specified object.
In this embodiment, when the parsing result of the voice request includes the intention to obtain relevant information of the specified object, the electronic device may send an instruction to capture an image containing the specified object. The instruction directs a device to capture an image of the specified object and thereby generate an image containing it. Specifically, the electronic device may be a server that sends the instruction to a terminal device equipped with a camera, so that the terminal device captures an image after receiving the instruction. Alternatively, the electronic device may be a terminal device that sends the instruction to the camera, so that the camera captures an image after receiving the instruction.
Step 405, obtain the captured image containing the specified object.
In this embodiment, the electronic device obtains the captured image containing the specified object. If the electronic device is a server, it receives the image captured by the terminal device. If the electronic device is a terminal device, it receives the image captured by the camera.
Specifically, the image may be obtained in various ways. For example, the user may place the specified object in front of the camera lens so that the camera can capture the image.
In some optional implementations of this embodiment, human body recognition may be used to determine, from the images captured by a rotating camera, an image that contains the user, and that image is obtained as the image containing the specified object.
When a user interacts with a device that has a voice service function, the user usually stays fairly close to the device. The camera can therefore capture images at multiple positions by rotating, find the user through human body recognition, and capture an image containing the user. The electronic device may then use the image containing the user as the image containing the specified object. Alternatively, after determining the image containing the user, it may send a shooting instruction to the camera so that the camera shoots the area around the user's position with an enlarged imaging range, and then obtain the image the camera returns.
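The rotate-and-detect procedure above can be sketched as a simple scan over candidate camera angles. The camera control and the human body recognizer are represented by hypothetical callables (`capture_at` and `contains_person`); neither corresponds to a real device API, and the application does not specify the scan order or step size.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")


def find_user_frame(
    angles_deg: Iterable[float],
    capture_at: Callable[[float], Frame],
    contains_person: Callable[[Frame], bool],
) -> Optional[Tuple[float, Frame]]:
    """Rotate through the angles and return the first (angle, frame) showing a person."""
    for angle in angles_deg:
        frame = capture_at(angle)    # rotate the camera and take a picture
        if contains_person(frame):   # human body recognition on that picture
            return angle, frame
    return None                      # the user was not found at any angle
```

The returned angle could then be used for the follow-up shot with an enlarged imaging range that the embodiment mentions.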
Step 406, the relevant information for specifying object is determined based on image.
In the present embodiment, above-mentioned electronic equipment determines the relevant information of specified object based on acquired image.The figure
Seem the image of specified object, includes specified object in the picture, so, specified object can be determined based on the image
Relevant information.
The related information of the specified object may be determined from the image in various ways. The image may be compared with images in a database to find an image whose similarity to it is higher than a threshold. Alternatively, the image of the specified object may be input into a predetermined image recognition model, which identifies the specified object contained in the image. The image recognition model may output an identifier, a name, or the like of the specified object. The electronic device can then obtain the related information of the specified object using the information output by the model. The image recognition model here may be a neural network model, a specified algorithm, or the like. In addition, the related information may be searched for in a database or on the Internet.
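The database comparison with a similarity threshold can be illustrated as follows. This is a sketch under assumptions: the text does not say how similarity is computed, so cosine similarity over precomputed feature vectors is used here as one common choice, and `best_match` and the toy database are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Return the most similar entry whose similarity exceeds the threshold."""
    best = None
    for name, features in database.items():
        score = cosine_similarity(query, features)
        if score > threshold and (best is None or score > best[1]):
            best = (name, score)
    return best


# Toy feature vectors; a real system would extract them from images.
db = {
    "rose": [0.9, 0.1, 0.0],
    "tulip": [0.1, 0.9, 0.0],
    "husky": [0.0, 0.1, 0.9],
}
match = best_match([0.8, 0.2, 0.0], db, threshold=0.9)
```

Returning `None` when nothing clears the threshold leaves the caller free to fall back to another method, such as an image recognition model or an Internet search.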
Step 407, generate voice response information based on the parsing result of the voice request and the related information of the specified object.
In this embodiment, the electronic device generates voice response information based on the parsing result of the voice request and the related information of the specified object. Voice response information is information that responds to the speech uttered by the user, and it may ultimately be output in audio form by the terminal device that interacts with the user by voice.
The voice response information may be generated in various ways. A table mapping parsing results to voice response templates may be obtained in advance. After the parsing result is obtained, the voice response template corresponding to the parsing result is looked up in the table, and the related information found for the specified object is then combined with that template to generate the voice response information. Alternatively, the parsing result and the related information may be input into a previously obtained voice response model to obtain the voice response information output by the model. The voice response model may characterize the correspondence among parsing results, related information, and voice response information.
Step 408, output the voice response information.
In this embodiment, the electronic device outputs the voice response information after generating it. Specifically, if the electronic device is a terminal device, it may output the information by playing speech. If the electronic device is a server, it may send the voice response information to the terminal device to complete the output.
By using preset keywords, this embodiment of the application parses the voice request quickly and accurately and thus generates more accurate voice response information. It also improves the efficiency of processing voice requests.
With further reference to Fig. 5, a flow 500 of another embodiment of the method for processing a voice request is shown. The flow 500 of the method for processing a voice request includes the following steps:
Step 501, in response to receiving a voice request, parse the voice request.
In this embodiment, the electronic device on which the method for processing a voice request runs (for example, the server shown in Fig. 1) may respond to receiving a voice request by parsing the voice request. A voice request is a request sent in the form of speech.
In practice, the electronic device may be a terminal device, in which case it can receive a voice request that the user issues by speaking or in a similar way. The electronic device may also be a server, in which case it can receive the voice request that a terminal device sends after obtaining it from the user.
Techniques such as speech recognition may be used to parse the voice request and obtain the specific content it indicates, that is, the information the user wants to express. Speech recognition can convert the voice request into text, which is then recognized.
Step 502, in response to the parsing result of the voice request including an intent to obtain related information of the specified object, send an instruction to capture an image containing the specified object.
In this embodiment, when the parsing result of the voice request includes an intent to obtain related information of the specified object, the electronic device may send an instruction to capture an image containing the specified object. This instruction directs a device to capture an image of the specified object and thereby generate an image containing it. Specifically, the electronic device may be a server, which sends the instruction to a terminal device that includes a camera device so that the terminal device captures the image after receiving the instruction. Alternatively, the electronic device may be a terminal device, which sends the instruction to the camera device so that the camera device captures the image after receiving the instruction.
Step 503, determine the location information of the user who issued the voice request.
In this embodiment, the electronic device determines the location information of the user who issued the voice request. The location information may be at least one piece of coordinate information corresponding to the position, represented for example as (x, y, z). It may also be information on the relative position of the user and the electronic device that captures the image; for example, the user may be in the 11 o'clock direction on the same horizontal plane as the electronic device.
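The two representations of location can be related by a small conversion. The sketch below turns horizontal coordinates into a clock direction under an assumed convention that the text does not specify: the device sits at the origin, 12 o'clock points along the positive y axis, and 3 o'clock along the positive x axis. The height coordinate z is ignored, and `clock_direction` is a hypothetical helper.

```python
import math


def clock_direction(x: float, y: float) -> int:
    """Map a horizontal offset (x, y) from the device to a clock hour in 1..12."""
    # Angle measured clockwise from the +y axis, in degrees within [0, 360).
    angle = math.degrees(math.atan2(x, y)) % 360.0
    hour = round(angle / 30.0) % 12  # each clock hour spans 30 degrees
    return 12 if hour == 0 else hour


# A user slightly to the left of straight ahead.
user_hour = clock_direction(-0.5, 0.866)
```

With this convention, a user straight ahead maps to 12 and a user directly to the right maps to 3.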
In practice, the user may send a voice request to the terminal device, and after receiving it, the terminal device may also send the voice request to the server.
In some optional implementations of this embodiment, the location information of the user who issued the voice request is determined by sound source localization.
In this embodiment, the user produces speech while interacting by voice with the terminal device. Because the voice request issued by the user is sound, the terminal device can perform sound source localization on the voice request to determine the location information of the user.
Sound source localization may also be called sound localization. Techniques such as steerable beamforming based on maximum output power, high-resolution spectral estimation, or time difference of arrival (TDOA) may be used.
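As a rough illustration of sound source localization, the sketch below estimates a direction from the difference in arrival time of the same sound at two microphones. It is a deliberately simplified, far-field, two-microphone example using brute-force cross-correlation, not an implementation of any specific technique named in the text; the function names, the sample rate, and the microphone spacing are all assumptions.

```python
import math
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def best_lag(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Number of samples by which `right` trails `left`, via cross-correlation."""
    def corr(lag: int) -> float:
        return sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle(lag: int, sample_rate: float, mic_distance: float) -> float:
    """Angle in degrees from broadside; positive means the source is nearer the left mic."""
    ratio = SPEED_OF_SOUND * (lag / sample_rate) / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noise
    return math.degrees(math.asin(ratio))


# Toy signal: a short pulse that reaches the right microphone 2 samples later.
left = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
lag = best_lag(left, right, max_lag=3)
angle = arrival_angle(lag, sample_rate=16000.0, mic_distance=0.1)
```

Two microphones only resolve a direction within a plane; a practical device would typically use a larger array and more robust correlation methods.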
Step 504, obtain the image containing the specified object that is captured based on the location information of the user.
In this embodiment, the electronic device obtains the captured image containing the specified object; the image here is captured based on the location information of the user. Specifically, a camera device installed on the terminal device or independent of the terminal device may shoot the user position indicated by the location information of the user to obtain the image containing the specified object. After the user position is determined, the user position may also be shot with an enlarged imaging range. The user position and the positions around it may also be shot.
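One way to picture shooting the user position with an enlarged imaging range is to grow a region around the user within the camera frame so that nearby objects are included. The sketch below is an assumption-laden illustration: it treats the user position as a pixel bounding box, and `enlarge_box`, the scale factor, and the frame size are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def enlarge_box(box: Box, scale: float, frame_size: Tuple[int, int]) -> Box:
    """Scale a box about its centre and clip it to the frame boundaries."""
    left, top, right, bottom = box
    width, height = frame_size
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w = (right - left) * scale / 2
    half_h = (bottom - top) * scale / 2
    return (
        max(0, int(cx - half_w)),
        max(0, int(cy - half_h)),
        min(width, int(cx + half_w)),
        min(height, int(cy + half_h)),
    )


user_box = (100, 100, 200, 300)
capture_region = enlarge_box(user_box, scale=2.0, frame_size=(640, 480))
```

Clipping to the frame keeps the region valid when the user stands near the edge of the field of view.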
Step 505, determine related information of the specified object based on the image.
In this embodiment, the electronic device determines the related information of the specified object based on the obtained image. Because the image is an image of the specified object and contains the specified object, the related information of the specified object can be determined from it.
The related information of the specified object may be determined from the image in various ways. The image may be compared with images in a database to find an image whose similarity to it is higher than a threshold. Alternatively, the image of the specified object may be input into a predetermined image recognition model, which identifies the specified object contained in the image. The image recognition model may output an identifier, a name, or the like of the specified object. The electronic device can then obtain the related information of the specified object using the information output by the model. The image recognition model here may be a neural network model, a specified algorithm, or the like.
Step 506, generate voice response information based on the parsing result of the voice request and the related information of the specified object.
In this embodiment, the electronic device generates voice response information based on the parsing result of the voice request and the related information of the specified object. Voice response information is information that responds to the speech uttered by the user, and it may ultimately be output in audio form by the terminal device that interacts with the user by voice.
The voice response information may be generated in various ways. A table mapping parsing results and related information to voice response information may be obtained in advance. After the parsing result and the related information are obtained, the voice response information corresponding to them is looked up in the table. Alternatively, the parsing result and the related information may be input into a previously obtained voice response model to obtain the voice response information output by the model. The voice response model may characterize the correspondence among parsing results, related information, and voice response information.
Step 507, output the voice response information.
In this embodiment, the electronic device outputs the voice response information after generating it. Specifically, if the electronic device is a terminal device, it may output the information by playing speech. If the electronic device is a server, it may send the voice response information to the terminal device to complete the output.
By determining the location information of the user, this embodiment can obtain an image of the specified object even when the specified object is not within the view of the camera device's lens, which ensures that the voice response information can be generated and output.
With further reference to Fig. 6, a flow 600 of another embodiment of the method for processing a voice request is shown. The flow 600 of the method for processing a voice request includes the following steps:
Step 601, in response to receiving a voice request, parse the voice request.
In this embodiment, the electronic device on which the method for processing a voice request runs (for example, the server shown in Fig. 1) may respond to receiving a voice request by parsing the voice request. A voice request is a request sent in the form of speech.
In practice, the electronic device may be a terminal device, in which case it can receive a voice request that the user issues by speaking or in a similar way. The electronic device may also be a server, in which case it can receive the voice request that a terminal device sends after obtaining it from the user.
Techniques such as speech recognition may be used to parse the voice request and obtain the specific content it indicates, that is, the information the user wants to express. Speech recognition can convert the voice request into text, which is then recognized.
In a specific application scenario, the user may point at a flower nearby and ask "What flower is this?".
Step 602, in response to the parsing result of the voice request including an intent to obtain related information of the specified object, obtain an image containing the specified object.
In this embodiment, when the parsing result of the voice request includes an intent to obtain related information of the specified object, the electronic device obtains an image containing the specified object. The specified object may be any designated living body or article. The related information of the specified object is the various information related to it, for example its name or category. A parsing result that includes an intent to obtain related information of the specified object indicates that the user issued the voice request so that the interacting terminal device would obtain that related information and could then make a voice response based on it. The parsing result includes the intent of the user; the intent may include obtaining related information of the specified object and may also include other intents.
To determine the related information of the specified object, the electronic device obtains an image containing the specified object. Here, if the electronic device is a server, it may obtain the image of the specified object from the terminal device. If the electronic device is a terminal device, the image may be one shot by the terminal device, one uploaded by the user and received by the terminal device, or one shot by another electronic device and received by the terminal device.
Step 603, in response to determining that the parsing result of the voice request includes a type keyword indicating the type of the specified object, determine whether an image matching the image of the specified object exists in the image database of the type indicated by the type keyword.
In this embodiment, after determining that the parsing result of the voice request includes a type keyword indicating the type of the specified object, the electronic device responds by determining whether an image matching the image of the specified object exists in the image database of the type indicated by the type keyword.
Specifically, each type corresponds to one image database, and the images in the image database of a given type all belong to that type. Searching for the image in the image database corresponding to the type of the specified object reduces the search workload and increases the search speed.
In addition to the intent to obtain related information of the specified object, the parsing result may also include a type keyword, for example "flower" or "animal". For example, when interacting with the terminal device, the user may ask "What flower is this?", in which "flower" is the type keyword.
Step 604, if it is determined that an image matching the image of the specified object exists in the image database of the type indicated by the type keyword, determine the information corresponding to the matched image as the related information of the specified object.
In this embodiment, if the electronic device determines that an image matching the image of the specified object exists in the image database of the type indicated by the type keyword, it determines the information corresponding to the matched image as the related information of the specified object.
Two images matching may mean that the similarity between them is high, for example higher than a threshold. Specifically, whether images match may be determined using image recognition technology, OCR (Optical Character Recognition), or the like.
In practice, the related information of the specified object may be looked up in a pre-established database of related information. That database may contain related information, each item of which corresponds to at least one of an image, an identifier, and a name of a specified object. Alternatively, after the image matching the image of the specified object is determined, the identifier or name of the specified object may be determined and used to search the Internet.
Step 605, if it is determined that no image matching the image of the specified object exists in the image database of the type indicated by the type keyword, search the total image database for an image matching the image of the specified object, and determine the information corresponding to the matched image as the related information of the specified object.
In this embodiment, if the electronic device determines that the image database of the type indicated by the type keyword contains no image matching the image of the specified object, it searches the total image database for an image matching the image of the specified object. It then determines the information corresponding to the matched image as the related information of the specified object.
If no matching image can be found in the image database of the type indicated by the type keyword, the image is searched for in the total image database, which stores a large number of images. The total image database here contains images of objects of various types and is distinct from the image database of the type indicated by the type keyword.
Step 606, in response to determining that the parsing result of the voice request does not include a type keyword indicating the type of the specified object, search the total image database for an image matching the image of the specified object, and determine the information corresponding to the matched image as the related information of the specified object.
In this embodiment, after determining that the parsing result of the voice request does not include a type keyword indicating the type of the specified object, the electronic device responds by searching the total image database for an image matching the image of the specified object and determining the information corresponding to the matched image as the related information of the specified object. That is, if the type of the specified object cannot be determined, the image matching the specified object is searched for directly in the total image database. The total image database here contains images of objects of various types and is distinct from the image database of the type indicated by the type keyword.
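The lookup order of steps 603 to 606 can be sketched as a search in the type-specific database followed by a fallback to the total database. Everything concrete here is hypothetical: images are represented by string labels, matching is an injected callable standing in for similarity comparison, and treating an unknown type keyword like a missing one is an assumption of this sketch.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]  # e.g. {"image": "img_rose", "name": "rose", "type": "flower"}


def find_related_info(
    image: str,
    type_keyword: Optional[str],
    typed_databases: Dict[str, List[Record]],
    total_database: List[Record],
    matches: Callable[[str, Record], bool],
) -> Optional[Record]:
    """Search the database of the indicated type first, then the total database."""
    def search(records: Iterable[Record]) -> Optional[Record]:
        return next((r for r in records if matches(image, r)), None)

    if type_keyword is not None and type_keyword in typed_databases:
        hit = search(typed_databases[type_keyword])
        if hit is not None:
            return hit  # step 604: found within the narrower database
    return search(total_database)  # steps 605 and 606: fall back to all images


flowers = [{"image": "img_rose", "name": "rose", "type": "flower"}]
animals = [{"image": "img_husky", "name": "husky", "type": "animal"}]
typed = {"flower": flowers, "animal": animals}
total = flowers + animals
same_image = lambda image, record: record["image"] == image

info = find_related_info("img_husky", "flower", typed, total, same_image)
```

In the call above the flower database yields no match, so the record is found in the total database; its type differs from the keyword, which a response generator could use to phrase a correction.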
For example, the user may point at a flower nearby and ask "What flower is this?", and the terminal device interacting with the user may respond "This is not a flower; this is a husky".
Step 607, generate voice response information based on the parsing result of the voice request and the related information of the specified object.
In this embodiment, the electronic device generates voice response information based on the parsing result of the voice request and the related information of the specified object. Voice response information is information that responds to the speech uttered by the user, and it may ultimately be output in audio form by the terminal device that interacts with the user by voice.
The voice response information may be generated in various ways. A table mapping parsing results and related information to voice response information may be obtained in advance. After the parsing result and the related information are obtained, the voice response information corresponding to them is looked up in the table. Alternatively, the parsing result and the related information may be input into a previously obtained voice response model to obtain the voice response information output by the model. The voice response model may characterize the correspondence among parsing results, related information, and voice response information. The voice response model here may be a neural network model, a specified algorithm, or the like.
Step 608, output the voice response information.
In this embodiment, the electronic device outputs the voice response information after generating it. Specifically, if the electronic device is a terminal device, it may output the information by playing speech. If the electronic device is a server, it may send the voice response information to the terminal device to complete the output.
In this embodiment, matching can be performed within a smaller range, namely the image database of the corresponding type, according to the type of the specified object. This speeds up matching and in turn shortens the time needed to process the voice request.
With further reference to Fig. 7, as an implementation of the methods shown in the above figures, the application provides an embodiment of an apparatus for processing a voice request. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied in various electronic devices.
As shown in Fig. 7, the apparatus 700 for processing a voice request of this embodiment includes a parsing unit 701, an acquisition unit 702, a determination unit 703, a generation unit 704, and an output unit 705. The parsing unit 701 is configured to parse a voice request in response to receiving the voice request; the acquisition unit 702 is configured to obtain an image containing the specified object in response to the parsing result of the voice request including an intent to obtain related information of the specified object; the determination unit 703 is configured to determine the related information of the specified object based on the image; the generation unit 704 is configured to generate voice response information based on the parsing result of the voice request and the related information of the specified object; and the output unit 705 is configured to output the voice response information.
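The division into five units can be sketched as a small pipeline in which each unit is a replaceable callable. This is only a structural illustration: the class name, the `wants_object_info` key, and the stub implementations are assumptions, and the real units would wrap speech recognition, a camera device, image matching, and speech output.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoiceRequestProcessor:
    parse: Callable[[bytes], dict]           # parsing unit 701
    acquire_image: Callable[[], bytes]       # acquisition unit 702
    determine_info: Callable[[bytes], dict]  # determination unit 703
    generate: Callable[[dict, dict], str]    # generation unit 704
    output: Callable[[str], None]            # output unit 705

    def handle(self, voice_request: bytes) -> Optional[str]:
        result = self.parse(voice_request)
        if not result.get("wants_object_info"):
            return None  # no intent to obtain related information
        image = self.acquire_image()
        info = self.determine_info(image)
        response = self.generate(result, info)
        self.output(response)
        return response


spoken = []  # collects what the stub output unit would play
processor = VoiceRequestProcessor(
    parse=lambda audio: {"wants_object_info": b"what" in audio},
    acquire_image=lambda: b"image-bytes",
    determine_info=lambda image: {"name": "rose"},
    generate=lambda result, info: f"This is a {info['name']}.",
    output=spoken.append,
)
answer = processor.handle(b"what flower is this")
```

Injecting the units as callables keeps the same flow usable whether the code runs on a terminal device or on a server.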
In this embodiment, the parsing unit 701 may respond to receiving a voice request by parsing the voice request. A voice request is a request sent in the form of speech. In practice, the parsing unit 701 may be located in a terminal device, in which case it can receive a voice request that the user issues by speaking or in a similar way. The parsing unit 701 may also be located in a server, in which case it can receive the voice request that a terminal device sends after obtaining it from the user.
In this embodiment, when the parsing result of the voice request includes an intent to obtain related information of the specified object, the acquisition unit 702 obtains an image containing the specified object. The specified object may be any designated living body or article. The related information of the specified object is the various information related to it, for example its name, category, related knowledge, or related comments. A parsing result that includes an intent to obtain related information of the specified object indicates that the user issued the voice request so that the interacting terminal device would obtain that related information and could then make a voice response based on it. The parsing result includes the intent of the user; the intent may include obtaining related information of the specified object and may also include other intents.
In this embodiment, the determination unit 703 determines the related information of the specified object based on the obtained image. Because the image is an image of the specified object and contains the specified object, the related information of the specified object can be determined from it.
In this embodiment, the generation unit 704 generates voice response information based on the parsing result of the voice request and the related information of the specified object. Voice response information is information that responds to the speech uttered by the user, and it may ultimately be output in audio form by the terminal device that interacts with the user by voice.
In this embodiment, the output unit 705 outputs the voice response information after it is generated. Specifically, if the output unit 705 is located in a terminal device, it may output the information by playing speech. If the output unit 705 is located in a server, it may send the voice response information to the terminal device to complete the output.
In some optional implementations of this embodiment, the acquisition unit is further configured to obtain the image containing the specified object as follows: in response to the intent parsing result of the voice request including an intent to obtain related information of the specified object, send an instruction to capture an image containing the specified object; and obtain the captured image containing the specified object.
In some optional implementations of this embodiment, the acquisition unit is further configured to obtain the captured image containing the specified object as follows: determine the location information of the user who issued the voice request; and obtain the image containing the specified object that is captured based on the location information of the user.
In some optional implementations of this embodiment, the acquisition unit is further configured to determine the location information of the user who issued the voice request as follows: determine the location information of the user who issued the voice request by sound source localization.
In some optional implementations of this embodiment, the parsing unit is further configured to perform intent parsing on the voice request as follows: convert the voice request into a text sentence; determine whether the text sentence includes a preset keyword; and, in response to determining that the text sentence includes the preset keyword, determine that the intent parsing result includes an intent to obtain related information of the specified object.
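The keyword check can be sketched as a substring test on the recognized text. The keyword list and the `has_object_info_intent` function are illustrative assumptions, and the speech-to-text conversion is taken as already done.

```python
from typing import Iterable

# Hypothetical preset keywords signalling a request for information about an object.
PRESET_KEYWORDS = ("what is this", "what flower", "what animal")


def has_object_info_intent(text: str, keywords: Iterable[str] = PRESET_KEYWORDS) -> bool:
    """Return True if the recognized text contains any preset keyword."""
    normalized = text.lower()
    return any(keyword in normalized for keyword in keywords)


wants_info = has_object_info_intent("What flower is this?")
```

A plain substring test is fast but coarse; the same interface could later be backed by a more tolerant matcher without changing the callers.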
In some optional implementations of this embodiment, the determination unit is further configured to determine the related information of the specified object as follows: in response to determining that the parsing result of the voice request includes a type keyword indicating the type of the specified object, determine whether an image matching the image of the specified object exists in the image database of the type indicated by the type keyword; if it is determined that such a matching image exists in the image database of the type indicated by the type keyword, determine the information corresponding to the matched image as the related information of the specified object; if it is determined that no such matching image exists in the image database of the type indicated by the type keyword, search the total image database for an image matching the image of the specified object, and determine the information corresponding to the matched image as the related information of the specified object; and, in response to determining that the parsing result of the voice request does not include a type keyword indicating the type of the specified object, search the total image database for an image matching the image of the specified object, and determine the information corresponding to the matched image as the related information of the specified object.
Referring now to Fig. 8, a schematic structural diagram of a computer system 800 suitable for implementing the electronic device of the embodiments of the application is shown. The electronic device shown in Fig. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the application.
As shown in Fig. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed so that a computer program read from it can be installed into the storage section 808 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above functions defined in the method of the application are performed. It should be noted that the computer-readable medium of the application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
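The point about block ordering can be illustrated with a minimal Python sketch. The two blocks below (`block_a`, `block_b`) are hypothetical and are not taken from the flowcharts of this application; the only assumption is that they have no data dependency on each other, which is the case in which drawn order, reverse order, and parallel execution all give the same result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def block_a(x: int) -> int:
    """Hypothetical flowchart block that depends only on its input."""
    return x + 1


def block_b(x: int) -> int:
    """Hypothetical flowchart block that does not use block_a's result."""
    return x * 2


def run_in_drawn_order(x: int) -> Tuple[int, int]:
    # Execute the blocks in the order they appear in the drawing.
    a = block_a(x)
    b = block_b(x)
    return a, b


def run_in_reverse_order(x: int) -> Tuple[int, int]:
    # Execute the same blocks in the opposite order.
    b = block_b(x)
    a = block_a(x)
    return a, b


def run_concurrently(x: int) -> Tuple[int, int]:
    # Submit both blocks to a thread pool so they may overlap in time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(block_a, x)
        future_b = pool.submit(block_b, x)
        return future_a.result(), future_b.result()
```

If one block consumed the output of the other, only the drawn order would remain valid, which is why the text ties the permitted reordering to the functions involved.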
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described, for example, as: a processor comprising a parsing unit, an acquiring unit, a determining unit, a generating unit, and an output unit. The names of these units do not, in some cases, limit the units themselves. For example, the parsing unit may also be described as "a unit that parses a voice request in response to receiving the voice request."
In another aspect, the present application further provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: parse a voice request in response to receiving the voice request; acquire an image containing a specified object in response to the parsing result of the voice request including an intention to obtain related information of the specified object; determine the related information of the specified object based on the image; generate voice response information based on the parsing result of the voice request and the related information of the specified object; and output the voice response information.
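The sequence of operations listed above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the implementation of the application: the `ParseResult` type, the function `handle_voice_request`, and the five callables passed to it are hypothetical names, and the way the reply text is composed from the parsing result and the object information is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParseResult:
    text: str                # recognized text of the voice request (assumed field)
    wants_object_info: bool  # True if the request asks about a specified object


def handle_voice_request(
    audio: bytes,
    parse: Callable[[bytes], ParseResult],
    acquire_image: Callable[[], bytes],
    identify: Callable[[bytes], str],
    synthesize: Callable[[str], bytes],
    output: Callable[[bytes], None],
) -> Optional[bytes]:
    # Step 1: parse the voice request once it is received.
    result = parse(audio)
    # Step 2: acquire an image only if the intent is to get object information.
    if not result.wants_object_info:
        return None  # other intents are outside the scope of this sketch
    image = acquire_image()
    # Step 3: determine the related information of the object from the image.
    info = identify(image)
    # Step 4: generate the voice response from the parsing result and the info.
    reply_text = f"{result.text} -> {info}"  # placeholder composition
    response = synthesize(reply_text)
    # Step 5: output the voice response.
    output(response)
    return response
```

Passing the five stages in as callables mirrors the statement that each unit may be realized in software or hardware: the control flow stays the same whichever implementation is plugged in.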
The above description is only of the preferred embodiments of the present application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.
Claims (14)
1. A method for handling a voice request, comprising:
in response to receiving a voice request, parsing the voice request;
in response to a parsing result of the voice request including an intention to obtain related information of a specified object, acquiring an image containing the specified object;
determining the related information of the specified object based on the image;
generating voice response information based on the parsing result of the voice request and the related information of the specified object; and
outputting the voice response information.
2. The method according to claim 1, wherein the acquiring, in response to an intent analysis result of the voice request including an intention to obtain related information of a specified object, an image containing the specified object comprises:
in response to the intent analysis result of the voice request including the intention to obtain the related information of the specified object, sending an instruction to capture an image containing the specified object; and
acquiring the captured image containing the specified object.
3. The method according to claim 2, wherein the acquiring the captured image containing the specified object comprises:
determining location information of the user who issued the voice request; and
acquiring the image containing the specified object that is captured based on the location information of the user.
4. The method according to claim 3, wherein the determining location information of the user who issued the voice request comprises:
determining, by sound source localization, the location information of the user who issued the voice request.
5. The method according to claim 1, wherein performing intent analysis on the voice request comprises:
converting the voice request into a text sentence;
determining whether the text sentence includes a preset keyword; and
in response to determining that the text sentence includes the preset keyword, determining that the intent analysis result includes the intention to obtain the related information of the specified object.
6. The method according to claim 1, wherein the determining the related information of the specified object based on the image comprises:
in response to determining that the parsing result of the voice request includes a type keyword indicating the specified object, determining whether an image matching the image of the specified object exists in an image database of the type indicated by the type keyword;
if it is determined that an image matching the image of the specified object exists in the image database of the type indicated by the type keyword, determining the information corresponding to the matched image as the related information of the specified object; if it is determined that no image matching the image of the specified object exists in the image database of the type indicated by the type keyword, searching a general image database for an image matching the image of the specified object, and determining the information corresponding to the matched image as the related information of the specified object; and
in response to determining that the parsing result of the voice request does not include a type keyword indicating the specified object, searching the general image database for an image matching the image of the specified object, and determining the information corresponding to the matched image as the related information of the specified object.
7. An apparatus for handling a voice request, comprising:
a parsing unit configured to parse a voice request in response to receiving the voice request;
an acquiring unit configured to acquire an image containing a specified object in response to a parsing result of the voice request including an intention to obtain related information of the specified object;
a determining unit configured to determine the related information of the specified object based on the image;
a generating unit configured to generate voice response information based on the parsing result of the voice request and the related information of the specified object; and
an output unit configured to output the voice response information.
8. The apparatus according to claim 7, wherein the acquiring unit is further configured to acquire the image containing the specified object as follows:
in response to the intent analysis result of the voice request including the intention to obtain the related information of the specified object, sending an instruction to capture an image containing the specified object; and
acquiring the captured image containing the specified object.
9. The apparatus according to claim 8, wherein the acquiring unit is further configured to acquire the captured image containing the specified object as follows:
determining location information of the user who issued the voice request; and
acquiring the image containing the specified object that is captured based on the location information of the user.
10. The apparatus according to claim 9, wherein the acquiring unit is further configured to determine the location information of the user who issued the voice request as follows:
determining, by sound source localization, the location information of the user who issued the voice request.
11. The apparatus according to claim 7, wherein the parsing unit is further configured to perform intent analysis on the voice request as follows:
converting the voice request into a text sentence;
determining whether the text sentence includes a preset keyword; and
in response to determining that the text sentence includes the preset keyword, determining that the intent analysis result includes the intention to obtain the related information of the specified object.
12. The apparatus according to claim 7, wherein the determining unit is further configured to determine the related information of the specified object as follows:
in response to determining that the parsing result of the voice request includes a type keyword indicating the specified object, determining whether an image matching the image of the specified object exists in an image database of the type indicated by the type keyword;
if it is determined that an image matching the image of the specified object exists in the image database of the type indicated by the type keyword, determining the information corresponding to the matched image as the related information of the specified object; if it is determined that no image matching the image of the specified object exists in the image database of the type indicated by the type keyword, searching a general image database for an image matching the image of the specified object, and determining the information corresponding to the matched image as the related information of the specified object; and
in response to determining that the parsing result of the voice request does not include a type keyword indicating the specified object, searching the general image database for an image matching the image of the specified object, and determining the information corresponding to the matched image as the related information of the specified object.
13. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
14. A computer-readable storage medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
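As an illustration of the keyword-based intent analysis recited in claims 5 and 11 and the type-first image lookup recited in claims 6 and 12, the following Python sketch may help. Everything in it is an assumption made for illustration: the keyword list, the function names, and in particular the modeling of "image matching" as an exact dictionary lookup on an image key, where a real system would compare image features.

```python
from typing import Dict, Optional

# Hypothetical preset keywords signalling "tell me about this object".
PRESET_KEYWORDS = ("what is this", "what's this")


def has_object_info_intent(sentence: str) -> bool:
    """Return True if the text sentence contains any preset keyword."""
    lowered = sentence.lower()
    return any(keyword in lowered for keyword in PRESET_KEYWORDS)


def find_type_keyword(
    sentence: str, typed_dbs: Dict[str, Dict[str, str]]
) -> Optional[str]:
    """Return the first database type name mentioned in the sentence, if any."""
    lowered = sentence.lower()
    for type_name in typed_dbs:
        if type_name in lowered:
            return type_name
    return None


def lookup_object_info(
    image_key: str,
    sentence: str,
    typed_dbs: Dict[str, Dict[str, str]],
    general_db: Dict[str, str],
) -> Optional[str]:
    """Search the type-specific database first, then the general database."""
    type_name = find_type_keyword(sentence, typed_dbs)
    if type_name is not None:
        # A type keyword was found: try the database of that type first.
        info = typed_dbs[type_name].get(image_key)
        if info is not None:
            return info
    # No type keyword, or no match in the typed database: use the general one.
    return general_db.get(image_key)
```

Searching the smaller type-specific database first narrows the candidates when the user names a category, while the fallback to the general database keeps the lookup from failing merely because the stated type was missing or wrong.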
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810124300.9A CN108334498A (en) | 2018-02-07 | 2018-02-07 | Method and apparatus for handling voice request |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810124300.9A CN108334498A (en) | 2018-02-07 | 2018-02-07 | Method and apparatus for handling voice request |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108334498A true CN108334498A (en) | 2018-07-27 |
Family
ID=62927086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810124300.9A Pending CN108334498A (en) | 2018-02-07 | 2018-02-07 | Method and apparatus for handling voice request |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334498A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109697290A (en) * | 2018-12-29 | 2019-04-30 | 咪咕数字传媒有限公司 | A kind of information processing method, equipment and computer storage medium |
CN109800301A (en) * | 2019-01-23 | 2019-05-24 | 广东小天才科技有限公司 | A kind of method for digging and facility for study of weakness knowledge point |
CN110111788A (en) * | 2019-05-06 | 2019-08-09 | 百度在线网络技术(北京)有限公司 | The method and apparatus of interactive voice, terminal, computer-readable medium |
CN110689891A (en) * | 2019-11-20 | 2020-01-14 | 广东奥园奥买家电子商务有限公司 | Voice interaction method and device based on public display device |
CN111899582A (en) * | 2020-07-29 | 2020-11-06 | 联想(北京)有限公司 | Information processing method and device for network teaching and electronic equipment |
CN112037763A (en) * | 2020-08-27 | 2020-12-04 | 腾讯科技(深圳)有限公司 | Service testing method and device based on artificial intelligence |
WO2021068189A1 (en) * | 2019-10-11 | 2021-04-15 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for image generation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866308A (en) * | 2015-05-18 | 2015-08-26 | 百度在线网络技术(北京)有限公司 | Scenario image generation method and apparatus |
CN107145509A (en) * | 2017-03-28 | 2017-09-08 | 深圳市元征科技股份有限公司 | A kind of information search method and its equipment |
CN107491284A (en) * | 2016-06-10 | 2017-12-19 | 苹果公司 | The digital assistants of automation state report are provided |
CN107590252A (en) * | 2017-09-19 | 2018-01-16 | 百度在线网络技术(北京)有限公司 | Method and device for information exchange |
-
2018
- 2018-02-07 CN CN201810124300.9A patent/CN108334498A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866308A (en) * | 2015-05-18 | 2015-08-26 | 百度在线网络技术(北京)有限公司 | Scenario image generation method and apparatus |
CN107491284A (en) * | 2016-06-10 | 2017-12-19 | 苹果公司 | The digital assistants of automation state report are provided |
CN107145509A (en) * | 2017-03-28 | 2017-09-08 | 深圳市元征科技股份有限公司 | A kind of information search method and its equipment |
CN107590252A (en) * | 2017-09-19 | 2018-01-16 | 百度在线网络技术(北京)有限公司 | Method and device for information exchange |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109697290A (en) * | 2018-12-29 | 2019-04-30 | 咪咕数字传媒有限公司 | A kind of information processing method, equipment and computer storage medium |
CN109800301A (en) * | 2019-01-23 | 2019-05-24 | 广东小天才科技有限公司 | A kind of method for digging and facility for study of weakness knowledge point |
CN109800301B (en) * | 2019-01-23 | 2020-12-01 | 广东小天才科技有限公司 | Weak knowledge point mining method and learning equipment |
CN110111788A (en) * | 2019-05-06 | 2019-08-09 | 百度在线网络技术(北京)有限公司 | The method and apparatus of interactive voice, terminal, computer-readable medium |
CN110111788B (en) * | 2019-05-06 | 2022-02-08 | 阿波罗智联(北京)科技有限公司 | Voice interaction method and device, terminal and computer readable medium |
WO2021068189A1 (en) * | 2019-10-11 | 2021-04-15 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for image generation |
CN110689891A (en) * | 2019-11-20 | 2020-01-14 | 广东奥园奥买家电子商务有限公司 | Voice interaction method and device based on public display device |
CN111899582A (en) * | 2020-07-29 | 2020-11-06 | 联想(北京)有限公司 | Information processing method and device for network teaching and electronic equipment |
CN112037763A (en) * | 2020-08-27 | 2020-12-04 | 腾讯科技(深圳)有限公司 | Service testing method and device based on artificial intelligence |
CN112037763B (en) * | 2020-08-27 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Service testing method and device based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108334498A (en) | Method and apparatus for handling voice request | |
CN109726624B (en) | Identity authentication method, terminal device and computer readable storage medium | |
CN108898185A (en) | Method and apparatus for generating image recognition model | |
CN109086719A (en) | Method and apparatus for output data | |
CN109034069A (en) | Method and apparatus for generating information | |
CN108121800A (en) | Information generating method and device based on artificial intelligence | |
CN108595628A (en) | Method and apparatus for pushed information | |
CN108986790A (en) | The method and apparatus of voice recognition of contact | |
CN107832720B (en) | Information processing method and device based on artificial intelligence | |
CN109299477A (en) | Method and apparatus for generating text header | |
CN109934242A (en) | Image identification method and device | |
CN110046254A (en) | Method and apparatus for generating model | |
CN109635094A (en) | Method and apparatus for generating answer | |
CN108335390A (en) | Method and apparatus for handling information | |
CN108521516A (en) | Control method and device for terminal device | |
CN107862071A (en) | The method and apparatus for generating minutes | |
CN110110666A (en) | Object detection method and device | |
CN108959087A (en) | test method and device | |
CN109241721A (en) | Method and apparatus for pushed information | |
CN108133197A (en) | For generating the method and apparatus of information | |
CN108446658A (en) | The method and apparatus of facial image for identification | |
CN108171208A (en) | Information acquisition method and device | |
CN109543068A (en) | Method and apparatus for generating the comment information of video | |
CN110232920A (en) | Method of speech processing and device | |
CN109829431A (en) | Method and apparatus for generating information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20210507. Address after: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing. Applicant after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.; Shanghai Xiaodu Technology Co.,Ltd. Address before: 100085 Baidu Building, 10 Shangdi Tenth Street, Haidian District, Beijing. Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180727 |