CN110119461B - Query information processing method and device


Info

Publication number: CN110119461B
Authority: CN (China)
Prior art keywords: information, query, image information, current interface, user
Prior art date: 2018-01-25
Legal status: Active (granted)
Application number: CN201810071243.2A
Other languages: Chinese (zh)
Other versions: CN110119461A
Inventor: 陈健
Current assignee: Alibaba China Co Ltd
Original assignee: Alibaba China Co Ltd
Priority date: 2018-01-25
Filing date: 2018-01-25
Publication date of CN110119461A (application): 2019-08-13
Publication date of CN110119461B (grant): 2022-01-14

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q 30/00: Commerce
    • G06Q 30/06: Buying, selling or leasing transactions
    • G06Q 30/0601: Electronic shopping [e-shopping]
    • G06Q 30/0631: Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application disclose a query information processing method and apparatus. The method comprises: acquiring query information input by a user, and identifying the semantics of the query information; converting the query information into keywords according to the semantics, and determining a query mode corresponding to the query information; and processing the keywords and the image information of the current interface in the query mode to obtain a query result corresponding to the query information. The technical solution provided by the application can improve the accuracy of query results.

Description

Query information processing method and device
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method and an apparatus for processing query information.
Background
With the continuous development of search engines, searches can be performed not only on text information input by a user but also on pictures taken or voice recorded by the user. For example, on Taobao, corresponding products can be found by entering keywords, or products of the same type shown in a picture can be found by uploading the picture.
However, the query information provided by a user may contain multiple queryable objects. For example, if the user provides a picture showing a star wearing a scarf, a current search engine receiving the picture may default to finding other pictures of that star, while the user may only want to know where the scarf in the picture can be purchased. Therefore, in practical applications, the results obtained by existing search engines may be inaccurate.
Disclosure of Invention
The embodiments of the present application aim to provide a query information processing method and apparatus that can improve the accuracy of query results.
In order to achieve the above object, an embodiment of the present application provides a query information processing method, the method comprising: acquiring query information input by a user, and identifying the semantics of the query information; converting the query information into keywords according to the semantics, and determining a query mode corresponding to the query information; and processing the keywords and the image information of the current interface in the query mode to obtain a query result corresponding to the query information.
In order to achieve the above object, an embodiment of the present application further provides a query information processing apparatus, the apparatus comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps: acquiring query information input by a user, and identifying the semantics of the query information; converting the query information into keywords according to the semantics, and determining a query mode corresponding to the query information; and processing the keywords and the image information of the current interface in the query mode to obtain a query result corresponding to the query information.
As can be seen, in the technical solution provided by the application, after the user provides query information, the semantics of the query information can be identified, and the semantics indicate the object the user actually wants to query at present. Corresponding keywords can therefore be extracted from the query information based on the semantics and subsequently used as the basis of the query. Meanwhile, different query modes can be adopted for different semantics. For example, when the semantics indicate that the user wants to query a person, a face recognition query mode can be adopted; when the semantics indicate that the user wants to search for a commodity, a commodity search query mode can be used. After the keywords and the query mode are determined, the keywords and the image information of the current interface can be processed in the determined query mode, thereby obtaining the query result the user wants. In this way, the application combines the picture the user is watching with the user's semantics to perform the query, which can improve the accuracy of the query result.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of a query information processing method in an embodiment of the present application;
FIG. 2 is a schematic diagram of acquiring image information of a current interface in an embodiment of the present application;
FIG. 3 is a schematic diagram of face recognition in an embodiment of the present application;
FIG. 4 is a schematic diagram of hairstyle matching in an embodiment of the present application;
FIG. 5 is a schematic diagram of a query information processing apparatus in an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without inventive work shall fall within the scope of protection of the present application.
The application provides a query information processing method that can be applied to a system architecture with a client and a server. The client may be a terminal device used by a user, for example, an electronic device such as a smartphone, a desktop computer, a laptop computer, a tablet computer, a smart television, or a smart wearable device. Of course, the client may also be software running in the electronic device, for example, a video app such as iQIYI, Bilibili, or Tencent Video. The server may be a back-end business server that provides the query service. A wide variety of resources may be stored in the server, for example, text information, links, pictures, videos, and voice. The client interacts with the user to receive the user's query information and to display image information to the user, while the server receives the query instruction sent by the client, queries the object the user wants to query, and feeds the corresponding query result back to the client. Referring to fig. 1, the query information processing method provided by the present application may include the following steps.
S1: acquiring query information input by a user, and identifying the semantics of the query information.
In this embodiment, the query information input by the user may take various forms. Specifically, the query information may include text information, voice information, or gesture information. The text information may be entered into the client currently used by the user through a text input device such as a keyboard, a touch screen, or a handwriting pad; the voice information may be input through the client's microphone; and the gesture information may be captured by the client's camera. The gesture information may, for example, be sign language information.
In this embodiment, after the client acquires the query information input by the user, if the query information is voice information and/or gesture information, it may first be converted into text information using speech recognition and image recognition technologies. To determine the true intent of the query information, its semantics may then be analyzed. Specifically, semantic analysis may be performed on the input text information or on the converted text information. When performing semantic analysis, the text information may first be segmented into the words it contains. In practical applications, segmentation can be carried out in a variety of existing ways. For example, it may be performed by string matching: the lexicon is consulted word by word according to a chosen scanning strategy, which may include forward maximum matching, reverse maximum matching, bidirectional maximum matching, minimum segmentation, and the like. As another example, segmentation may be performed by full segmentation: all candidate words matching the lexicon are extracted from the text information, and the optimal segmentation result is then selected with a statistical language model. As yet another example, words may be segmented by building words from characters: a label is predicted for each character, namely "B" (begin), "I" (inside), "E" (end), or "S" (single), from which the word boundaries at different positions follow. The labels can be predicted by existing natural language processing models, for example an HMM (Hidden Markov Model), a MaxEnt (Maximum Entropy) model, or a CRF (Conditional Random Field) model.
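To make the string-matching strategy above concrete, the following is a minimal sketch of forward maximum matching; the tiny lexicon, function name, and maximum word length are illustrative assumptions rather than part of the patent.

```python
# Minimal sketch of forward-maximum-matching word segmentation. The lexicon
# below is a stand-in for a real dictionary.
def forward_max_match(text, lexicon, max_len=4):
    """Greedily take the longest lexicon entry starting at each position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:  # no lexicon entry matched: emit a single character
            tokens.append(text[i])
            i += 1
    return tokens

# Works on unspaced text such as Chinese:
print(forward_max_match("这个演员是谁", {"这个", "演员", "是", "谁"}))
# -> ['这个', '演员', '是', '谁']
```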
In this embodiment, after word segmentation, the segmented words may be analyzed with a topic model to obtain the semantics of the text information. Specifically, the topic model may be, for example, an LSA (Latent Semantic Analysis) model, a PLSA (Probabilistic Latent Semantic Analysis) model, an NMF (Non-negative Matrix Factorization) model, or an LDA (Latent Dirichlet Allocation) model.
In addition, in this embodiment, after word segmentation, the word vectors of the words may be analyzed to determine the semantics of the text information. In particular, the word vectors can be analyzed with a deep learning model such as a Convolutional Neural Network (CNN), or with a classifier such as a Support Vector Machine (SVM). After the word vectors of all the words in the text information are analyzed, the text information can be classified and its corresponding semantics determined.
In this embodiment, the user may input the query information while watching or shooting a video. For example, a user watches a movie through the client, an actor appears in the movie, and the user asks by voice "who is this actor"; what the user asks by voice can then be input into the client as query information. For another example, when a user takes a selfie with the client, the user may ask in sign language what hairstyle would look good, and this sign language information may be input into the client as query information.
In one embodiment, in order to prevent the client from treating all of the user's speech or gestures as query information, a trigger condition may be set for inputting query information. The trigger condition may be, for example, preset wake-up information. The client then detects whether the user has input the wake-up information, and acquires query information input by the user only after the wake-up information is detected. In practical applications, the wake-up information may include at least one of designated voice information, designated text information, or designated gesture information. For example, the wake-up information may be a wake-up word such as "query". The user can then input query information while watching a video by typing the text "query", speaking the wake-up word "query", or expressing "query" by gesture.
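A minimal sketch of such a wake-up gate is given below, assuming the wake-up word "query" and plain-text input from the client's speech or gesture recognizer; the function name and stripping rules are illustrative assumptions.

```python
from typing import Optional

WAKE_WORD = "query"  # the designated wake-up word (an assumption)

def extract_query(utterance: str) -> Optional[str]:
    """Return the query text only if the utterance begins with the wake word."""
    text = utterance.strip()
    if not text.lower().startswith(WAKE_WORD):
        return None  # no wake-up information: input is not treated as a query
    remainder = text[len(WAKE_WORD):].lstrip(" :,")
    return remainder or None

print(extract_query("query: who is this actor"))  # -> 'who is this actor'
print(extract_query("nice weather today"))        # -> None
```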
S3: and converting the query information into key words according to the semantics, and determining a query mode corresponding to the query information.
In this embodiment, after the semantics of the query information are determined, interference information in the query information may be removed to obtain the keyword. For example, suppose the query information is "where was the clothing worn by this actor purchased?". By analyzing its semantics, it can be determined that what the user currently wants is to query a commodity rather than the actor, and the query information can then be converted into a keyword such as "commodity" or "clothes". Specifically, when converting to a keyword, the target object of the semantic representation may first be determined. In practical applications, the target object may generally be a person, an article, a place, audio or video, and the like. After the target object is determined, the target vocabulary representing the target object can be extracted from the query information and used as the converted keyword. For example, if the query information is "who is this actor", the recognized semantics concern a person, so the target word "actor" representing the person can be extracted from the query information and used as the converted keyword.
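A sketch of this conversion step might look as follows; the small type lexicon and names are illustrative assumptions, standing in for the semantic analysis described above.

```python
# Keep the token that names an object of the recognized target-object type
# and discard the rest of the query as interference information.
TYPE_LEXICON = {
    "person": {"actor", "actress", "singer", "star"},
    "item":   {"clothes", "scarf", "hat", "commodity"},
    "place":  {"restaurant", "shop", "hotel"},
}

def to_keyword(tokens, target_type):
    """Return the first token naming an object of the given target type."""
    vocab = TYPE_LEXICON.get(target_type, set())
    for tok in tokens:
        if tok in vocab:
            return tok
    return None

print(to_keyword(["who", "is", "this", "actor"], "person"))  # -> 'actor'
```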
In this embodiment, the query mode adopted by the client often differs across target objects. For example, for a person, the client may query with a face recognition function; for an article, the client may query with an item-matching or commodity-purchase function; for a location, the client may query through a positioning or navigation function; and for a video, the client may query through a video search function. In view of this, in this embodiment, the query mode corresponding to the query information may be determined according to the identified semantics. Specifically, the execution action of the semantic representation may be determined, and the query mode corresponding to that execution action may be used as the query mode corresponding to the query information. The execution action may include, for example, "buy", "search", or "identify". The semantics of the query information generally characterize both the target object and the execution action on it; for example, the target object may be "actor" and the action performed on "actor" may be "search". In practical applications, when determining the query mode corresponding to the execution action, multiple intention categories representing user intentions may be predetermined. These intention categories may include, for example, at least one of person recognition, item identification, commodity purchase, location search, video search, and appearance design. Appearance design may concern a hairstyle, an accessory, makeup, or the like. The intention categories may be derived statistically from a large number of users. In practice, more or fewer intention categories may be employed; the application is not limited in this respect.
In this embodiment, after the multiple intention categories are determined, at least one query mode associated with each intention category may be selected from multiple candidate query modes. For example, the person-recognition intention category may be associated with face recognition and/or person search; commodity purchase may be associated with image-based object search; and location search may be associated with positioning or navigation. Specifically, the associations between intention categories and query modes may be as shown in Table 1.
TABLE 1 Association of intention categories and query modes

Intention category                        Query mode
Person recognition                        Face recognition, person search
Item identification, commodity purchase   Image-based object search
Location search                           Positioning, navigation
Video search                              Program or movie search
In this embodiment, after the corresponding query modes are associated with the respective intention categories, the target intention category to which the semantics belong can be determined, and the target query mode associated with that category can be used as the query mode corresponding to the execution action. For example, if the target intention category to which the semantics belong is person recognition, the query modes "face recognition, person search" may be used as the query modes corresponding to the execution action of the semantic representation.
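The association in Table 1 can be pictured as a plain lookup table; the identifiers below are illustrative assumptions, and dispatch simply returns the query modes associated with the recognized target intention category.

```python
# Table 1 as a lookup from intention category to associated query modes.
QUERY_MODES = {
    "person_recognition":  ["face_recognition", "person_search"],
    "item_identification": ["image_based_object_search"],
    "commodity_purchase":  ["image_based_object_search"],
    "location_search":     ["positioning", "navigation"],
    "video_search":        ["program_or_movie_search"],
}

def query_modes_for(intention_category):
    """Return the query modes associated with an intention category."""
    return QUERY_MODES.get(intention_category, [])

print(query_modes_for("person_recognition"))
# -> ['face_recognition', 'person_search']
```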
In this embodiment, the client may be associated in advance with multiple modules or systems that it can call to implement the query mode corresponding to the query information. These modules or systems may be program components integrated within the client, or applications separate from the client. For example, the client may be associated with face recognition software, the Taobao app, the Baidu Maps app, and the Tencent Video app, and call the corresponding software according to the query mode currently required.
S5: and processing the keywords and the image information of the current interface in the query mode to obtain a query result corresponding to the query information.
In this embodiment, after the keywords are determined, querying on the keywords alone would yield far too broad a result set. For example, if the keyword is "actor", the results obtained by querying only on "actor" are far too numerous to meet the user's need. In this case, the content shown in the image information of the client's current interface at the moment the user inputs the query information can be combined with the keywords to jointly pin down the object the user actually wants to query. For example, while a user says "query: who is this actor", the client's current picture shows the face of "Zhang San". The client may then crop the area image information containing the face of "Zhang San" from the image information of the current interface and use that area image information as one of the bases for the query.
In this embodiment, the keyword and the area image information of the current interface the user is viewing may be processed in the determined query mode. Specifically, the query modes may include face recognition, image-based object search, positioning and navigation, film search, hairstyle matching, accessory matching, makeup matching, and the like. Each query mode can be implemented by its own program module; the program module takes the keyword and the image information of the current interface as input query conditions, performs the corresponding search, and obtains the query result corresponding to the query information input by the user.
In this embodiment, while playing or recording a video, the client may capture the image information of its interface at a specified interval. For example, the client may take a screenshot once per second and store the resulting image information, associating each screenshot with its capture time. When the user inputs query information, the client may determine, from the stored image information, the screenshot closest in time to the node at which the query information was input, and use it as the image information of the current interface. Referring to fig. 2, the client captures the image information of the interface at the specified interval; if the user inputs query information at the 3rd second, the client may use the image information from the 3rd second as the image information of the current interface corresponding to the query information.
In practical applications, to save the client's storage space, only the most recent screenshots may be kept, with earlier screenshots deleted directly. For example, the client may keep only the last 5 pieces of image information so as not to overuse the limited storage space.
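A sketch of this capture scheme follows, assuming a simple ring buffer and opaque frame objects; the class and its names are illustrative, not from the patent.

```python
from collections import deque

class ScreenshotBuffer:
    """Keep only the most recent screenshots; pick the one nearest the query."""
    def __init__(self, capacity=5):
        self.frames = deque(maxlen=capacity)  # oldest frame drops automatically

    def capture(self, timestamp, frame):
        self.frames.append((timestamp, frame))

    def closest_to(self, query_time):
        """Return the stored frame whose capture time is nearest the query time."""
        if not self.frames:
            return None
        return min(self.frames, key=lambda tf: abs(tf[0] - query_time))[1]

buf = ScreenshotBuffer()
for t in range(1, 7):          # one capture per second
    buf.capture(t, f"frame@{t}s")
print(buf.closest_to(3.2))     # -> 'frame@3s'
```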
In another embodiment, the query information input by the user can itself serve as the trigger for acquiring the image information of the current interface. Specifically, when the query information is acquired, the client may perform a screen capture operation and use the captured image information as the image information of the current interface. For example, when the client detects that the user is inputting query information, it may capture the image information of its interface while receiving the query information and use that image information as the image information corresponding to the query information.
In one application scenario, the user says "query: how should my hairstyle be designed to look good?". After receiving the query information, the client recognizes that its semantics concern appearance design, extracts the keyword "hairstyle", and determines that the query mode to adopt is hairstyle matching. In addition, the client may use the current frame of the selfie video as the image information of the current interface; this image information may include the user's face image. The client may then call a program module or application for hairstyle matching and input the keyword "hairstyle" and the image information of the current interface into it, so that a number of hairstyles suited to the user's face shape are obtained; these hairstyles can be used as the query result.
In this embodiment, the query result may take various forms, such as text information, gesture information, image information, or voice information. When the query result is presented to the user, it can be displayed directly in the image information of the current interface. For example, referring to fig. 3, if the user asks about an actor in the current interface, the client may display the queried information about the actor next to the actor's image in the current interface. For another example, if the user asks about purchase information for a red hat in the current interface, the client may display a purchase link for the queried hat of the same style next to the red hat. Thus, when the query result is text information, the text information can be displayed in the image information of the current interface, or converted into voice information by text-to-speech technology and then played. Conversely, when the query result is voice information, the voice information can be played directly, or converted into text information that is displayed in the image information of the current interface.
Of course, in an actual application scenario, the client may also jump from the software currently playing the video to the software that obtains the query result, so that the query result can be viewed in the other software. For example, when a user watching a movie in the Tencent Video application becomes interested in a piece of clothing worn by an actor, the user can input the query information "where was this piece of clothing bought"; the client can then call the mobile Taobao app to search for products of the same style and, after the query result is obtained, jump to a mobile Taobao page to show the result to the user.
In one embodiment, the image information of the current interface usually contains fairly rich content. To remove useless information and thereby improve the accuracy of the query result, after the image information of the current interface is acquired, the area image information containing the target object represented by the keyword can be cropped from it according to the keyword. For example, a user says "query: what hairstyle fits me?". In this case, the recognized semantics concern appearance design and the keyword is "hairstyle"; the picture information of the current frame of the selfie video can be acquired, and the picture information containing the user's head can be cropped from it. As another example, a user watching a video says: "where was this clothing bought?". Here the recognized semantics concern commodity purchase and the keyword is "clothes"; the picture information of the current frame of the video can be acquired, and the picture information containing the clothes can be cropped from it.
In view of this, in this embodiment, when cropping the area image information, the target object represented by the keyword may be determined and then identified in the image information of the current interface. For example, if the keyword is "actor", the target object it represents is a human face; if the keyword is "clothes", the target object it represents is clothing. After the target object is determined, it can be identified in the picture information of the current interface by conventional image recognition means. The area image information containing the target object is then taken as the area image information pointed to by the keyword. The size of this area may be preset; to avoid interference from other objects in the current interface, it may be set to just enclose the target object.
In an actual application scenario, when the target object is identified in the image information of the current interface according to the keyword, more than one target object may well be found. For example, when the user says "who is this actor" and the image information of the current interface contains two actors, the client cannot determine which actor the user means. In view of this, when at least two target objects are identified in the image information of the current interface, the client may generate and present prompt information indicating that at least two target objects are currently recognized. The prompt information may be, for example, "multiple objects were recognized; please select one". The prompt information may take various forms, including at least one of voice prompt information, text prompt information, and gesture prompt information, and its form may match the form of the query information the user entered. After viewing the prompt information, the user can select one of the target objects, for example by tapping the corresponding object or by using an orientation word (e.g., left, right, top, bottom). The user's selection is input into the client as definition information. When the client receives the definition information entered in response to the prompt information, it can determine the target object defined by that information from the at least two identified target objects and use the area image information containing that target object as the area image information pointed to by the keyword.
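The interception-with-disambiguation logic above might be sketched as follows; `detect_objects` is an assumed stand-in for an object detector, not a real library call.

```python
from PIL import Image  # pip install pillow

def region_for_keyword(screen: Image.Image, keyword: str, detect_objects):
    """Crop the single region the keyword points to, or ask for disambiguation."""
    boxes = detect_objects(screen, keyword)  # [(left, top, right, bottom), ...]
    if not boxes:
        return None, "no matching object found"
    if len(boxes) > 1:
        # at least two candidates: prompt the user to select one, as above
        return None, "multiple objects were recognized; please select one"
    return screen.crop(boxes[0]), None
```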
In addition, when inputting the query information, if multiple target objects appear on the current screen, the user can directly include definition information that defines which of the objects contained in the current interface is to be identified, this definition information forming part of the query information. For example, the query information may be "who is the actor on the left", where "left" serves as the definition information; or "where was the piece of red clothing bought", where "red" serves as the definition information. The definition information may also be expressed by a tap on the touch screen: while watching a video in which two actors appear, a user interested in one of them can say "query: who is this actor" while tapping the position of that actor's image. The user's voice information and the tapped position on the touch screen then jointly constitute the query information, with the tapped position serving as the definition information. Thus, the query information input by the user may include definition information for defining the object to be recognized among multiple objects contained in the current interface. After receiving query information containing definition information, the client may identify the objects represented by the keyword in the image information of the current interface (there may be several), determine the target object defined by the definition information from among them, and use the area image information containing that target object as the area image information pointed to by the keyword.
In one embodiment, when the query information input by the user concerns appearance design, the client, after obtaining the appearance-design query result, can splice the result with the specified part of the user so that the user can browse an effect picture showing the design applied. Specifically, when the query result is a design image, the client may identify, from the area image information, the target object suited to the design image and add the design image at the specified position of the target object. The design image may be an appearance decoration such as a hairstyle, lipstick, glasses, or a necklace, and the suited target object may then be a face, a neck, and so on. For example, referring to fig. 4, the user says "query: what hairstyle suits me better?". After receiving the query information, the client can acquire various hairstyles matched to the user's face in the manner described above and splice them onto the user's face in the area image information, providing the user with effect pictures of the different hairstyles. As shown in fig. 4, since several hairstyles may be found, they can be shown in sequence below the user's face for the user to tap and select; when the user selects one, that hairstyle is spliced onto the user's face.
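The splicing step can be sketched with a simple alpha-composited paste; the anchor coordinates are an illustrative assumption, whereas a real client would derive them from detected face landmarks.

```python
from PIL import Image  # pip install pillow

def apply_design(portrait: Image.Image, design: Image.Image, anchor):
    """Paste the design image (with transparency) onto a copy of the portrait."""
    result = portrait.copy()
    overlay = design.convert("RGBA")
    result.paste(overlay, anchor, mask=overlay)  # mask preserves transparency
    return result

# e.g. preview = apply_design(selfie, hairstyle_png, anchor=(120, 40))
```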
Of course, in practical applications, the query result can also be presented in various other forms. For example, when the query result is gesture information, the gesture information may be played in the current interface in a floating window, and the floating window is closed when playback finishes or the number of plays reaches a specified count.
Referring to fig. 5, the present application further provides a query information processing apparatus. The apparatus includes a memory and a processor; the memory stores a computer program which, when executed by the processor, implements the following steps:
S1: acquiring query information input by a user, and identifying the semantics of the query information;
S3: converting the query information into keywords according to the semantics, and determining a query mode corresponding to the query information;
S5: processing the keywords and the image information of the current interface in the query mode to obtain a query result corresponding to the query information.
In this embodiment, when the computer program is executed by the processor, the following steps are also implemented:
acquiring image information of an interface according to a specified time period, and taking the image information closest to a time node for inputting the query information as the image information of the current interface;
or
when the query information is acquired, acquiring image information of the interface, and taking the acquired image information as the image information of the current interface.
In this embodiment, when the computer program is executed by the processor, the following steps are also implemented:
determining a target object represented by the keyword, and identifying the target object from the image information of the current interface;
and taking the area image information containing the target object as the area image information pointed by the keyword.
In this embodiment, when the computer program is executed by the processor, the following steps are also implemented:
if the number of target objects identified from the image information of the current interface is at least two, generating and displaying prompt information indicating that the number of currently identified target objects is at least two; the prompt information comprises at least one of voice prompt information, text prompt information, and gesture prompt information;
receiving definition information input by the user in response to the prompt information, and determining the target object defined by the definition information from the at least two identified target objects;
and taking the area image information containing the target object defined by the definition information as the area image information pointed by the keyword.
In this embodiment, when the computer program is executed by the processor, the following steps are also implemented:
when the query result is text information, displaying the text information in the image information of the current interface or playing the voice information after converting the text information into voice information;
and when the query result is voice information, playing the voice information or converting the voice information into text information, and displaying the text information obtained by conversion in the image information of the current interface.
In this embodiment, when the computer program is executed by the processor, the following steps are also implemented:
and when the query result is an appearance design image, identifying a target object matched with the appearance design image from the image information of the current interface, and adding the appearance design image to the specified position of the target object.
In this embodiment, the memory may include a physical device for storing information; typically, the information is digitized and then stored in a medium using an electrical, magnetic, or optical method. The memory of this embodiment may further include: devices that store information using electrical energy, such as RAM and ROM; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, bubble memories, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other types of memory, such as quantum memory and graphene memory.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth.
The specific functions of the apparatus, its memory, and its processor provided in the embodiments of this specification can be explained by comparison with the foregoing embodiments in this specification, and can achieve the technical effects of those embodiments, so they are not described here again.
In summary, in the technical solution provided by the application, after the user provides query information, the semantics of the query information can be identified, and the semantics indicate the object the user actually wants to query at present. Corresponding keywords can therefore be extracted from the query information based on the semantics and subsequently used as the basis of the query. Meanwhile, different query modes can be adopted for different semantics. For example, when the semantics indicate that the user wants to query a person, a face recognition query mode can be adopted; when the semantics indicate that the user wants to search for a commodity, a commodity search query mode can be used. After the keywords and the query mode are determined, the keywords and the image information of the current interface can be processed in the determined query mode, thereby obtaining the query result the user wants. In this way, the application combines the picture the user is watching with the user's semantics to perform the query, which can improve the accuracy of the query result.
In the 1990s, an improvement in a technology could be clearly distinguished as an improvement in hardware (for example, an improvement in a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement in a method flow). However, as technology has developed, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement in a method flow cannot be realized with hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is now mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); at present, VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by slightly programming the method flow into an integrated circuit using the above hardware description languages.
Those skilled in the art will also appreciate that, in addition to implementing the client and server as pure computer-readable program code, the same functions can be achieved by logically programming the method steps so that the client and server take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a client and server may therefore be regarded as a hardware component, and the means included in them for performing the various functions may also be regarded as structures within the hardware component; or the means for performing the functions may even be regarded both as software modules for performing the method and as structures within the hardware component.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus a necessary general hardware platform. Based on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.
The embodiments in this specification are described in a progressive manner; for the same or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, for the apparatus embodiments, reference may be made to the introduction of the method embodiments described above.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described through embodiments, those of ordinary skill in the art will recognize that there are many variations and permutations of the present application that do not depart from its spirit, and it is intended that the appended claims encompass these variations and permutations.

Claims (17)

1. A method for processing query information, the method comprising:
acquiring query information input by a user, and identifying the semantics of the query information;
converting the query information into keywords according to the semantics, and determining a query mode corresponding to the query information;
processing the keywords and the image information of the current interface in the query mode to obtain a query result corresponding to the query information;
after determining the query mode corresponding to the query information, the method further comprises:
intercepting the area image information pointed to by the keyword from the image information of the current interface;
correspondingly, the keyword and the regional image information are processed in the query mode to obtain a query result corresponding to the query information;
the query information also comprises definition information, and the definition information is used for defining an object to be identified in a plurality of objects contained in the current interface;
correspondingly, intercepting the area image information pointed to by the keyword from the image information of the current interface comprises the following steps:
identifying objects represented by the keywords from the image information of the current interface, and determining target objects defined by the definition information from the identified objects;
and taking the area image information containing the target object as the area image information pointed by the keyword.
2. The method of claim 1, wherein the query information comprises voice information and/or gesture information; accordingly, after obtaining the query information input by the user, the method further comprises:
and converting the voice information and/or the gesture information into character information, and identifying the semantics of the character information.
3. The method of claim 1, wherein the image information of the current interface is obtained as follows:
acquiring image information of an interface according to a specified time period, and taking the image information closest to a time node for inputting the query information as the image information of the current interface;
or
when the query information is acquired, acquiring image information of the interface, and taking the acquired image information as the image information of the current interface.
4. The method of claim 1, wherein converting the query information into keywords comprises:
determining a target object of the semantic representation;
and extracting a target vocabulary for representing the target object from the query information, and taking the target vocabulary as a converted keyword.
5. The method of claim 1, wherein determining the query mode corresponding to the query information comprises:
determining the execution action of the semantic representation, and taking the query mode corresponding to the execution action as the query mode corresponding to the query information.
6. The method of claim 5, wherein the query mode corresponding to the execution action is determined as follows:
the method comprises the steps of determining multiple intention categories for representing user intentions in advance, and determining at least one query mode associated with the intention categories from multiple candidate query modes; wherein the intention category comprises at least one of person identification, article identification, commodity purchase, place search, video search and appearance design;
and judging the target intention category to which the semantics belong, and taking a target query mode associated with the target intention category as a query mode corresponding to the execution action.
7. The method of claim 1, wherein prior to obtaining the query information entered by the user, the method further comprises:
detecting whether the user inputs wake-up information, and acquiring the query information input by the user only when it is detected that the user has input the wake-up information; the wake-up information comprises at least one of designated voice information, designated text information, or designated gesture information.
8. The method of claim 1, wherein the step of intercepting the area image information pointed to by the keyword from the image information of the current interface comprises the steps of:
determining a target object represented by the keyword, and identifying the target object from the image information of the current interface;
and taking the area image information containing the target object as the area image information pointed by the keyword.
9. The method of claim 8, wherein if the number of target objects identified from the image information of the current interface is at least two, the method further comprises:
generating and displaying prompt information indicating that the number of currently identified target objects is at least two; the prompt information comprises at least one of voice prompt information, text prompt information, and gesture prompt information;
receiving definition information input by the user in response to the prompt information, and determining the target object defined by the definition information from the at least two identified target objects;
and taking the area image information containing the target object defined by the definition information as the area image information pointed by the keyword.
10. The method according to claim 1, wherein after obtaining the query result corresponding to the query information, the method further comprises:
when the query result is text information, displaying the text information in the image information of the current interface or playing the voice information after converting the text information into voice information;
and when the query result is voice information, playing the voice information or converting the voice information into text information, and displaying the text information obtained by conversion in the image information of the current interface.
11. The method according to claim 1, wherein after obtaining the query result corresponding to the query information, the method further comprises:
and when the query result is an appearance design image, identifying a target object matched with the appearance design image from the image information of the current interface, and adding the appearance design image to the specified position of the target object.
12. A processing apparatus for querying information, the apparatus comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, performs the steps of:
acquiring query information input by a user, and identifying the semantics of the query information;
converting the query information into keywords according to the semantics, and determining a query mode corresponding to the query information;
processing the keywords and the image information of the current interface in the query mode to obtain a query result corresponding to the query information;
after determining the query mode corresponding to the query information, the following steps are also implemented:
intercepting the area image information pointed to by the keyword from the image information of the current interface;
correspondingly, the keyword and the regional image information are processed in the query mode to obtain a query result corresponding to the query information;
the query information also comprises definition information, and the definition information is used for defining an object to be identified in a plurality of objects contained in the current interface;
correspondingly, intercepting the area image information pointed to by the keyword from the image information of the current interface comprises the following steps:
identifying objects represented by the keywords from the image information of the current interface, and determining target objects defined by the definition information from the identified objects;
and taking the area image information containing the target object as the area image information pointed by the keyword.
13. The apparatus according to claim 12, wherein the computer program, when executed by the processor, further performs the steps of:
acquiring image information of an interface according to a specified time period, and taking the image information closest to a time node for inputting the query information as the image information of the current interface;
or
when the query information is acquired, acquiring image information of the interface, and taking the acquired image information as the image information of the current interface.
14. The apparatus according to claim 12, wherein the computer program, when executed by the processor, further performs the steps of:
determining the target object represented by the keyword, and identifying the target object from the image information of the current interface;
and taking the regional image information containing the target object as the regional image information pointed to by the keyword.
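A minimal sketch of claim 14's keyword-to-region step, assuming a detector that returns labelled bounding boxes. The stub below hard-codes two detections, and the fallback to None when nothing matches is an assumption; the claim does not specify the recognition model or the miss behaviour.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height

def detect_objects(interface_img) -> List[Tuple[str, Box]]:
    """Stub detector; a real system would run an object-detection model
    over the current-interface image."""
    return [("jacket", (40, 60, 200, 300)), ("hat", (300, 20, 80, 80))]

def region_for_keyword(interface_img, keyword: str) -> Optional[Box]:
    # Resolve the keyword to a target object, locate it in the screenshot,
    # and take the region containing it as the regional image information.
    for label, box in detect_objects(interface_img):
        if label == keyword:
            return box  # the caller would crop img[y:y+h, x:x+w]
    return None

print(region_for_keyword("screenshot.png", "jacket"))  # (40, 60, 200, 300)
```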
15. The apparatus according to claim 14, wherein the computer program, when executed by the processor, further performs the steps of:
if at least two target objects are identified from the image information of the current interface, generating and displaying prompt information indicating that at least two target objects are currently identified, wherein the prompt information comprises at least one of voice prompt information, text prompt information and gesture prompt information;
receiving definition information input by the user in response to the prompt information, and determining, from the at least two identified target objects, the target object defined by the definition information;
and taking the regional image information containing the target object defined by the definition information as the regional image information pointed to by the keyword.
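Claim 15's disambiguation loop, sketched with caller-supplied ask_user and matches hooks. Both hooks are hypothetical: the claim only requires that a prompt be emitted when two or more targets are found and that the user's definition information pick one out.

```python
def disambiguate(candidates, ask_user, matches):
    """With two or more detected targets, prompt the user (voice, text, or
    gesture prompt) and keep the target the definition information defines."""
    if len(candidates) < 2:
        return candidates[0] if candidates else None
    definition = ask_user(f"{len(candidates)} matching objects found; which one do you mean?")
    return next((c for c in candidates if matches(c, definition)), None)

# Example: two jackets detected; the user answers "left".
picked = disambiguate(
    [{"label": "jacket", "x": 40}, {"label": "jacket", "x": 400}],
    ask_user=lambda prompt: "left",
    matches=lambda cand, d: (d == "left") == (cand["x"] < 200),
)
print(picked)  # {'label': 'jacket', 'x': 40}
```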
16. The apparatus according to claim 12, wherein the computer program, when executed by the processor, further performs the steps of:
when the query result is text information, displaying the text information in the image information of the current interface, or converting the text information into voice information and playing the voice information;
and when the query result is voice information, playing the voice information, or converting the voice information into text information and displaying the converted text information in the image information of the current interface.
17. The apparatus according to claim 12, wherein the computer program, when executed by the processor, further performs the steps of:
and when the query result is an appearance design image, identifying, from the image information of the current interface, a target object that matches the appearance design image, and adding the appearance design image at a specified position on the target object.
CN201810071243.2A 2018-01-25 2018-01-25 Query information processing method and device Active CN110119461B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810071243.2A CN110119461B (en) 2018-01-25 2018-01-25 Query information processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810071243.2A CN110119461B (en) 2018-01-25 2018-01-25 Query information processing method and device

Publications (2)

Publication Number Publication Date
CN110119461A CN110119461A (en) 2019-08-13
CN110119461B true CN110119461B (en) 2022-01-14

Family

ID=67519128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810071243.2A Active CN110119461B (en) 2018-01-25 2018-01-25 Query information processing method and device

Country Status (1)

Country Link
CN (1) CN110119461B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110600003A (en) * 2019-10-18 2019-12-20 北京云迹科技有限公司 Robot voice output method and device, robot and storage medium
CN113507626A (en) * 2021-04-19 2021-10-15 支付宝(杭州)信息技术有限公司 Commodity association method, device and equipment in video interaction

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930263A (en) * 2012-09-27 2013-02-13 百度国际科技(深圳)有限公司 Information processing method and device
US10311115B2 (en) * 2014-05-15 2019-06-04 Huawei Technologies Co., Ltd. Object search method and apparatus
CN104881451A (en) * 2015-05-18 2015-09-02 百度在线网络技术(北京)有限公司 Image searching method and image searching device
CN106708823A (en) * 2015-07-20 2017-05-24 阿里巴巴集团控股有限公司 Search processing method, apparatus and system
CN106598998B (en) * 2015-10-20 2020-10-27 北京安云世纪科技有限公司 Information acquisition method and information acquisition device
CN107506166A (en) * 2017-08-04 2017-12-22 珠海市魅族科技有限公司 Information cuing method and device, computer installation and readable storage medium storing program for executing
CN107454476B (en) * 2017-08-08 2020-07-28 四川长虹电器股份有限公司 System for acquiring video resource information

Also Published As

Publication number Publication date
CN110119461A (en) 2019-08-13

Similar Documents

Publication Publication Date Title
CN109844708B (en) Recommending media content through chat robots
JP7356206B2 (en) Content recommendation and display
CN110325986B (en) Article processing method, article processing device, server and storage medium
US10192544B2 (en) Method and system for constructing a language model
CN110430476B (en) Live broadcast room searching method, system, computer equipment and storage medium
US8831276B2 (en) Media object metadata engine configured to determine relationships between persons
US10083521B1 (en) Content recommendation based on color match
JP2019531547A (en) Object detection with visual search queries
CN110222256B (en) Information recommendation method and device and information recommendation device
CN109325223B (en) Article recommendation method and device and electronic equipment
CN109801119B (en) Interface display method, information providing method, user behavior content information processing method and equipment
CN109308334B (en) Information recommendation method and device and search engine system
US10650814B2 (en) Interactive question-answering apparatus and method thereof
CN110765294B (en) Image searching method and device, terminal equipment and storage medium
WO2016000536A1 (en) Method for activating application program, user terminal and server
CN109582869A (en) A kind of data processing method, device and the device for data processing
CN110728147B (en) Model training method and named entity recognition method
CN107515870B (en) Searching method and device and searching device
CN108717403B (en) Processing method and device for processing
CN113806588A (en) Method and device for searching video
WO2023020160A1 (en) Recommendation method and apparatus, training method and apparatus, device, and recommendation system
CN110827058A (en) Multimedia promotion resource insertion method, equipment and computer readable medium
CN110119461B (en) Query information processing method and device
US11468675B1 (en) Techniques for identifying objects from video content
CN113111264B (en) Interface content display method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200513

Address after: Room 508, 5th Floor, Building 4, No. 699 Wangshang Road, Changhe Street, Binjiang District, Hangzhou, Zhejiang 310052

Applicant after: Alibaba (China) Co.,Ltd.

Address before: Sections A and C, 5th Floor, Block A, Sinosteel International Plaza, No. 8 Haidian Street, Haidian District, Beijing 100080

Applicant before: Youku Network Technology (Beijing) Co., Ltd.

GR01 Patent grant