CN107609056B

CN107609056B - Question and answer processing method and device based on picture recognition

Info

Publication number: CN107609056B
Application number: CN201710743444.8A
Authority: CN
Inventors: 吴志全
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2017-08-25
Filing date: 2017-08-25
Publication date: 2021-08-03
Anticipated expiration: 2037-08-25
Also published as: CN107609056A

Abstract

The application provides a question and answer processing method and device based on picture recognition. The method comprises: extracting corresponding retrieval information from a target picture submitted by a question and answer requesting user, wherein the retrieval information comprises one or more text messages and relative position information of the text messages in the picture; matching and querying in a reference picture library based on the retrieval information to obtain a reference picture matched with the text information and the relative position information; and determining answer information of the question corresponding to the target picture based on the question and answer information corresponding to the reference picture. The application enables question requests that a user expresses in picture form to be recognized and processed, which improves the service capability and service coverage of intelligent customer service devices or applications, reduces the proportion of requests handed over from intelligent processing to manual processing, greatly saves human resources, and improves the overall efficiency of handling user questions.

Description

Question and answer processing method and device based on picture recognition

Technical Field

The application relates to the field of image processing, in particular to a question and answer processing technology based on picture identification.

Background

With the development of computer network technology, various internet applications, such as web applications or terminal applications, provide a question and answer platform or window for interacting with internet users. Internet users can submit the questions they encounter while using an application through the question and answer platform or window in order to obtain answers. Although manual customer service can process the various types of questions submitted by users, such as text questions or picture questions, it incurs high labor cost, its question processing efficiency is low, and the overall level of question processing tends to be inconsistent because different customer service agents have different operating experience. For internet applications with a large number of users, users generally ask questions frequently and the questions are of many types; in this situation, neither existing intelligent customer service systems nor manual customer service systems can simultaneously achieve the benefits of convenient user operation, improved question processing efficiency, and reduced resource cost.

Disclosure of Invention

An object of the present application is to provide a question and answer processing method and device based on picture recognition.

According to one aspect of the application, a question and answer processing method based on picture recognition is provided, and comprises the following steps:

extracting corresponding retrieval information from a target picture submitted by a question and answer requesting user, wherein the retrieval information comprises one or more text messages and relative position information of the text messages in the picture;

matching and querying in a reference picture library based on the retrieval information to obtain a reference picture matched with the text information and the relative position information;

and determining answer information of the question corresponding to the target picture based on the question and answer information corresponding to the reference picture.

According to another aspect of the present application, there is also provided a question and answer processing device based on picture recognition, including:

first means for extracting corresponding retrieval information from a target picture submitted by a question and answer requesting user, wherein the retrieval information comprises one or more text messages and relative position information of the text messages in the picture;

second means for matching a query in a reference picture library based on the retrieval information to obtain a reference picture matching the text information and the relative position information;

and third means for determining answer information of the question corresponding to the target picture based on the question and answer information corresponding to the reference picture.

According to another aspect of the application, there is also provided a computer readable storage medium having stored thereon computer code which, when executed, performs the method as described above.

According to yet another aspect of the present application, there is also provided a computer program product, which, when executed by a computer device, performs the method as described above.

According to yet another aspect of the present application, there is also provided a computer apparatus including:

one or more processors;

a memory for storing one or more computer programs;

the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the methods as described above.

Compared with the prior art, the present application extracts corresponding retrieval information from the target picture submitted by the question and answer requesting user, matches and queries in the reference picture library based on the retrieval information to obtain a reference picture matched with the text information and the relative position information, and finally determines the answer information of the question corresponding to the target picture based on the question and answer information corresponding to the reference picture. A question and answer processing device, such as an intelligent customer service device or application, can thereby recognize and process question requests that users express in picture form. This improves the service capability and service coverage of the intelligent customer service device or application, reduces the proportion of requests handed over from intelligent processing to manual processing, greatly saves human resources, and improves the overall efficiency of handling user questions. The application also ensures a more consistent level of question processing. For the user, it makes asking questions more convenient and improves satisfaction with the question processing procedure and its results.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 illustrates a flow chart of a method for question and answer processing based on picture recognition in accordance with an aspect of the subject application;

FIG. 2 illustrates an apparatus diagram of a question-answering processing apparatus based on picture recognition according to another aspect of the present application;

FIG. 3 illustrates an exemplary diagram of retrieving information extraction and matching queries based on a target picture according to one embodiment of the present application.

The same or similar reference numbers in the drawings identify the same or similar elements.

Detailed Description

Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

The term "computer device" or "computer" in this context refers to an intelligent electronic device that can execute predetermined processes such as numerical calculation and/or logic calculation by running predetermined programs or instructions, and may include a processor and a memory, wherein the processor executes a pre-stored instruction stored in the memory to execute the predetermined processes, or the predetermined processes are executed by hardware such as ASIC, FPGA, DSP, or a combination thereof. Computer devices include, but are not limited to, servers, personal computers, laptops, tablets, smart phones, and the like.

The computer equipment comprises user equipment and network equipment. The user equipment includes, but is not limited to, computers, smart phones, PDAs, etc.; the network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud consisting of a large number of computers or network servers based on cloud computing, wherein cloud computing is a kind of distributed computing in which a collection of loosely coupled computers forms one virtual supercomputer. The computer device can operate alone to implement the invention, or can access a network and implement the invention through interoperation with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.

It should be noted that the user equipment, the network device, the network, etc. are only examples, and other existing or future computer devices or networks may also be included in the scope of the present invention, and are included by reference.

The methods discussed below, some of which are illustrated by flow diagrams, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as a storage medium. The processor(s) may perform the necessary tasks.

Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present invention. The present invention may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.

It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, there are no intervening elements present. Other words used to describe the relationship between elements (e.g., "between" versus "directly between", "adjacent" versus "directly adjacent to", etc.) should be interpreted in a similar manner.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.

The device referred to in this application includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user equipment includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smart phone or a tablet computer, and the mobile electronic product may employ any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, which is a kind of distributed computing in which a collection of loosely coupled computers forms one virtual supercomputer. The network includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network, etc. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device and the network device, the touch terminal, or the network device and the touch terminal through a network.

The embodiment of the application provides a question and answer processing method based on picture identification, and the method can be implemented in corresponding question and answer processing equipment. The question-answer processing device may include, but is not limited to, the network device described above. The question and answer processing apparatus may further include a program application for processing user question and answer. In one implementation, the question-answer processing method is suitable for a scenario of providing a user with an answer service for a question request, and the question-answer processing device may include an intelligent customer service device or an application.

Fig. 1 shows a flowchart of a question-answering processing method based on picture recognition according to an aspect of the present invention. Wherein the method comprises step S11, step S12 and step S13.

In step S11, corresponding retrieval information is extracted from the target picture submitted by the question and answer requesting user, where the retrieval information includes one or more text messages and relative position information of the text messages in the picture.

In one implementation, the target picture submitted by the question and answer requesting user reflects and expresses the user's question request. The question request may include any of the questions the user encounters while using various internet applications, such as a web application or a terminal application. The target picture may come from a screenshot that the user takes of an application interface; for example, when the application interface presents prompt information, the user may save the interface containing the prompt information as a picture and upload it as the target picture. Here, the user does not need to type the question request in text form, but can directly upload a target picture that contains or corresponds to the question request. The question and answer processing device then automatically processes the target picture to determine the user's question request and the answer or solution corresponding to it, thereby simplifying the user operation.

In one implementation, character recognition may be performed on the target picture submitted by the question and answer requesting user to obtain the corresponding retrieval information; for example, text recognition is performed on the target picture using an OCR character recognition method. In one implementation, one or more text messages may be extracted from the picture from top to bottom and from left to right, wherein the relative position information is obtained by numbering the positions sequentially from top to bottom and from left to right, and each piece of relative position information corresponds to the text message located at that position. In one implementation, the retrieval information may be represented as a target picture, one or more pieces of relative position information corresponding to the target picture, and the text information corresponding to each piece of relative position information. Here, FIG. 3 shows an exemplary diagram of retrieval information extraction and matching query based on a target picture according to an embodiment of the present application. Character recognition is performed on the target picture to obtain the relative position information, i.e., position 1, position 2, position 3, position 4, and the text information corresponding to each relative position, i.e., text 1, text 2, text 3, text 4.
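The position numbering described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `OcrBox` type, the `build_retrieval_info` function and the `line_tolerance` parameter are hypothetical names, and the boxes are assumed to come from some OCR engine that reports a top-left coordinate for each recognized text.

```python
from dataclasses import dataclass


@dataclass
class OcrBox:
    """One recognized text region (hypothetical structure for this sketch)."""
    text: str
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels


def build_retrieval_info(boxes, line_tolerance=10.0):
    """Number the texts 1..N from top to bottom, then left to right.

    Assumption: the y coordinate is bucketed by rounding y / line_tolerance,
    so boxes that fall into the same bucket are treated as one row.
    """
    ordered = sorted(boxes, key=lambda b: (round(b.y / line_tolerance), b.x))
    return [(position, box.text) for position, box in enumerate(ordered, start=1)]


boxes = [
    OcrBox("forget the password", x=200, y=52),
    OcrBox("identity authentication", x=10, y=5),
    OcrBox("please input payment password", x=10, y=50),
]
retrieval_info = build_retrieval_info(boxes)
for position, text in retrieval_info:
    print(position, text)
```

The row bucketing is only one way to tolerate small vertical jitter between boxes on the same line; the source does not say how rows are detected.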

Further, in one implementation, interference information may be removed from all the text information extracted from the target picture; that is, text information with a low degree of association with the question corresponding to the target picture, such as certain high-frequency interference items, may be deleted. A practical example: a user encounters a prompt while applying for a credit limit in an application on the user equipment, and the following text messages are recognized in sequence from the target picture uploaded by the user:

China Unicom; 11:49 AM; applying for a quota; identity authentication; confirming the information; please input Baidu wallet payment password to verify the identity; forget the password; prompt; the password input is wrong, and can be input 4 more times; I know.

In this case, an existing interference item library may be consulted, and high-frequency text information with a low degree of association with the question corresponding to the target picture, such as "China Unicom" and "11:49 AM", may be discarded. The retrieval information is then determined based on the one or more text messages remaining after the interference information is removed and their corresponding relative position information.
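A minimal sketch of this filtering step, assuming the interference item library is a plain set of strings compared case-insensitively. The names `INTERFERENCE_ITEMS` and `remove_interference` are hypothetical, and the library contents are illustrative only. The source does not say whether positions are renumbered after removal, so this sketch leaves them unchanged.

```python
# Hypothetical interference item library; real contents would be curated.
INTERFERENCE_ITEMS = {"china unicom", "11:49 am"}


def remove_interference(retrieval_info, interference_items=INTERFERENCE_ITEMS):
    """Drop (position, text) pairs whose text is a known interference item.

    Assumption: matching is exact after case folding and whitespace stripping.
    Positions of the remaining texts are kept as they were.
    """
    return [
        (position, text)
        for position, text in retrieval_info
        if text.strip().casefold() not in interference_items
    ]


extracted = [
    (1, "China Unicom"),
    (2, "11:49 AM"),
    (3, "identity authentication"),
    (4, "forget the password"),
]
cleaned = remove_interference(extracted)
print(cleaned)
```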

Then, in step S12, a query is matched in a reference picture library based on the retrieval information to obtain a reference picture matching the text information and the relative position information.

The reference picture library includes a plurality of reference pictures. In one implementation, each reference picture may be matched with corresponding question and answer information, where the question and answer information includes one or more questions matched with the reference picture, and each question may be matched with one or more answers. Here, the question may be obtained directly from the content of the reference picture, or may be determined by analyzing and integrating the content of the reference picture. In one implementation, the matching relationship between the reference picture and the question and answer information can be automatically established and updated through machine learning; in one implementation, the matching relationship can be determined through manual operation; in one implementation, machine learning and manual operation may also be combined to determine the matching relationship.

In one implementation, a reference picture matching the target picture in a picture library may be retrieved based on one or more text messages in the retrieval information and the relative position information of the text messages in combination with inverted index information corresponding to the reference picture library. Here, the inverted index information includes one or more texts, one or more pictures corresponding to the texts, and relative position information of the texts in the corresponding pictures. Here, the candidate reference picture matching the text information and the relative position information in the search information may be selected as the reference picture according to the inverted index information.
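One way to picture the inverted index information is a mapping from each text to the pictures that contain it, together with the position of the text in each picture. The sketch below assumes exact text keys and a small in-memory library; `build_inverted_index` and the picture identifiers are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(reference_pictures):
    """Map each text to a list of (picture_id, position) pairs.

    reference_pictures: {picture_id: [(position, text), ...]}
    """
    index = defaultdict(list)
    for picture_id, entries in reference_pictures.items():
        for position, text in entries:
            index[text].append((picture_id, position))
    return dict(index)


library = {
    "picture 1": [(1, "text 1")],
    "picture 2": [(1, "text 1"), (2, "text 2"), (3, "text 3")],
    "picture 3": [(1, "text 4")],
}
inverted_index = build_inverted_index(library)
print(inverted_index["text 1"])
```

With this layout, looking up a text costs one dictionary access instead of a scan over every reference picture.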

In one embodiment, step S12 includes step S121 (not shown), step S122 (not shown), and step S123 (not shown).

Specifically, in step S121, a query is matched in a reference picture library based on the text information to obtain one or more candidate reference pictures, each candidate reference picture being matched with at least one of the one or more text information.

Here, the candidate reference picture is a picture that matches at least one of the one or more text messages and is selected from the plurality of pictures in the reference picture library. A candidate reference picture matching at least one of the one or more text messages may mean that the candidate reference picture has at least one reference text matching the text information. In one implementation, the candidate reference picture may include one or more reference texts matching the text information, and may further include one or more other texts not matching the text information. Preferably, matching includes the texts being identical. In one implementation, each text message in the retrieval information is searched for in the reference picture library in turn, for example through the inverted index information, so that a reference text matching the text message can be found, and the picture corresponding to that reference text is determined as a candidate reference picture. Referring to FIG. 3, in one implementation, a matching query is performed based on the text information extracted from the target picture, such as text 1, text 2, text 3, text 4, and so on; the reference texts matching the text information, that is, text 1, text 2, text 3, text 4, and so on, are found, and the candidate reference pictures, that is, picture 1, picture 2, picture 3, and so on, are determined based on the picture information corresponding to the reference texts. Candidate reference picture 1 matches text information 1; candidate reference picture 2 matches text information 1, text information 2 and text information 3; and candidate reference picture 3 matches text information 4.
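The candidate lookup of step S121 can be sketched as below, reproducing the FIG. 3 example under the assumption that matching means identical text. The function name `find_candidates` and the index layout (text mapped to a list of picture and position pairs) are illustrative choices, not taken from the source.

```python
def find_candidates(inverted_index, text_infos):
    """Return {picture_id: [(matched_text, position_in_picture), ...]}.

    A picture becomes a candidate as soon as one of its reference texts
    equals one of the text messages extracted from the target picture.
    """
    candidates = {}
    for text in text_infos:
        for picture_id, position in inverted_index.get(text, []):
            candidates.setdefault(picture_id, []).append((text, position))
    return candidates


# Hypothetical index consistent with the FIG. 3 description.
inverted_index = {
    "text 1": [("picture 1", 1), ("picture 2", 1)],
    "text 2": [("picture 2", 2)],
    "text 3": [("picture 2", 3)],
    "text 4": [("picture 3", 1)],
}
candidates = find_candidates(inverted_index, ["text 1", "text 2", "text 3", "text 4"])
for picture_id, matches in candidates.items():
    print(picture_id, matches)
```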

Next, in step S122, according to the text information, the relative position information, and the relative position of the reference text in the candidate reference picture, the matching degree information between the candidate reference picture and the target picture is determined, where the reference text matches with the text information.

Here, if only one candidate reference picture is determined, that candidate reference picture may be taken as the reference picture. If there are a plurality of candidate reference pictures, the matching degree information between each candidate reference picture and the target picture can be calculated, and the candidate reference picture with the highest matching degree information is determined as the reference picture.

One embodiment for determining the matching degree information of the candidate reference picture and the target picture is as follows:

If the target picture has N text messages at N relative positions, the candidate reference picture likewise has N texts at N relative positions; these texts may include reference texts that match the text messages as well as other texts that do not. First, for each relative position, the text similarity between the text message at that position in the target picture and the reference text or other text at the corresponding position in the candidate reference picture is calculated. Then the text similarities for all relative positions of the target picture are added together, and the sum is divided by the number of text messages to obtain the matching degree information of the candidate reference picture and the target picture.

TABLE a

For example, referring to table a, when N is 3, the target picture has text information 1 at position 1, text information 2 at position 2, and text information 3 at position 3. If, in the corresponding candidate reference picture 1, position 1 holds reference text 1, position 2 holds reference text 2, and position 3 holds other text 1, then the matching degree information of the candidate reference picture and the target picture is calculated as: (text similarity of text information 1 and reference text 1 + text similarity of text information 2 and reference text 2 + text similarity of text information 3 and other text 2) / number of text messages.
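The position-by-position average described above can be sketched as follows. The similarity function is passed in as a parameter; `exact_match` is a stand-in used only to keep the example self-contained, and both function names are hypothetical.

```python
def exact_match(a, b):
    """Placeholder similarity: 1.0 for identical texts, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def matching_degree(target_texts, candidate_texts, similarity=exact_match):
    """Average the per-position text similarity over the N positions.

    Both lists are ordered by relative position and must have the same
    length N, with N >= 1.
    """
    if not target_texts or len(target_texts) != len(candidate_texts):
        raise ValueError("expected two non-empty lists of equal length")
    total = sum(similarity(t, c) for t, c in zip(target_texts, candidate_texts))
    return total / len(target_texts)


target = ["text information 1", "text information 2", "text information 3"]
candidate = ["text information 1", "text information 2", "some other text"]
degree = matching_degree(target, candidate)
print(degree)
```

With the placeholder similarity, two of the three positions agree, so the result is two thirds.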

Further, in order to avoid misalignment between the matched text information and reference texts and their respective relative position information, and to ensure matching accuracy, multiple groups of matching degree information between the candidate reference picture and the target picture may be calculated by shifting the text matching order of the target picture and the candidate reference picture with reference to their respective relative position information, and the maximum value among them is selected as the matching degree information.

With reference to table a, the matching degree information between the candidate reference pictures and the target picture can be calculated as follows:

(text similarity of text information 1 and reference text 1 + text similarity of text information 2 and reference text 2 + text similarity of text information 3 and other text 2) / number of text messages;

(text similarity of text information 1 and reference text 2 + text similarity of text information 2 and reference text 3 + 0) / number of text messages;

(text similarity of text information 1 and reference text 3 + 0 + 0) / number of text messages;

(0 + text similarity of text information 2 and reference text 1 + text similarity of text information 3 and reference text 2) / number of text messages;

(0 + 0 + text similarity of text information 3 and reference text 1) / number of text messages.

Then, the maximum value is selected from these matching degree values as the matching degree information of the candidate reference picture and the target picture. In the present application, N is an arbitrary positive integer, and the above calculation method applies by analogy for any positive integer N.
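The shifted comparison can be read as trying every offset between the two position sequences, scoring positions without a counterpart as 0, and keeping the maximum. The sketch below follows that reading; the names `best_shifted_matching_degree` and `exact_match` are hypothetical, and the placeholder similarity stands in for whatever text similarity is actually used.

```python
def exact_match(a, b):
    """Placeholder similarity: 1.0 for identical texts, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def best_shifted_matching_degree(target_texts, candidate_texts, similarity=exact_match):
    """Maximum average similarity over all shifts of the candidate sequence.

    For N target texts the shifts run from -(N - 1) to N - 1. A target
    position whose shifted counterpart falls outside the candidate
    contributes 0, and every group is divided by N.
    """
    n = len(target_texts)
    if n == 0:
        raise ValueError("target_texts must not be empty")
    best = 0.0
    for shift in range(-(n - 1), n):
        total = 0.0
        for i, text in enumerate(target_texts):
            j = i + shift
            if 0 <= j < len(candidate_texts):
                total += similarity(text, candidate_texts[j])
        best = max(best, total / n)
    return best


target = ["A", "B", "C"]
candidate = ["header", "A", "B"]  # same texts, displaced by one position
shifted_degree = best_shifted_matching_degree(target, candidate)
print(shifted_degree)
```

For N = 3 this evaluates the same five groups listed above: shift 0, shifts +1 and +2, and shifts -1 and -2.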

In one implementation, the text similarity may be calculated using a Euclidean distance algorithm. Those skilled in the art will appreciate that this text similarity algorithm is merely exemplary, and other existing or future text similarity algorithms, where applicable, also fall within the scope of the present application and are incorporated herein by reference.
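The source names Euclidean distance but does not say how a text is turned into a vector or how a distance becomes a similarity. The sketch below makes two assumptions that are not from the source: texts are represented as character-count vectors, and the distance d is mapped to a similarity with 1 / (1 + d). The function name is hypothetical.

```python
import math
from collections import Counter


def euclidean_text_similarity(a, b):
    """Similarity in (0, 1] derived from Euclidean distance.

    Assumptions for this sketch: each text is a vector of character
    counts, and similarity = 1 / (1 + distance), so identical character
    counts give 1.0 and the value shrinks as the texts diverge.
    """
    counts_a, counts_b = Counter(a), Counter(b)
    characters = set(counts_a) | set(counts_b)
    distance = math.sqrt(sum((counts_a[ch] - counts_b[ch]) ** 2 for ch in characters))
    return 1.0 / (1.0 + distance)


same = euclidean_text_similarity("forget the password", "forget the password")
close = euclidean_text_similarity("abc", "abd")
print(same, close)
```

Character counts ignore character order, so anagrams score 1.0 under this representation; a production system would likely use a richer text representation.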

Yet another embodiment for determining the matching degree information of the candidate reference picture and the target picture is as follows: in step S122, the matching degree information of the candidate reference picture and the target picture is determined according to whether the relative position information of the text information is consistent with the relative position of the reference text in the candidate reference picture. Here, when a reference text of the candidate reference picture matches one of the text messages, it is determined whether the reference text and the matching text message have the same relative position in their respective pictures. For example, in table a, text information 1 is at position 1 of the target picture and text information 2 is at position 2 of the target picture. If reference text 1 matches text information 1 and reference text 2 matches text information 2, then, since reference text 1 and reference text 2 are at position 1 and position 2 of the reference picture respectively, the relative position information of the text information is consistent with the relative position information of the reference texts in the candidate reference picture, and it may be determined that the matching degree information of the candidate reference picture and the target picture is high. In one implementation, the number of times the relative positions are consistent may be accumulated to determine the corresponding matching degree information; the greater the number of consistent positions, the greater the matching degree information.
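Accumulating the number of consistent positions can be sketched as below, assuming each picture is represented as a mapping from position number to text and that a match means identical text. `count_consistent_positions` is a hypothetical name.

```python
def count_consistent_positions(target, candidate):
    """Count positions where both pictures hold the same text.

    target, candidate: {position: text}
    """
    return sum(1 for position, text in target.items() if candidate.get(position) == text)


target = {1: "text information 1", 2: "text information 2", 3: "text information 3"}
candidate = {1: "text information 1", 2: "text information 2", 3: "some other text"}
consistent = count_consistent_positions(target, candidate)
print(consistent)
```

A larger count would then translate into larger matching degree information; how the count is scaled is left open by the source.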

Yet another embodiment for determining the matching degree information of the candidate reference picture and the target picture is as follows: in step S122, if the candidate reference picture includes a plurality of reference texts, the matching degree information between the candidate reference picture and the target picture is determined according to whether the text order information corresponding to the plurality of text messages matches the text order information corresponding to the plurality of reference texts, where the text order information corresponding to the plurality of text messages and the text order information corresponding to the plurality of reference texts are each determined by the relative positions of the respective texts.

An example is as follows. Suppose the candidate reference picture includes reference text 1, reference text 2, and reference text 3, located at position 2, position 3, and position 4 of the candidate reference picture, respectively, and that these reference texts are identical to text information 1, text information 2, and text information 3 of the target picture, located at position 1, position 2, and position 3 of the target picture, respectively. Based on the order of position 2, position 3, and position 4 in the reference picture, the text sequence information of the reference texts can be determined as reference text 1, then reference text 2, then reference text 3; similarly, based on the order of position 1, position 2, and position 3 in the target picture, the text sequence of the text information can be determined as text information 1, then text information 2, then text information 3. The comparison therefore determines that the text sequence information corresponding to the plurality of text information matches the text sequence information corresponding to the plurality of reference texts; preferably, matching includes being identical. The matching degree information of the candidate reference picture and the target picture is then determined; for example, it may be set to 1 when the sequences are fully identical and to 0 when they are entirely different. As another example, different matching degree information may be determined according to the number of texts that appear in the same order: if the text sequence of N reference texts is consistent with the text sequence of the corresponding N pieces of text information, one value of matching degree information is determined; if the text sequence of N+1 reference texts is consistent with the text sequence of the corresponding N+1 pieces of text information, another value is determined, and the latter value is set to be larger.
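The order comparison in this example can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: it assumes each position can be reduced to a single sortable integer, and it counts a text as being "in the same order" when it holds the same rank in both pictures. The function names `text_order` and `same_order_count` are hypothetical.

```python
from typing import Dict, List


def text_order(positions: Dict[str, int]) -> List[str]:
    """Return the texts sorted by their (assumed scalar) position in the picture."""
    return [text for text, _ in sorted(positions.items(), key=lambda kv: kv[1])]


def same_order_count(target: Dict[str, int], reference: Dict[str, int]) -> int:
    """Count shared texts that hold the same rank in both text sequences.

    A larger count stands for larger matching degree information
    (assumption: rank equality is used as the order criterion).
    """
    shared = set(target) & set(reference)
    target_seq = [t for t in text_order(target) if t in shared]
    reference_seq = [t for t in text_order(reference) if t in shared]
    return sum(1 for a, b in zip(target_seq, reference_seq) if a == b)


# The example from the text: the same three texts sit at positions 1, 2, 3
# in the target picture and at positions 2, 3, 4 in the reference picture.
target = {"text 1": 1, "text 2": 2, "text 3": 3}
reference = {"text 1": 2, "text 2": 3, "text 3": 4}
count = same_order_count(target, reference)

# Two texts whose order is swapped in the reference picture.
swapped = same_order_count({"a": 1, "b": 2}, {"a": 2, "b": 1})
```

Here the absolute positions differ between the two pictures, yet all three texts keep their rank, so the sequences are treated as matching; the swapped pair shares no rank.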

Yet another embodiment for determining the matching degree information of the candidate reference picture and the target picture is as follows: if there are multiple candidate reference pictures, the matching degree information is determined based on the number of matches between the reference texts in each candidate reference picture and the text information of the target picture; the more matches, the greater the matching degree information. The candidate reference picture with the largest number of matches is then determined as the reference picture.

Further, another embodiment for determining the matching degree information of the candidate reference picture and the target picture is as follows: if, based on the number of matches, at least two candidate reference pictures share the maximum matching degree information, the matching degree information of each such candidate reference picture and the target picture is further determined according to whether the relative position information of the text information is consistent with the relative position of the reference text in the candidate reference picture; or, if the candidate reference picture includes a plurality of reference texts, according to whether the text sequence information corresponding to the text information matches the text sequence information corresponding to the reference texts, where the text sequence information corresponding to the text information and the text sequence information corresponding to the reference texts are each determined by the relative positions of the respective texts.
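The two-stage selection described above (rank by number of matches, then break ties by position consistency) can be sketched as follows. This is only an illustration under stated assumptions: each picture is modelled as a mapping from text to an integer position, position consistency is exact equality, and the names `match_count`, `position_hits`, and `select_reference` are hypothetical.

```python
from typing import Dict, List

# Assumed data layout: {text: position} for every picture.
Picture = Dict[str, int]


def match_count(target: Picture, candidate: Picture) -> int:
    """Number of reference texts that also occur in the target picture."""
    return len(set(target) & set(candidate))


def position_hits(target: Picture, candidate: Picture) -> int:
    """Number of shared texts whose positions agree in both pictures."""
    return sum(1 for text, pos in target.items() if candidate.get(text) == pos)


def select_reference(target: Picture, candidates: Dict[str, Picture]) -> List[str]:
    """Keep the candidates with the most matches; break ties on position."""
    counts = {name: match_count(target, pic) for name, pic in candidates.items()}
    if not counts or max(counts.values()) == 0:
        return []  # nothing matched, so no reference picture is retrieved
    best = max(counts.values())
    tied = [name for name, c in counts.items() if c == best]
    if len(tied) == 1:
        return tied
    hits = {name: position_hits(target, candidates[name]) for name in tied}
    top = max(hits.values())
    return [name for name in tied if hits[name] == top]


target = {"Login failed": 1, "Retry": 2}
candidates = {
    "ref_a": {"Login failed": 1, "Retry": 2},
    "ref_b": {"Login failed": 2, "Retry": 1},
    "ref_c": {"Retry": 2},
}
chosen = select_reference(target, candidates)
```

In this toy data `ref_a` and `ref_b` both match two texts, and only `ref_a` keeps them at the same positions, so it alone is selected. The function returns a list because several candidates may remain tied even after the second stage.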

Next, in step S123, a reference picture matching the text information and the relative position information is determined among the one or more candidate reference pictures according to the matching degree information. Here, the greater the matching degree information, the more similar the corresponding candidate reference picture is to the target picture; therefore, when that candidate is determined as the reference picture, its question and answer information more accurately matches the question that the question and answer requesting user wishes to express through the target picture.

According to the method and the device, corresponding retrieval information is extracted from a target picture submitted by a question and answer requesting user, a matching query is then carried out in a reference picture library based on the retrieval information to obtain a reference picture matched with the text information and the relative position information, and finally answer information of the question corresponding to the target picture is determined based on the question and answer information corresponding to the reference picture. A question and answer processing device, such as an intelligent customer service device or application, can thereby identify and process question requests that the user expresses in the form of a picture, so that the service capacity and service coverage of the intelligent customer service device or application are improved, the proportion of requests transferred from intelligent processing to manual processing is reduced, human resources are greatly saved, and the overall efficiency of handling user questions is improved. At the same time, the method and the device ensure a more consistent level of question handling. For the user, the method and the device make it easier to ask questions and improve satisfaction with both the question handling process and its results.

In an embodiment of the present application, in step S13, if the reference picture is retrieved, answer information of the question corresponding to the target picture is determined based on the question and answer information corresponding to the reference picture; otherwise, a question description request is sent to the question and answer requesting user, wherein the question description request is used to prompt the user to determine the question request. In practical applications, the reference picture may not be retrieved; for example, no reference text of any reference picture may match the text information extracted from the target picture. In this case, a question description request may be sent to the question and answer requesting user to guide the user to describe or determine the question request, for example by sending text information such as "What is your question?". A manual customer service or an intelligent customer service may subsequently interact with the question and answer requesting user.
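The branch in step S13 can be illustrated with a short Python sketch. It assumes, for simplicity, that each reference picture identifier maps to a single answer string; the `Reply` type, the `reply_for` function, and the prompt wording are hypothetical and serve only to show the two outcomes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reply:
    kind: str  # "answer" or "question_description_request"
    text: str


# Hypothetical prompt wording; the source only gives an example of this kind.
QUESTION_PROMPT = "What is your question?"


def reply_for(reference_id: Optional[str], answers: Dict[str, str]) -> Reply:
    """Answer from the matched reference picture, or ask the user to describe the question."""
    if reference_id is None or reference_id not in answers:
        # No reference picture was retrieved: send a question description request.
        return Reply("question_description_request", QUESTION_PROMPT)
    return Reply("answer", answers[reference_id])


answers = {"ref_login_error": "Reset your password from the login page."}
hit = reply_for("ref_login_error", answers)
miss = reply_for(None, answers)
```

After a question description request, the conversation could be handed to a manual or intelligent customer service, which this sketch does not model.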

In an embodiment of the present application, in step S13, if it is determined that the target picture corresponds to at least two questions based on the question and answer information corresponding to the reference picture, the at least two questions are provided to the question and answer requesting user; then, the question selected by the question and answer requesting user from the at least two questions is obtained; answer information for the selected question is then determined. For example, if the matching query operation determines a single reference picture, that reference picture may itself correspond to two or more questions. As another example, the matching query may return two or more reference pictures: if multiple candidate reference pictures have the same similarity information, all of them are determined to be reference pictures, and the total number of questions corresponding to these reference pictures is two or more. In either case, the at least two questions may be provided to the question and answer requesting user, who selects the question actually intended and notifies the question and answer processing device 1 of the selection; the question and answer processing device 1 then determines answer information for the selected question.
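The disambiguation flow above can be sketched as follows. The sketch assumes that question and answer information is stored as a mapping from question to answer per reference picture, and it replaces the real user interaction with a callback; `gather_questions`, `resolve_answer`, and `ask_user` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional

QA = Dict[str, str]  # question -> answer (assumed storage layout)


def gather_questions(reference_ids: List[str], qa_store: Dict[str, QA]) -> QA:
    """Merge the question-answer pairs of all matched reference pictures."""
    merged: QA = {}
    for ref_id in reference_ids:
        merged.update(qa_store.get(ref_id, {}))
    return merged


def resolve_answer(
    reference_ids: List[str],
    qa_store: Dict[str, QA],
    ask_user: Callable[[List[str]], str],
) -> Optional[str]:
    """Return the answer, asking the user to choose when several questions apply."""
    questions = gather_questions(reference_ids, qa_store)
    if not questions:
        return None
    if len(questions) >= 2:
        chosen = ask_user(list(questions))  # user picks one of the offered questions
    else:
        chosen = next(iter(questions))
    return questions.get(chosen)


qa_store = {
    "ref_a": {"Why did payment fail?": "Check the card's expiry date."},
    "ref_b": {"How do I retry payment?": "Tap Retry on the order page."},
}
# Stand-in for the real user interaction: always pick the last offered question.
answer = resolve_answer(["ref_a", "ref_b"], qa_store, ask_user=lambda qs: qs[-1])
single = resolve_answer(["ref_a"], qa_store, ask_user=lambda qs: qs[0])
```

Passing the selection step in as a callback keeps the sketch independent of how the questions are actually shown to the user.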

In one embodiment of the present application, the method further includes step S14 (not shown), in which the answer information is fed back to the question and answer requesting user. Further, the method may include step S15 (not shown) and step S16 (not shown). In step S15, feedback information submitted by the question and answer requesting user based on the answer information may be acquired. Next, in step S16, if the feedback information indicates that the problem corresponding to the target picture is solved, the target picture is added to the reference picture library. Here, the reference picture library may update its inverted index information based on the one or more text information of the newly added target picture and the relative position information of the text information in the picture. The correspondence between the target picture and the question and answer information may also be established and stored. Alternatively, if the feedback information indicates that the problem corresponding to the target picture is not solved, the target picture is marked as a problem to be solved. A period may be set after which the matching query is performed again for a target picture marked as a problem to be solved; such a target picture may also be transferred to manual customer service for handling.
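Steps S15 and S16 can be illustrated with a toy in-memory library. This is a sketch under assumptions: the inverted index maps each text to the set of picture identifiers that contain it, positions are integers, and the class and method names (`ReferenceLibrary`, `add_picture`, `handle_feedback`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ReferenceLibrary:
    """Toy reference picture library with an inverted index (text -> picture ids)."""

    def __init__(self) -> None:
        self.inverted_index: Dict[str, Set[str]] = defaultdict(set)
        self.positions: Dict[str, Dict[str, int]] = {}
        self.qa: Dict[str, Dict[str, str]] = {}
        self.unsolved: List[str] = []

    def add_picture(self, picture_id: str, texts: Dict[str, int], qa: Dict[str, str]) -> None:
        """Store texts, positions and question-answer pairs; update the index."""
        self.positions[picture_id] = dict(texts)
        self.qa[picture_id] = dict(qa)
        for text in texts:
            self.inverted_index[text].add(picture_id)

    def handle_feedback(
        self, picture_id: str, texts: Dict[str, int], qa: Dict[str, str], solved: bool
    ) -> None:
        """Solved pictures join the library; unsolved ones are marked for follow-up."""
        if solved:
            self.add_picture(picture_id, texts, qa)
        else:
            self.unsolved.append(picture_id)


library = ReferenceLibrary()
library.handle_feedback(
    "target_1", {"Login failed": 1}, {"Why can't I log in?": "Reset your password."}, solved=True
)
library.handle_feedback("target_2", {"Unknown error": 1}, {}, solved=False)
```

Pictures left in `unsolved` could later be re-queried after a set period or routed to manual customer service; neither follow-up is modelled here.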

Fig. 2 is a schematic diagram of a question-answering processing device based on picture recognition according to another aspect of the present application. The question-answering processing apparatus 1 comprises a first device 21, a second device 22 and a third device 23.

The first device 21 may extract corresponding search information from a target picture submitted by a question and answer requesting user, where the search information includes one or more text messages and relative position information of the text messages in the picture.
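As a rough illustration of what such retrieval information might look like, the sketch below pairs each recognized text with a position relative to the picture size. Everything here is an assumption: the text regions are taken as already recognized (no particular recognition engine is implied), and the relative position is modelled as the normalized centre of the bounding box, which is only one possible encoding. `TextBox` and `retrieval_info` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBox:
    """One recognized text region; pixel coordinates of its bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int


def retrieval_info(
    boxes: List[TextBox], image_width: int, image_height: int
) -> List[Tuple[str, Tuple[float, float]]]:
    """Pair each text with the normalized centre of its box, in [0, 1] x [0, 1]."""
    info = []
    for box in boxes:
        cx = (box.left + box.width / 2) / image_width
        cy = (box.top + box.height / 2) / image_height
        info.append((box.text, (cx, cy)))
    return info


boxes = [
    TextBox("Login failed", left=0, top=0, width=100, height=50),
    TextBox("Retry", left=100, top=50, width=100, height=50),
]
info = retrieval_info(boxes, image_width=200, image_height=100)
```

Normalizing by the picture size makes the positions comparable between screenshots of different resolutions, which is the property a relative position needs for the later matching steps.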

Here, the second device 22 matches a query in a reference picture library based on the retrieval information to obtain a reference picture matching the text information and the relative position information.

In one embodiment, the second device 22 may include a first unit 221 (not shown), a second unit 222 (not shown), and a third unit 223 (not shown).

In particular, the first unit 221 may match the query in the reference picture library based on the text information to obtain one or more candidate reference pictures, each candidate reference picture matching at least one of the one or more text information.
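Candidate retrieval of this kind is commonly served by an inverted index, which the description mentions for the reference picture library. A minimal sketch, assuming the index maps each reference text to the set of reference picture identifiers containing it and that text matching is exact string equality; `candidate_pictures` is a hypothetical name.

```python
from typing import Dict, Iterable, Set


def candidate_pictures(texts: Iterable[str], inverted_index: Dict[str, Set[str]]) -> Set[str]:
    """Union of all reference pictures that contain at least one of the texts."""
    found: Set[str] = set()
    for text in texts:
        found |= inverted_index.get(text, set())
    return found


inverted_index = {
    "Login failed": {"ref_a", "ref_b"},
    "Retry": {"ref_a"},
    "Payment declined": {"ref_c"},
}
candidates = candidate_pictures(["Login failed", "Retry", "Unseen text"], inverted_index)
```

Every returned picture matches at least one piece of text information, which is the condition stated for a candidate reference picture; texts absent from the index simply contribute nothing.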

Here, the second unit 222 may determine matching degree information of the candidate reference picture and the target picture according to the text information and the relative position information, and a relative position of a reference text in the candidate reference picture, where the reference text matches the text information.

TABLE a

Yet another embodiment for determining the matching degree information of the candidate reference picture and the target picture is as follows: the second unit 222 may determine matching degree information between the candidate reference picture and the target picture according to whether the relative position information of the text information is consistent with the relative position of the reference text in the candidate reference picture. Here, when the reference text of the candidate reference picture matches the one or more text information, it is determined whether the reference text and the matching text information have consistent relative positions in their respective pictures. For example, in table a, text information 1 is at position 1 of the target picture and text information 2 is at position 2 of the target picture. If reference information 1 matches text information 1 and reference information 2 matches text information 2, then, since reference information 1 and reference information 2 are at position 1 and position 2 of the reference picture, respectively, the relative position information of the text information is consistent with the relative position of the reference text in the candidate reference picture, and it may be determined that the matching degree information of the candidate reference picture and the target picture is high. In one implementation, the number of times that the relative positions are consistent may be accumulated to determine the corresponding matching degree information: the greater the number of consistent positions, the greater the matching degree information.
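The accumulation of consistent positions can be sketched as below. The sketch assumes positions are normalized (x, y) coordinates and treats two positions as consistent when they differ by no more than a tolerance on each axis; the tolerance value and the names `positions_consistent` and `consistency_count` are illustrative assumptions, not part of the described method.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # assumed normalized (x, y) position in the picture


def positions_consistent(a: Point, b: Point, tolerance: float = 0.05) -> bool:
    """Treat two positions as consistent if they differ by at most `tolerance` per axis."""
    return abs(a[0] - b[0]) <= tolerance and abs(a[1] - b[1]) <= tolerance


def consistency_count(
    target: Dict[str, Point], candidate: Dict[str, Point], tolerance: float = 0.05
) -> int:
    """Accumulate how many matching texts sit at consistent relative positions."""
    return sum(
        1
        for text, pos in target.items()
        if text in candidate and positions_consistent(pos, candidate[text], tolerance)
    )


target = {"text 1": (0.10, 0.20), "text 2": (0.50, 0.80)}
close = {"text 1": (0.12, 0.21), "text 2": (0.50, 0.78)}
moved = {"text 1": (0.12, 0.21), "text 2": (0.90, 0.10)}

close_count = consistency_count(target, close)
moved_count = consistency_count(target, moved)
```

A candidate whose texts all sit near the corresponding target positions accumulates a higher count than one in which a text has moved, so the count can serve directly as matching degree information.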

Yet another embodiment for determining the matching degree information of the candidate reference picture and the target picture is as follows: if the candidate reference picture includes a plurality of reference texts, the second unit 222 may determine matching degree information between the candidate reference picture and the target picture according to whether text sequence information corresponding to the plurality of text information matches text sequence information corresponding to the plurality of reference texts, where the text sequence information corresponding to the plurality of text information and the text sequence information corresponding to the plurality of reference texts are determined by relative positions corresponding to the respective texts.

Here, the third unit 223 may determine a reference picture matching the text information and the relative position information among the one or more candidate reference pictures according to the matching degree information. Here, the greater the matching degree information is, the higher the degree of similarity between the corresponding candidate reference picture and the target picture is, and therefore, when it is determined as the reference picture, the higher the accuracy of matching between the question and answer information corresponding thereto and the question and answer request user wishes to express through the target picture is.

In an embodiment of the present application, if the reference picture is retrieved, the third device 23 may determine answer information of the question corresponding to the target picture based on the question and answer information corresponding to the reference picture; otherwise, a question description request is sent to the question and answer requesting user, wherein the question description request is used to prompt the user to determine the question request. In practical applications, the reference picture may not be retrieved; for example, no reference text of any reference picture may match the text information extracted from the target picture. In this case, a question description request may be sent to the question and answer requesting user to guide the user to describe or determine the question request, for example by sending text information such as "What is your question?". A manual customer service or an intelligent customer service may subsequently interact with the question and answer requesting user.

In an embodiment of the present application, if it is determined that the target picture corresponds to at least two questions based on the question and answer information corresponding to the reference picture, the third device 23 may provide the at least two questions to the question and answer requesting user; next, the third device 23 may acquire the question selected by the question and answer requesting user from the at least two questions; the third device 23 may then determine answer information for the selected question. For example, if the matching query operation determines a single reference picture, that reference picture may itself correspond to two or more questions. As another example, the matching query may return two or more reference pictures: if multiple candidate reference pictures have the same similarity information, all of them are determined to be reference pictures, and the total number of questions corresponding to these reference pictures is two or more. In either case, the at least two questions may be provided to the question and answer requesting user, who selects the question actually intended and notifies the question and answer processing device 1 of the selection; the question and answer processing device 1 then determines answer information for the selected question.

In one embodiment of the present application, the question-answering processing apparatus 1 further includes a fourth device (not shown) that can feed back the answer information to the question and answer requesting user. Further, the question-answering processing apparatus 1 may include a fifth device (not shown) and a sixth device (not shown). The fifth device may acquire feedback information that the question and answer requesting user submits based on the answer information. Then, if the feedback information indicates that the problem corresponding to the target picture is solved, the sixth device may add the target picture to the reference picture library. Here, the reference picture library may update its inverted index information based on the one or more text information of the newly added target picture and the relative position information of the text information in the picture. The correspondence between the target picture and the question and answer information may also be established and stored. Alternatively, if the feedback information indicates that the problem corresponding to the target picture is not solved, the sixth device may mark the target picture as a problem to be solved. A period may be set after which the matching query is performed again for a target picture marked as a problem to be solved; such a target picture may also be transferred to manual customer service for handling.

The invention also provides a computer readable storage medium having stored thereon computer code which, when executed, performs a method as in any one of the preceding claims.

The invention also provides a computer program product, which when executed by a computer device, performs the method of any of the preceding claims.

The present invention also provides a computer device, comprising:

one or more processors;

a memory for storing one or more computer programs;

the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.

It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.

In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims

1. A question and answer processing method based on picture recognition comprises the following steps:

extracting corresponding retrieval information from a target picture submitted by a question and answer requesting user, wherein the target picture corresponds to a question of the user but does not comprise the question of the user, and the retrieval information comprises one or more text messages and relative position information of the text messages in the picture;

determining answer information of a question corresponding to the target picture based on question and answer information corresponding to the reference picture; the question and answer information corresponding to the reference picture comprises one or more questions matched with the reference picture and one or more answers matched with each question.

2. The method of claim 1, wherein the matching a query in a reference picture library based on the retrieved information to obtain a reference picture matching the textual information and the relative position information comprises:

matching a query in a reference picture library based on the textual information to obtain one or more candidate reference pictures, each candidate reference picture matching at least one of the one or more textual information;

determining matching degree information of the candidate reference picture and the target picture according to the text information, the relative position information and the relative position of a reference text in the candidate reference picture, wherein the reference text is matched with the text information;

and determining a reference picture matched with the text information and the relative position information in the one or more candidate reference pictures according to the matching degree information.

3. The method of claim 2, wherein the determining the matching degree information of the candidate reference picture and the target picture according to the text information and the relative position information, and the relative position of the reference text in the candidate reference picture comprises:

and determining the matching degree information of the candidate reference picture and the target picture according to whether the relative position information of the text information is consistent with the relative position of the reference text in the candidate reference picture.

4. The method of claim 2, wherein the determining the matching degree information of the candidate reference picture and the target picture according to the text information and the relative position information, and the relative position of the reference text in the candidate reference picture comprises:

if the candidate reference picture comprises a plurality of reference texts, determining matching degree information of the candidate reference picture and the target picture according to whether text sequence information corresponding to the text information is matched with text sequence information corresponding to the reference texts, wherein the text sequence information corresponding to the text information and the text sequence information corresponding to the reference texts are respectively determined by the relative position corresponding to each text.

5. The method of claim 1, wherein the determining answer information for the question corresponding to the target picture based on the question-answer information corresponding to the reference picture comprises:

if the reference picture is retrieved, determining answer information of a question corresponding to the target picture based on question-answer information corresponding to the reference picture; otherwise, sending a question description request to the question and answer requesting user, wherein the question description request is used for indicating the user to determine the question request.

6. The method of claim 1, wherein the determining answer information for the question corresponding to the target picture based on the question-answer information corresponding to the reference picture comprises:

if it is determined, based on the question and answer information corresponding to the reference picture, that the target picture corresponds to at least two questions, providing the at least two questions to the question and answer requesting user;

obtaining a question selected by the question-answer requesting user from at least two questions;

determining answer information of the question selected by the question-answer requesting user.

7. The method of claim 1, wherein the method further comprises:

and feeding back the answer information to the question-answer requesting user.

8. The method of claim 7, wherein the method further comprises:

obtaining feedback information submitted by the question-answer requesting user based on the answer information;

if the feedback information includes that the problem corresponding to the target picture is solved, updating the target picture into a reference picture library; or,

if the feedback information includes that the problem corresponding to the target picture is not solved, marking the target picture as the problem to be solved.

9. A question-answering processing device based on picture recognition, comprising:

first means for extracting corresponding retrieval information from a target picture submitted by a question and answer requesting user, wherein the target picture corresponds to a question of the user but does not comprise the question of the user, and the retrieval information comprises one or more text messages and relative position information of the text messages in the picture;

third means for determining answer information of a question corresponding to the target picture based on question-answer information corresponding to the reference picture; the question and answer information corresponding to the reference picture comprises one or more questions matched with the reference picture and one or more answers matched with each question.

10. The apparatus of claim 9, wherein the second means comprises:

a first unit, configured to match a query in a reference picture library based on the text information to obtain one or more candidate reference pictures, each candidate reference picture being matched with at least one of the one or more text information;

a second unit, configured to determine, according to the text information and the relative position information, and a relative position of a reference text in the candidate reference picture, matching degree information between the candidate reference picture and the target picture, where the reference text is matched with the text information;

and a third unit, configured to determine, according to the matching degree information, a reference picture that matches the text information and the relative position information in the one or more candidate reference pictures.

11. The apparatus of claim 10, wherein the second means is for:

12. The apparatus of claim 10, wherein the second means is for:

13. The apparatus of claim 9, wherein the third means is for:

14. The apparatus of claim 9, wherein the third means is for:

15. The apparatus of claim 9, wherein the apparatus further comprises:

and the fourth device is used for feeding back the answer information to the question-answer requesting user.

16. The apparatus of claim 15, wherein the apparatus further comprises:

a fifth device, configured to acquire feedback information submitted by the question and answer requesting user based on the answer information;

a sixth means for updating the target picture into a reference picture library if the feedback information includes that the problem corresponding to the target picture is solved; or if the feedback information includes that the problem corresponding to the target picture is not solved, marking the target picture as the problem to be solved.

17. A computer readable storage medium storing computer code which, when executed, performs the method of any of claims 1 to 8.

18. A computer program product, the method of any one of claims 1 to 8 being performed when the computer program product is executed by a computer device.

19. A computer device, the computer device comprising:

one or more processors;

a memory for storing one or more computer programs;

the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-8.