CN111241319A - Method and system for image-text conversion - Google Patents

Method and system for image-text conversion

Info

Publication number
CN111241319A
CN111241319A
Authority
CN
China
Prior art keywords
sentence
picture
user
sentences
converted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010074440.7A
Other languages
Chinese (zh)
Other versions
CN111241319B (en)
Inventor
郑文然
文友枥
吕金旺
任浩男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sohu New Media Information Technology Co Ltd
Original Assignee
Beijing Sohu New Media Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sohu New Media Information Technology Co Ltd filed Critical Beijing Sohu New Media Information Technology Co Ltd
Priority to CN202010074440.7A
Publication of CN111241319A
Application granted
Publication of CN111241319B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/535Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/538Presentation of query results
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for image-text conversion. The method comprises: acquiring a picture to be converted; acquiring keywords corresponding to the picture to be converted based on an image recognition tool; inputting the keywords into a search engine, and acquiring the relevance between the keywords and each sentence in a preset corpus; and displaying the sentences whose relevance reaches a threshold to the user. In this scheme, the image recognition tool is used to obtain the keywords corresponding to the picture to be converted, and the relevance between the keywords and each sentence in the corpus is obtained through the search engine. The sentences whose relevance reaches the threshold are displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.

Description

Method and system for image-text conversion
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a system for image-text conversion.
Background
With the development of internet technology, new social software is continuously being launched. When using social software, users commonly use the picture-uploading function.
After uploading a picture, most users need to add a suitable description sentence to it, but for various reasons a user may not be able to find a description sentence that accurately matches the uploaded picture.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and a system for image-text conversion, so as to solve a problem that a user cannot accurately find a description sentence that conforms to an uploaded image.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the first aspect of the embodiment of the invention discloses a method for image-text conversion, which comprises the following steps:
acquiring a picture to be converted;
acquiring a keyword corresponding to the picture to be converted based on an image recognition tool;
inputting the keyword into a search engine, and acquiring the correlation degree between the keyword and each sentence in a preset corpus;
and displaying the sentence with the relevance reaching the threshold to the user.
Preferably, the obtaining the keyword corresponding to the picture to be converted based on the image recognition tool includes:
identifying the picture to be converted based on an image identification tool to obtain elements forming the picture to be converted;
and acquiring a keyword corresponding to each element according to the characteristic information of each element.
Preferably, the process of constructing the corpus comprises:
acquiring a plurality of sentences;
performing word segmentation processing on each sentence to obtain a word segmentation result and weight of each sentence;
and storing the word segmentation result and the weight corresponding to each sentence into a corpus corresponding to a search engine.
Preferably, the displaying the sentence with the relevance reaching the threshold to the user includes:
sorting the sentences the correlation degree of which reaches a threshold value according to the sequence of the correlation degrees from high to low;
and displaying the sentences of which the sorted relevance reaches the threshold value to a user.
Preferably, the displaying the sentence with the relevance reaching the threshold to the user includes:
scoring the sentences the relevancy of which reaches a threshold value by using the weight of each sentence to obtain the scores of the sentences the relevancy of which reaches the threshold value;
sorting the sentences the correlation degrees of which reach a threshold value according to the sequence of scores from high to low;
and displaying the sentences of which the sorted relevance reaches the threshold value to a user.
Preferably, before obtaining the relevance between the keyword and each sentence in the preset corpus, the method further includes:
and if the keywords are English, translating the keywords into Chinese.
A second aspect of the embodiments of the present invention discloses a system for image-text conversion, including:
the first acquisition unit is used for acquiring a picture to be converted;
the second acquisition unit is used for acquiring keywords corresponding to the picture to be converted based on the image recognition tool;
a third obtaining unit, configured to enter the keyword into a search engine, and obtain a relevance between the keyword and each sentence in a preset corpus;
and the display unit is used for displaying the sentence with the relevance reaching the threshold value to a user.
Preferably, the second acquiring unit includes:
the identification module is used for identifying the picture to be converted based on an image identification tool to obtain elements forming the picture to be converted;
and the acquisition module is used for acquiring the key words corresponding to each element according to the characteristic information of each element.
Preferably, the third acquiring unit includes:
the acquisition module is used for acquiring a plurality of sentences;
the word segmentation module is used for carrying out word segmentation processing on each sentence to obtain a word segmentation result and weight of each sentence;
and the storage module is used for storing the word segmentation result and the weight corresponding to each sentence into the corpus corresponding to the search engine.
Preferably, the display unit includes:
the processing module is used for sequencing the sentences of which the correlation reaches the threshold value according to the sequence of the correlation from high to low;
and the display module is used for displaying the sentences of which the sorted relevancy reaches the threshold value to the user.
Based on the method and system for image-text conversion provided by the embodiments of the present invention, the method comprises: acquiring a picture to be converted; acquiring keywords corresponding to the picture to be converted based on an image recognition tool; inputting the keywords into a search engine, and acquiring the relevance between the keywords and each sentence in a preset corpus; and displaying the sentences whose relevance reaches a threshold to the user. In this scheme, the image recognition tool is used to obtain the keywords corresponding to the picture to be converted, and the relevance between the keywords and each sentence in the corpus is obtained through the search engine. The sentences whose relevance reaches the threshold are displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a method for converting images and texts according to an embodiment of the present invention;
FIG. 2 is a flowchart of constructing a corpus according to an embodiment of the present invention;
FIG. 3 is a flowchart of displaying sentences whose relevance reaches a threshold to a user according to an embodiment of the present invention;
fig. 4 is a block diagram of a system for converting images and texts according to an embodiment of the present invention;
fig. 5 is a block diagram of another system for image-text conversion according to an embodiment of the present invention;
fig. 6 is a block diagram of a system for image-text conversion provided by an embodiment of the present invention;
fig. 7 is a block diagram of a system for image-text conversion according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As described in the background, at present a user usually needs to add a suitable description sentence to a picture after uploading it, but for various reasons the user may not be able to find a description sentence that accurately matches the uploaded picture.
Therefore, the embodiments of the present invention provide a method and a system for image-text conversion, which use an image recognition tool to obtain keywords corresponding to the picture to be converted, and obtain the relevance between the keywords and each sentence in a corpus through a search engine. The sentences whose relevance reaches a threshold are displayed to the user, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
It should be noted that the method and system for image-text conversion in the embodiments of the present invention are applicable not only to the social field but also to creative fields such as advertising, copywriting, and posters. That is to say, in the field of text creation, after a user uploads a picture, the image-text conversion method of the embodiments of the present invention can be used to match corresponding sentences to the uploaded picture and display the matched sentences to the user for selection.
Referring to fig. 1, a flowchart of a method for image-text conversion according to an embodiment of the present invention is shown, where the method includes the following steps:
step S101: and acquiring a picture to be converted.
It can be understood that the obtained picture to be converted is a picture uploaded by the user. The user may upload the picture to be converted in the following ways: the user may select and upload a picture from the album on the terminal device, or the user may take a photo with the terminal device and upload it.
It should be noted that the user may also obtain and upload the picture to be converted in other ways; the method is not limited to the two ways of selecting the picture from the album and obtaining it by taking a photo, and other ways are not described in detail here.
Step S102: and acquiring keywords corresponding to the picture to be converted based on the image recognition tool.
In the process of implementing step S102 specifically, the picture to be converted is identified based on the image identification tool, elements constituting the picture to be converted are obtained, and the keyword corresponding to each element is obtained according to the feature information of each element.
It is understood that the image recognition tool is a tool having an image recognition function, such as the Google Cloud Vision API.
It should be noted that the Google Cloud Vision API is backed by the machine learning system "TensorFlow"; it can classify pictures into thousands of categories, detect the emotion of faces in a picture, and detect information such as text and other elements in the picture.
In the process of identifying the picture to be converted, the image recognition tool identifies elements such as the theme, colors, and features of the picture to be converted, and the keyword of each element is obtained from the feature information of that element. For example: assuming the picture to be converted uploaded by the user is a picture of the sky, the image recognition tool identifies elements such as clouds, sky, blue, and a sunny day, that is, the keywords obtained by recognizing the picture to be converted are "cloud", "sky", "blue" and "sunny day".
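As an illustrative, non-limiting sketch of this step, keyword extraction with the Google Cloud Vision API label-detection feature could look as follows in Python; the file name, the 0.6 score cutoff, and the assumption that the google-cloud-vision client is installed with credentials configured are illustrative assumptions rather than part of this embodiment.

```python
# Illustrative sketch only: extract keywords (labels) for a picture to be converted
# using Google Cloud Vision label detection. Assumes the google-cloud-vision
# package is installed and application credentials are configured.
from google.cloud import vision

def extract_keywords(image_path, min_score=0.6):  # min_score is an assumed cutoff
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    # Each label corresponds to one element of the picture, e.g. "Cloud", "Sky", "Blue".
    return [label.description for label in response.label_annotations
            if label.score >= min_score]

# Example (hypothetical file name):
# keywords = extract_keywords("sky.jpg")  # e.g. ["Cloud", "Sky", "Blue", "Daytime"]
```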
It can be understood that the above examples of identifying the picture to be converted are only for illustration. When the picture to be converted contains other types of information such as face information or text information, the image recognition tool can also identify the emotion in the face information and the text in the text information, and obtain the keywords corresponding to the identified emotion and text. For example: assuming one of the elements contained in the picture to be converted is a laughing face, the image recognition tool can identify that the corresponding keyword for the picture to be converted is "happy".
It should be noted that the languages used by the image recognition tools of different countries may differ, that is, the keywords corresponding to the picture to be converted acquired by the image recognition tool do not necessarily match the language currently used by the user. For example: if the language currently used by the user is Chinese and the keywords corresponding to the picture to be converted acquired by the image recognition tool are in English, the keywords are translated into Chinese.
As another example: if the language currently used by the user is French and the keywords corresponding to the picture to be converted acquired by the image recognition tool are in Chinese, the keywords are translated into French.
It can be understood that, if the keyword corresponding to the to-be-converted picture acquired by the image recognition tool conforms to the language currently used by the user, the keyword does not need to be translated.
It should be noted that, if the keywords need to be translated, an open-source translation dictionary may be used to translate them into the corresponding language.
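As a minimal sketch of this translation step, one could look keywords up in a locally stored open-source word list; the file name "en_zh_dict.tsv" and its tab-separated format below are hypothetical, introduced only for illustration.

```python
# Illustrative sketch only: translate English keywords into the user's language
# using a local open-source translation word list
# (assumed line format: "english<TAB>translation").
def load_dictionary(path="en_zh_dict.tsv"):  # hypothetical file name
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            source, target = line.rstrip("\n").split("\t")
            mapping[source.lower()] = target
    return mapping

def translate_keywords(keywords, dictionary):
    # Keep the original keyword when the dictionary has no entry for it.
    return [dictionary.get(k.lower(), k) for k in keywords]
```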
Step S103: and inputting the keywords into a search engine, and acquiring the correlation degree between the keywords and each sentence in a preset corpus.
A large number of sentences are collected in advance, and the corpus is constructed from these sentences.
It can be understood that, for different application fields, sentences corresponding to each application field are collected. For example, in the social field, a user generally wants to describe an uploaded picture with an evocative and elegant sentence. When the corpus is constructed, a large number of elegant sentences therefore need to be collected in advance, and the corresponding corpus is built from them.
For another example, in the field of posters, a user needs to describe the theme of the poster with a suitable sentence when uploading a picture. When the corpus is constructed, a large number of poster-style sentences need to be collected in advance, and the corresponding corpus is built from them.
In the process of implementing step S103, the keyword obtained in the above step is input to a search engine, and the search engine searches from a corpus constructed in advance to obtain the degree of correlation (also referred to as "relevance") between the keyword and each sentence in the corpus.
It should be noted that the indicators used to measure the relevance between a keyword and a sentence are word frequency and word density. In general, the density of a keyword is positively correlated with the number of times the keyword appears in the sentence: the more often the keyword appears and the higher its density, the higher the relevance between the keyword and the sentence.
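As a rough illustration of this frequency-and-density idea (a simplified stand-in for the engine's internal scoring, not the exact formula used by any particular search engine), relevance could be sketched as:

```python
# Illustrative relevance measure based on keyword frequency and keyword density.
def keyword_relevance(keywords, sentence_tokens):
    keyword_set = set(keywords)
    hits = sum(1 for token in sentence_tokens if token in keyword_set)  # word frequency
    density = hits / len(sentence_tokens) if sentence_tokens else 0.0   # word density
    # More occurrences and a higher density both increase the relevance score.
    return hits * density
```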
It should be further noted that the search engines mentioned above include, but are not limited to, Elasticsearch; other search engines with functions similar to Elasticsearch are also applicable to the solution of the embodiments of the present invention.
When the search engine is used, all keywords are input into the search engine, and the search engine searches the sentences in the corpus and ranks them in descending order of their relevance to the keywords.
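With Elasticsearch as the search engine, this retrieval could be sketched as follows with the official Python client (version 8.x keyword arguments assumed); the index name "corpus", the field name "sentence", and the local deployment URL are assumptions made for illustration, not values specified by this embodiment.

```python
# Illustrative sketch only: query the pre-built corpus index with all keywords and
# let the engine return sentences ranked by relevance (descending _score).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local deployment

def search_sentences(keywords, size=20):
    resp = es.search(
        index="corpus",                                     # assumed index name
        query={"match": {"sentence": " ".join(keywords)}},  # match against the sentence field
        size=size,
    )
    # Hits are already ordered by relevance (_score), highest first.
    return [(hit["_source"]["sentence"], hit["_score"]) for hit in resp["hits"]["hits"]]
```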
Step S104: and displaying the sentence with the relevance reaching the threshold to the user.
In a specific implementation of step S104, the sentences whose relevance reaches the threshold are sorted in descending order of relevance, and the sorted sentences are displayed to the user for selection.
It is to be understood that the above sorting order may also be an order from low to high, and is not limited herein.
That is to say, the n sentences with the highest relevance to the keywords are displayed to the user, so that the user can select a sentence according to his or her needs, where n is an integer greater than 0.
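Selecting which sentences to display could then amount to keeping the hits whose relevance reaches the threshold and taking the top n, as in the sketch below; the threshold and n values are arbitrary placeholders, not values taken from this embodiment.

```python
# Keep only sentences whose relevance reaches the threshold, then show the top n.
def pick_sentences(scored_sentences, threshold=1.0, n=5):
    qualified = [(s, score) for s, score in scored_sentences if score >= threshold]
    qualified.sort(key=lambda item: item[1], reverse=True)  # highest relevance first
    return [s for s, _ in qualified[:n]]
```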
In the embodiment of the present invention, the image recognition tool is used to obtain the keywords corresponding to the picture to be converted, and the relevance between the keywords and each sentence in the corpus is obtained through the search engine. The sentences whose relevance reaches the threshold are displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
The process of constructing a corpus related to step S103 in fig. 1 in the embodiment of the present invention is shown in fig. 2, which is a flowchart of constructing a corpus provided in the embodiment of the present invention, and includes the following steps:
step S201: a plurality of statements is obtained.
In the process of implementing step S201 specifically, for different application fields, statements corresponding to the application fields are collected, and specific contents may refer to the contents in step S103 in fig. 1 in the above embodiment of the present invention, which is not described herein again.
After the plurality of sentences are obtained, they are submitted to the search engine for corresponding processing.
Step S202: and performing word segmentation processing on each sentence to obtain a word segmentation result and weight of each sentence.
In a specific implementation of step S202, after the plurality of sentences are submitted to the search engine, each sentence is segmented by a tokenizer (word segmentation component) to obtain the word segmentation result and weight corresponding to each sentence.
Step S203: and storing the word segmentation result and the weight corresponding to each sentence into a corpus corresponding to the search engine.
In the process of implementing step S203 specifically, after each sentence is segmented, the obtained segmentation result and weight corresponding to each sentence are stored in the corpus corresponding to the search engine.
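A minimal sketch of this corpus-building step with Elasticsearch might look as follows; the "ik_max_word" Chinese analyzer (which performs the word segmentation when sentences are indexed) and the per-sentence "weight" field are illustrative assumptions, and the analyzer additionally requires the IK analysis plugin to be installed on the cluster.

```python
# Illustrative sketch only: build the corpus index so that each stored sentence is
# segmented by the engine's analyzer and carries a weight used later for scoring.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def build_corpus(sentences_with_weights):
    es.indices.create(
        index="corpus",
        mappings={
            "properties": {
                "sentence": {"type": "text", "analyzer": "ik_max_word"},  # assumed Chinese analyzer
                "weight": {"type": "float"},
            }
        },
    )
    for i, (sentence, weight) in enumerate(sentences_with_weights):
        es.index(index="corpus", id=i, document={"sentence": sentence, "weight": weight})
```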
In the embodiment of the present invention, the obtained sentences are submitted to the search engine for word segmentation, and the resulting word segmentation result and weight of each sentence are stored in the corpus of the search engine. After the keywords of the picture to be converted are identified, the keywords are retrieved with the search engine, and the sentences whose relevance to the keywords reaches the threshold are displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
In the foregoing description, the process of displaying the statement whose correlation reaches the threshold to the user in step S104 in fig. 1 is shown in fig. 3 in combination with the content of fig. 2, and is a flowchart of displaying the statement whose correlation reaches the threshold to the user, which is provided in the embodiment of the present invention, and includes the following steps:
step S301: and scoring the sentences the relevancy of which reaches the threshold value by using the weight of each sentence to obtain the scores of the sentences the relevancy of which reaches the threshold value.
According to the content in fig. 2, the obtained sentences are subjected to word segmentation processing to obtain corresponding word segmentation results and weights. In the process of implementing step S301 specifically, for the sentences whose correlation reaches the threshold, the weights corresponding to the sentences are used to score, so as to obtain the scores of the sentences whose correlation reaches the threshold.
Step S302: and sorting the sentences with the correlation degree reaching the threshold value in the order of the scores from high to low.
In the process of implementing step S302 specifically, after scoring the sentences whose correlation reaches the threshold, the sentences whose correlation reaches the threshold are sorted according to the order of the scores.
It is understood that the ordering may be from low to high, and is not limited in this respect.
Step S303: and displaying the sentences of which the sorted relevance reaches the threshold value to the user.
In the process of implementing step S303 specifically, after the sentences whose relevancy reaches the threshold are sorted, the sorted sentences whose relevancy reaches the threshold are displayed to the user for the user to select according to his own needs.
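One possible way to realize this combined scoring and ordering with Elasticsearch is a function_score query that multiplies the text-match relevance by the stored weight field; this is a sketch under the same assumptions as the earlier examples (index "corpus", fields "sentence" and "weight"), not necessarily the exact mechanism of this embodiment.

```python
# Illustrative sketch only: score sentences by relevance multiplied by their stored
# weight, so that the displayed order reflects both the keyword match and the weight.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_weighted(keywords, size=20):
    resp = es.search(
        index="corpus",
        query={
            "function_score": {
                "query": {"match": {"sentence": " ".join(keywords)}},
                "field_value_factor": {"field": "weight", "missing": 1.0},
                "boost_mode": "multiply",  # final score = text relevance * weight
            }
        },
        size=size,
    )
    return [(hit["_source"]["sentence"], hit["_score"]) for hit in resp["hits"]["hits"]]
```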
In the embodiment of the present invention, the sentences whose relevance reaches the threshold are scored using the weight of each sentence, sorted in order of their scores, and the sorted result is displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
Corresponding to the method for image-text conversion provided by the above embodiment of the present invention, referring to fig. 4, an embodiment of the present invention further provides a structural block diagram of a system for image-text conversion, where the system includes: a first acquisition unit 401, a second acquisition unit 402, a third acquisition unit 403, and a display unit 404;
a first obtaining unit 401, configured to obtain a picture to be converted.
A second obtaining unit 402, configured to obtain, based on the image recognition tool, a keyword corresponding to the picture to be converted.
Preferably, the second obtaining unit 402 is further configured to: if the key words are English, the key words are translated into Chinese.
A third obtaining unit 403, configured to enter the keyword into the search engine, and obtain a relevance between the keyword and each sentence in the preset corpus.
And a display unit 404, configured to display the sentence with the correlation reaching the threshold to the user.
Preferably, referring to fig. 5 in conjunction with fig. 4, a block diagram of a system for image-text conversion according to an embodiment of the present invention is shown, where the second obtaining unit 402 includes:
the identification module 4021 is configured to identify a picture to be converted based on an image identification tool, so as to obtain elements constituting the picture to be converted.
The obtaining module 4022 is configured to obtain a keyword corresponding to each element according to the feature information of each element.
In the embodiment of the present invention, the image recognition tool is used to obtain the keywords corresponding to the picture to be converted, and the relevance between the keywords and each sentence in the corpus is obtained through the search engine. The sentences whose relevance reaches the threshold are displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
Preferably, referring to fig. 6 in conjunction with fig. 4, a block diagram of a system for image-text conversion according to an embodiment of the present invention is shown, where the third obtaining unit 403 includes: an acquisition module 4031, a word segmentation module 4032, and a storage module 4033;
an obtaining module 4031, configured to obtain multiple statements.
The word segmentation module 4032 is configured to perform word segmentation processing on each sentence to obtain a word segmentation result and a weight of each sentence.
A storage module 4033, configured to store the word segmentation result and the weight corresponding to each sentence in the corpus corresponding to the search engine.
In the embodiment of the present invention, the obtained sentences are submitted to the search engine for word segmentation, and the resulting word segmentation result and weight of each sentence are stored in the corpus of the search engine. After the keywords of the picture to be converted are identified, the keywords are retrieved with the search engine, and the sentences whose relevance to the keywords reaches the threshold are displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
Preferably, referring to fig. 7 in conjunction with fig. 4, a block diagram of a system for image-text conversion according to an embodiment of the present invention is shown, where the display unit 404 includes:
the processing module 4041 is configured to sort, according to an order from high to low of the correlation, the statements whose correlation reaches the threshold.
A display module 4042, configured to display the sorted sentences whose relevancy reaches the threshold to the user.
Preferably, in combination with the content shown in fig. 7, in another specific implementation, the processing module 4041 is configured to score the sentences of which the correlation reaches the threshold value by using the weight of each sentence, obtain the scores of the sentences of which the correlation reaches the threshold value, and sort the sentences of which the correlation reaches the threshold value in the order of the scores from high to low.
A display module 4042, configured to display the sorted sentences whose relevancy reaches the threshold to the user.
In the embodiment of the present invention, the sentences whose relevance reaches the threshold are scored using the weight of each sentence, sorted in order of their scores, and the sorted result is displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
To sum up, the embodiments of the present invention provide a method and a system for image-text conversion. The method comprises: acquiring a picture to be converted; acquiring keywords corresponding to the picture to be converted based on an image recognition tool; inputting the keywords into a search engine, and acquiring the relevance between the keywords and each sentence in a preset corpus; and displaying the sentences whose relevance reaches a threshold to the user. In this scheme, the image recognition tool is used to obtain the keywords corresponding to the picture to be converted, and the relevance between the keywords and each sentence in the corpus is obtained through the search engine. The sentences whose relevance reaches the threshold are displayed to the user so that the user can select a description sentence that fits the picture to be converted, which avoids the situation where the user cannot find a suitable description sentence after uploading a picture.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of image-text conversion, the method comprising:
acquiring a picture to be converted;
acquiring a keyword corresponding to the picture to be converted based on an image recognition tool;
inputting the keyword into a search engine, and acquiring the correlation degree between the keyword and each sentence in a preset corpus;
and displaying the sentence with the relevance reaching the threshold to the user.
2. The method according to claim 1, wherein the obtaining of the keyword corresponding to the picture to be converted based on the image recognition tool comprises:
identifying the picture to be converted based on an image identification tool to obtain elements forming the picture to be converted;
and acquiring a keyword corresponding to each element according to the characteristic information of each element.
3. The method of claim 1, wherein constructing the corpus comprises:
acquiring a plurality of sentences;
performing word segmentation processing on each sentence to obtain a word segmentation result and weight of each sentence;
and storing the word segmentation result and the weight corresponding to each sentence into a corpus corresponding to a search engine.
4. The method of claim 1, wherein displaying the sentence with the relevance reaching the threshold to a user comprises:
sorting the sentences the correlation degree of which reaches a threshold value according to the sequence of the correlation degrees from high to low;
and displaying the sentences of which the sorted relevance reaches the threshold value to a user.
5. The method of claim 3, wherein displaying the sentence with the relevance reaching the threshold to a user comprises:
scoring the sentences the relevancy of which reaches a threshold value by using the weight of each sentence to obtain the scores of the sentences the relevancy of which reaches the threshold value;
sorting the sentences the correlation degrees of which reach a threshold value according to the sequence of scores from high to low;
and displaying the sentences of which the sorted relevance reaches the threshold value to a user.
6. The method of claim 1, wherein before obtaining the relevance between the keyword and each sentence in a predetermined corpus, the method further comprises:
and if the keywords are English, translating the keywords into Chinese.
7. A system for image-text conversion, the system comprising:
the first acquisition unit is used for acquiring a picture to be converted;
the second acquisition unit is used for acquiring keywords corresponding to the picture to be converted based on the image recognition tool;
a third obtaining unit, configured to enter the keyword into a search engine, and obtain a relevance between the keyword and each sentence in a preset corpus;
and the display unit is used for displaying the sentence with the relevance reaching the threshold value to a user.
8. The system of claim 7, wherein the second obtaining unit comprises:
the identification module is used for identifying the picture to be converted based on an image identification tool to obtain elements forming the picture to be converted;
and the acquisition module is used for acquiring the key words corresponding to each element according to the characteristic information of each element.
9. The system of claim 7, wherein the third obtaining unit comprises:
the acquisition module is used for acquiring a plurality of sentences;
the word segmentation module is used for carrying out word segmentation processing on each sentence to obtain a word segmentation result and weight of each sentence;
and the storage module is used for storing the word segmentation result and the weight corresponding to each sentence into the corpus corresponding to the search engine.
10. The system of claim 7, wherein the display unit comprises:
the processing module is used for sequencing the sentences of which the correlation reaches the threshold value according to the sequence of the correlation from high to low;
and the display module is used for displaying the sentences of which the sorted relevancy reaches the threshold value to the user.
CN202010074440.7A 2020-01-22 2020-01-22 Image-text conversion method and system Active CN111241319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010074440.7A CN111241319B (en) 2020-01-22 2020-01-22 Image-text conversion method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010074440.7A CN111241319B (en) 2020-01-22 2020-01-22 Image-text conversion method and system

Publications (2)

Publication Number Publication Date
CN111241319A true CN111241319A (en) 2020-06-05
CN111241319B CN111241319B (en) 2023-10-03

Family

ID=70874905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010074440.7A Active CN111241319B (en) 2020-01-22 2020-01-22 Image-text conversion method and system

Country Status (1)

Country Link
CN (1) CN111241319B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080027917A1 (en) * 2006-07-31 2008-01-31 Siemens Corporate Research, Inc. Scalable Semantic Image Search
WO2017063538A1 (en) * 2015-10-12 2017-04-20 广州神马移动信息科技有限公司 Method for mining related words, search method, search system
CN106021532A (en) * 2016-05-25 2016-10-12 东软集团股份有限公司 Display method and device for keywords
CN109063076A (en) * 2018-07-24 2018-12-21 维沃移动通信有限公司 A kind of Picture Generation Method and mobile terminal
CN109582852A (en) * 2018-12-05 2019-04-05 中国银行股份有限公司 A kind of sort method and system of full-text search result

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Xiaofei; Huang Heyan; Chen Zhaoxiong; Dai Liuling: "Query translation and conversion algorithm in cross-language information retrieval", Computer Engineering, No. 11

Also Published As

Publication number Publication date
CN111241319B (en) 2023-10-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant