WO2023045721A1 - Image language identification method and related device thereof - Google Patents

Image language identification method and related device thereof

Info

Publication number
WO2023045721A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
model
language
feature
text
Prior art date
Application number
PCT/CN2022/116011
Other languages
French (fr)
Chinese (zh)
Inventor
毛晓飞
黄灿
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023045721A1 publication Critical patent/WO2023045721A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/263 Language identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • the present application relates to the technical field of image processing, in particular to an image language recognition method and related equipment.
  • the present application provides an image language recognition method and related equipment, which can accurately identify the language to which an image data belongs.
  • An embodiment of the present application provides an image language recognition method, the method comprising:
  • N text images to be used are extracted from the image to be processed; wherein, N is a positive integer;
  • the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used are determined; wherein, n is a positive integer, n ≤ N;
  • the image extraction feature of the nth text image to be used is determined according to the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used; wherein, n is a positive integer, n ≤ N;
  • the language recognition result of the image to be processed is determined according to the image extraction features of the N text images to be used.
  • the visual extraction features include at least one of character density features, color distribution features, and image position features.
  • the determination process of the character density feature of the nth text image to be used includes:
  • the determination process of the image location feature of the nth text image to be used includes:
  • the process of determining the language extraction feature of the nth text image to be used includes:
  • the visual extraction features include character density features, color distribution features, and image position features
  • determining the image extraction feature of the nth text image to be used includes:
  • the language extraction feature of the nth text image to be used, the character density feature of the nth text image to be used, the color distribution feature of the nth text image to be used, and the image position feature of the nth text image to be used are spliced to obtain the image extraction feature of the nth text image to be used.
  • the determining the language recognition result of the image to be processed according to the image extraction features of the N text images to be used includes:
  • the construction process of the image language recognition model includes:
  • the model to be trained is updated, and the step of inputting the at least one sample text image and the position description information of the at least one sample text image into the model to be trained is continued.
  • the model to be trained includes a language feature extraction network, a density feature extraction network, a color feature extraction network, a location feature extraction network, a feature splicing network, and an image language recognition network; wherein, the input data of the image language recognition network includes the output data of the feature splicing network; the input data of the feature splicing network includes the output data of the language feature extraction network, the output data of the density feature extraction network, the output data of the color feature extraction network, and the output data of the location feature extraction network;
  • determining the image language recognition model includes:
  • the image language recognition network in the model to be trained is determined as the image language recognition model.
  • before inputting the at least one sample text image and the position description information of the at least one sample text image into the model to be trained, the process of constructing the image language recognition model further includes:
  • the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network are initialized.
  • the embodiment of the present application also provides an image language recognition device, including:
  • An image extraction unit configured to extract N text images to be used from the image to be processed according to the text detection result of the image to be processed after acquiring the image to be processed; wherein, N is a positive integer;
  • a feature determination unit configured to determine the language extraction features of the nth text image to be used and the visual extraction features of the nth text image to be used; wherein, n is a positive integer, n ≤ N;
  • a feature processing unit configured to determine the image extraction features of the nth text image to be used according to the language extraction features of the nth text image to be used and the visual extraction features of the nth text image to be used; wherein, n is a positive integer, n ≤ N;
  • the language recognition unit is configured to determine the language recognition result of the image to be processed according to the image extraction features of the N text images to be used.
  • the embodiment of the present application also provides a device, the device includes a processor and a memory:
  • the memory is used to store computer programs
  • the processor is configured to execute any implementation of the image language recognition method provided in the embodiments of the present application according to the computer program.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any implementation of the image language recognition method provided in the embodiment of the present application.
  • the embodiment of the present application also provides a computer program product, which, when running on a terminal device, enables the terminal device to execute any implementation manner of the image language recognition method provided in the embodiment of the present application.
  • the embodiment of the present application has at least the following advantages:
  • After acquiring the image to be processed, N text images to be used are first extracted from the image to be processed according to the text detection result of the image to be processed; then the language extraction feature and the visual extraction feature of the nth text image to be used are determined, and the image extraction feature of the nth text image to be used is determined from them; finally, the language recognition result of the image to be processed is determined according to the image extraction features of the N text images to be used.
  • Fig. 1 is a schematic diagram of a kind of image data provided by the embodiment of the present application.
  • FIG. 2 is a flow chart of a method for recognizing an image language provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a text area provided by an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a language feature extraction model provided by the embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a density feature extraction model provided in an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a color feature extraction model provided in an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an image language recognition model provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image language recognition device provided by an embodiment of the present application.
  • Because the image data shown in FIG. 1 carries character information belonging to Vietnamese and character information belonging to English at the same time, and the amount of character information belonging to Vietnamese is far greater than the amount of character information belonging to English, it can be determined that the language of the image data is Vietnamese.
  • the embodiment of the present application provides an image language recognition method, the method includes: after acquiring the image to be processed, first extracting N text images to be used from the image to be processed according to the text detection result of the image to be processed; then determining the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used, where n is a positive integer, n ≤ N, and N is a positive integer; then determining the image extraction feature of the nth text image to be used according to the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used; and finally determining the language recognition result of the image to be processed according to the image extraction features of the N text images to be used, so that the language recognition result can accurately represent the language to which the image to be processed belongs.
  • the embodiment of the present application does not limit the subject of execution of the image language recognition method.
  • the image language recognition method provided in the embodiment of the present application can be applied to data processing devices such as terminal devices or servers.
  • the terminal device may be a smart phone, a computer, a personal digital assistant (Personal Digital Assistant, PDA), or a tablet computer.
  • the server can be an independent server, a cluster server or a cloud server.
  • this figure is a flow chart of a method for recognizing an image language provided by an embodiment of the present application.
  • the image language recognition method provided in the embodiment of this application includes S1-S5:
  • S1: N text images to be used are extracted from the image to be processed according to the text detection result of the image to be processed.
  • N is a positive integer.
  • the "image to be processed" refers to the image data (for example, the image data shown in Fig. 1) that needs image language recognition processing; and the "image to be processed" includes character information in at least one language.
  • the "text detection result of the image to be processed” is used to indicate the location of at least one text region in the image to be processed.
  • the text detection result of the image to be processed may include the position description data of the first text area, the position description data of the second text area, ..., and the position description data of the fifth text area.
  • the "position description data of the first text area" is used to indicate the position of the first text area in the image data shown in Figure 1; the "position description data of the second text area" is used to indicate the position of the second text area in the image data shown in Figure 1; and so on, up to the "position description data of the fifth text area", which is used to indicate the position of the fifth text area in the image data shown in Figure 1.
  • the embodiment of the present application does not limit the determination process of the above-mentioned "text detection result of the image to be processed"; for example, a pre-built text detection model may be used to perform text detection processing on the image to be processed to obtain the text detection result of the image to be processed.
  • the "text detection model" is used to perform text position detection processing on the input data of the text detection model; and the embodiment of the present application does not limit the "text detection model"; for example, it can be implemented by any machine learning model (for example, a deep learning model based on a convolutional neural network, etc.).
  • the above “text detection model” can be constructed according to the first sample image and the actual text position of the first sample image.
  • the "actual text position of the first sample image" is used to indicate the actual positions of all text regions in the first sample image; and the embodiment of the present application does not limit the method of obtaining the "actual text position of the first sample image"; for example, it can be implemented by manual labeling.
  • the nth text image to be used is used to represent the image information carried by the nth text area in the image to be processed; and this embodiment of the present application does not limit the determination process of the "nth text image to be used". For example, when the above "text detection result of the image to be processed" includes the position description data of the nth text area, image interception processing can be performed on the image to be processed according to the position description data of the nth text area to obtain the nth text image to be used, so that the nth text image to be used includes the nth text area, as in the sketch below.
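  • As an illustration of this image interception processing, the following sketch (a hypothetical helper, assuming the position description data of each text area takes the form of an (x, y, width, height) pixel rectangle, which the text does not specify) crops the N text images to be used out of the image to be processed:

```python
import numpy as np

def crop_text_images(image: np.ndarray,
                     boxes: list[tuple[int, int, int, int]]) -> list[np.ndarray]:
    """Crop one text image to be used per detected text area.

    `image` is the image to be processed as an (H, W, C) array; `boxes` holds
    the position description data of the N text areas as (x, y, w, h) pixel
    rectangles (an assumed format).
    """
    text_images = []
    for x, y, w, h in boxes:
        # Image interception: keep only the pixels inside the nth text area.
        text_images.append(image[y:y + h, x:x + w].copy())
    return text_images
```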
  • S2: Determine the language extraction feature of the nth text image to be used; wherein, n is a positive integer, n ≤ N.
  • the "language extraction feature of the nth text image to be used" is used to represent the language information carried in the nth text region in the image to be processed.
  • the embodiment of the present application does not limit the implementation of S2.
  • it may specifically include: inputting the nth text image to be used into a pre-built language feature extraction model, and obtaining the language extraction feature of the nth text image to be used output by the language feature extraction model.
  • the language feature extraction model is used to perform language feature extraction processing on the input data of the language feature extraction model; and the embodiment of the present application does not limit the "language feature extraction model"; for example, it can be implemented by any machine learning model (for example, a deep learning model based on a self-attention neural network, etc.).
  • the above “language feature extraction model” may be constructed according to the first text image and the actual language features of the first text image.
  • the "actual language feature of the first text image” is used to indicate the language information actually carried by the first text image; and this embodiment of the present application does not limit the acquisition method of the "actual language feature of the first text image", for example, It can be implemented by manual labeling.
  • the embodiment of the present application does not limit the construction process of the above "language feature extraction model", for example, any existing or future machine learning model construction method can be used for implementation.
  • the model construction method shown in the second method embodiment can be used for implementation.
  • the embodiment of the present application also provides a possible implementation of the "language feature extraction model", which may specifically include: an image feature extraction layer, a position coding layer, a first feature fusion layer, a first feature encoding layer, and a first linear processing layer.
  • the input data of the first linear processing layer includes the output data of the first feature coding layer;
  • the input data of the first feature coding layer includes the output data of the first feature fusion layer;
  • the input data of the first feature fusion layer includes the output data of the image feature extraction layer and the output data of the position encoding layer.
  • the image feature extraction layer is used to perform image feature extraction processing for text image data (for example, the nth text image to be used); and the embodiment of the present application does not limit the implementation of the "image feature extraction layer"; for example, it can be specifically implemented using the convolutional neural network (Convolutional Neural Networks, CNN) shown in FIG. 4.
  • the position coding layer is used to perform position coding processing for text image data; and the embodiment of the present application does not limit the implementation of the "position coding layer"; for example, it can be implemented using any position coding processing method (for example, the Positional Encoding module in the transformer model).
  • first feature fusion layer is used to perform feature fusion processing (for example, the summation processing shown in Figure 4) for the input data of the first feature fusion layer; and the embodiment of the present application does not limit the "first feature fusion layer", for example, it can be implemented using any feature fusion processing method (for example, the feature fusion processing method involved in the transformer model).
  • the first feature encoding layer is used to perform encoding processing on the input data of the first feature encoding layer; and the embodiment of the present application does not limit the "first feature encoding layer"; for example, it can be implemented by L1 first encoding networks (e.g., the Encoder module in the transformer model), where L1 is a positive integer.
  • the first linear processing layer is used to perform linear processing on the input data of the first linear processing layer; and the embodiment of the present application does not limit the implementation of the "first linear processing layer"; for example, it can be implemented using any linear processing method (for example, the linear module in the transformer model).
  • In FIG. 4, CNN is used to represent the above-mentioned "image feature extraction layer";
  • Positional Encoding is used to represent the above-mentioned "position encoding layer";
  • "+" is used to represent the above-mentioned "first feature fusion layer";
  • Multi-head attention refers to the multi-head self-attention network;
  • Add & Norm refers to feature addition processing and feature normalization processing;
  • Feed forward refers to the feedforward neural network;
  • L1 indicates the number of first encoding networks.
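  • The following is a minimal PyTorch sketch of a "language feature extraction model" with the structure described above and in FIG. 4. The feature dimension (512), the number of attention heads, the CNN backbone, the pooling, and the use of a learned position embedding in place of the Positional Encoding module are all assumptions for illustration, not details fixed by the text:

```python
import torch
import torch.nn as nn

class LanguageFeatureExtractor(nn.Module):
    """CNN -> position encoding -> L1 encoder blocks -> linear, as in FIG. 4."""

    def __init__(self, d_model: int = 512, l1: int = 4, max_len: int = 256):
        super().__init__()
        # Image feature extraction layer (a small CNN stand-in).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, max_len)),  # collapse height, keep a token axis
        )
        # Position coding layer (learned embedding as a simple stand-in).
        self.pos_embedding = nn.Parameter(torch.zeros(1, max_len, d_model))
        # First feature encoding layer: L1 stacked encoder blocks.
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=l1)
        # First linear processing layer producing one language extraction feature.
        self.linear = nn.Linear(d_model, d_model)

    def forward(self, text_image: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(text_image)              # (B, d_model, 1, max_len)
        feats = feats.squeeze(2).transpose(1, 2)  # (B, max_len, d_model)
        feats = feats + self.pos_embedding        # first feature fusion layer: summation
        feats = self.encoder(feats)
        return self.linear(feats.mean(dim=1))     # (B, d_model) language extraction feature
```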
  • S3: Determine the visual extraction feature of the nth text image to be used; wherein, n is a positive integer, n ≤ N.
  • the "visual extraction feature of the nth text image to be used" is used to represent the image feature information carried by the nth text area in the image to be processed (for example, character density, color distribution, position distribution in the image to be processed, etc.).
  • the embodiment of the present application does not limit the above "visual extraction features of the nth text image to be used"; for example, they may specifically include at least one of the character density feature of the nth text image to be used, the color distribution feature of the nth text image to be used, and the image position feature of the nth text image to be used.
  • the "character density feature of the nth text image to be used" is used to represent the distribution density of characters in the nth text image to be used; and the embodiment of the present application does not limit the determination process of the "character density feature of the nth text image to be used"; for example, it may specifically include: inputting the nth text image to be used into a pre-built density feature extraction model, and obtaining the character density feature of the nth text image to be used output by the density feature extraction model.
  • the above "density feature extraction model" is used to perform character density feature extraction processing on the input data of the density feature extraction model; and the embodiment of the present application does not limit the "density feature extraction model"; for example, it can be implemented by any machine learning model (for example, a deep learning model based on a self-attention neural network, etc.).
  • the above “density feature extraction model” may be constructed according to the second text image and the actual density features of the second text image.
  • the "actual density feature of the second text image” is used to represent the actual character distribution density in the second text image; and this embodiment of the present application does not limit the acquisition method of the "actual density feature of the second text image", for example, It can be implemented by manual labeling.
  • the embodiment of the present application does not limit the construction process of the above "density feature extraction model", for example, any existing or future machine learning model construction method can be used for implementation.
  • the model construction method shown in the second method embodiment can be used for implementation.
  • this embodiment of the present application does not limit the relationship between the above-mentioned "second text image" and the above-mentioned "first text image"; the two may refer to the same text image data, or may refer to different text image data.
  • the embodiment of the present application also provides a possible implementation of the "density feature extraction model", which may specifically include: an image feature extraction layer and L2 second encoding networks.
  • L2 is a positive integer, and the embodiment of the present application does not limit L2; for example, as shown in FIG. 5, L2 may specifically be 4.
  • the embodiment of the present application does not limit the above “second encoding network”, and any existing or future encoding network (for example, the Encoder module in the transformer model, the Encoder module in the conformer model, etc.) can be used for implementation.
  • the above "color distribution feature of the nth text image to be used" is used to represent the color distribution state in the nth text image to be used (in particular, the difference between the character color and the background color); and the embodiment of the present application does not limit the determination process of the "color distribution feature of the nth text image to be used"; for example, it may specifically include: inputting the nth text image to be used into a pre-built color feature extraction model to obtain the color distribution feature of the nth text image to be used output by the color feature extraction model.
  • the color feature extraction model is used to perform color distribution feature extraction processing on the input data of the color feature extraction model; and the embodiment of the present application does not limit the "color feature extraction model"; for example, it can be implemented by any machine learning model (for example, a deep learning model based on a convolutional neural network, etc.).
  • the above “color feature extraction model” may be constructed according to the third text image and the actual color features of the third text image.
  • the "actual color feature of the third text image” is used to represent the actual color distribution state in the third text image; and the embodiment of the present application does not limit the acquisition method of the "actual color feature of the third text image", for example, It can be implemented by manual labeling.
  • the embodiment of the present application does not limit the construction process of the above "color feature extraction model", for example, any existing or future machine learning model construction method can be used for implementation.
  • the model construction method shown in the second method embodiment can be used for implementation.
  • this embodiment of the present application does not limit the relationship between the above-mentioned "third text image", the above-mentioned "second text image", and the above-mentioned "first text image"; the three may refer to the same text image data, or may refer to different text image data.
  • the embodiment of the present application also provides a possible implementation of the "color feature extraction model", which may specifically include: an image feature extraction layer and L3 second encoding networks.
  • L3 is a positive integer, and the embodiment of the present application does not limit L3; for example, as shown in FIG. 6, L3 may specifically be 2.
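  • Since the density feature extraction model (FIG. 5, with L2 = 4) and the color feature extraction model (FIG. 6, with L3 = 2) share the same overall structure, an image feature extraction layer followed by a stack of second encoding networks, one hedged PyTorch sketch can cover both; the dimensions and the pooling are assumptions:

```python
import torch
import torch.nn as nn

class VisualFeatureExtractor(nn.Module):
    """Image feature extraction layer followed by a stack of second encoding
    networks, as in FIGs. 5 and 6; all dimensions are assumed."""

    def __init__(self, num_encoders: int, d_model: int = 512, seq_len: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, seq_len)),
        )
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_encoders)

    def forward(self, text_image: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(text_image).squeeze(2).transpose(1, 2)  # (B, seq_len, d_model)
        return self.encoder(feats).mean(dim=1)                   # one feature vector per image

# L2 = 4 for the density feature extraction model, L3 = 2 for the color one.
density_model = VisualFeatureExtractor(num_encoders=4)
color_model = VisualFeatureExtractor(num_encoders=2)
```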
  • the image position feature of the nth text image to be used refers to the position distribution state, in the above-mentioned "image to be processed", of the character information in the nth text image to be used; and the embodiment of the present application does not limit the determination process of the "image position feature of the nth text image to be used"; for example, it may specifically include: inputting the position description information of the nth text image to be used into a pre-built position feature extraction model to obtain the image position feature of the nth text image to be used output by the position feature extraction model.
  • the "position description information of the nth text image to be used" is used to describe the position, in the above "image to be processed", of the character information in the nth text image to be used; and the embodiment of the present application does not limit the determination process of the position description information of the nth text image to be used; for example, when the above "nth text image to be used" is used to represent the image information carried by the nth text area in the image to be processed, the position description information of the nth text area in the image to be processed may be determined as the position description information of the nth text image to be used.
  • the position feature extraction model is used to perform image position feature extraction processing on the input data of the position feature extraction model; and the embodiment of the present application does not limit the "position feature extraction model"; for example, it can be implemented by any machine learning model (for example, a machine learning model based on fully connected layers, etc.).
  • the "location feature extraction model” may include two fully connected layers.
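  • A minimal sketch of such a two-fully-connected-layer "location feature extraction model", assuming the position description information is a normalized (x, y, width, height) vector and the output is a 512-d image position feature (both assumptions):

```python
import torch
import torch.nn as nn

class PositionFeatureExtractor(nn.Module):
    """Two fully connected layers mapping position description information
    to an image position feature."""

    def __init__(self, in_dim: int = 4, d_model: int = 512):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, d_model)
        self.fc2 = nn.Linear(d_model, d_model)

    def forward(self, position_info: torch.Tensor) -> torch.Tensor:
        # position_info: (B, 4) normalized (x, y, w, h) boxes (assumed format).
        return self.fc2(torch.relu(self.fc1(position_info)))
```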
  • location feature extraction model may be constructed according to the location description information of the fourth text image and the actual location characteristics of the fourth text image.
  • the "position description information of the fourth text image" is used to indicate the position of the character information in the fourth text image in the sample image to be processed.
  • the "actual position feature of the fourth text image" is used to indicate the actual position distribution state, in the sample image to be processed, of the character information in the fourth text image; and the embodiment of the present application does not limit the method of obtaining the "actual position feature of the fourth text image"; for example, it can be implemented by manual labeling.
  • the embodiment of the present application does not limit the construction process of the above "location feature extraction model", for example, any existing or future machine learning model construction method can be used for implementation.
  • the model construction method shown in the second method embodiment can be used for implementation.
  • this embodiment of the present application does not limit the relationship between the above-mentioned "fourth text image", "third text image", "second text image", and "first text image"; the four may refer to the same text image data, or may refer to different text image data.
  • Based on the relevant content of the above S3, after the nth text image to be used is obtained, preset visual feature extraction processing (for example, character density feature extraction processing, color distribution feature extraction processing, and image position feature extraction processing, etc.) can be performed on the nth text image to be used to obtain the visual extraction feature of the nth text image to be used, so that the visual extraction feature can represent the image feature information carried by the nth text image to be used (for example, character density, color distribution, position distribution in the image to be processed, etc.).
  • S4: Determine the image extraction feature of the nth text image to be used according to the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used; wherein, n is a positive integer, n ≤ N.
  • the "image extraction feature of the nth text image to be used" is used to represent the image information carried by the nth text image to be used (for example, character language, character density, color distribution, position distribution in the image to be processed, and other information), so that the "image extraction feature of the nth text image to be used" can accurately represent the image information carried by the nth text area in the image to be processed.
  • For example, the implementation of S4 may specifically include: splicing the language extraction feature of the nth text image to be used, the character density feature of the nth text image to be used, the color distribution feature of the nth text image to be used, and the image position feature of the nth text image to be used to obtain the image extraction feature of the nth text image to be used.
  • this embodiment of the present application does not limit the implementation of the above "splicing".
  • For example, when the language extraction feature of the nth text image to be used, the character density feature of the nth text image to be used, the color distribution feature of the nth text image to be used, and the image position feature of the nth text image to be used are all 1 × 512 feature vectors, the image extraction feature of the nth text image to be used can be a 4 × 512 feature vector.
  • In this way, the image extraction feature of the nth text image to be used can be determined by referring to the above two kinds of extraction features, so that the image extraction feature can accurately represent the image information carried by the nth text area in the image to be processed (for example, character language, character density, color distribution, position distribution in the image to be processed, and other information).
  • S5: Determine the language recognition result of the image to be processed according to the image extraction features of the N text images to be used.
  • the "language recognition result of the image to be processed” is used to indicate the language to which the image to be processed belongs, so that the "language recognition result of the image to be processed” can accurately represent the language to which most of the character information in the image to be processed belongs ( For example, Vietnamese as shown in Figure 1).
  • S51: Concatenate the image extraction features of the N text images to be used to obtain the language representation data of the image to be processed.
  • the "language representation data of the image to be processed” is used to represent the distribution characteristics of at least one language in the image to be processed (eg, distribution range, distribution location, etc.).
  • this embodiment of the present application does not limit the implementation of the concatenation in S51.
  • For example, when the image extraction feature of each text image to be used is a 4 × 512 feature vector, the language representation data of the image to be processed can be an N × 4 × 512 feature vector, as illustrated in the sketch below.
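  • The two levels of splicing (S4 per text image, then S51 across the N text images) can be illustrated with the following sketch, assuming each individual feature is a 512-d vector as in the example above:

```python
import torch

def build_language_representation(per_image_features: list[dict[str, torch.Tensor]]) -> torch.Tensor:
    """Stack the four 512-d features of each text image into its 4 x 512
    image extraction feature (S4), then stack those over the N text images
    into the N x 4 x 512 language representation data (S51)."""
    image_extraction_features = []
    for feats in per_image_features:
        image_extraction_features.append(torch.stack([
            feats["language"],   # language extraction feature
            feats["density"],    # character density feature
            feats["color"],      # color distribution feature
            feats["position"],   # image position feature
        ]))                      # -> 4 x 512
    return torch.stack(image_extraction_features)  # -> N x 4 x 512
```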
  • S52: Input the language representation data of the image to be processed into the pre-built image language recognition model, and obtain the language recognition result of the image to be processed output by the image language recognition model.
  • the "image language recognition model" is used to perform language recognition processing on the input data of the image language recognition model; and the embodiment of the present application does not limit the "image language recognition model"; for example, it can be implemented by any machine learning model (for example, a deep learning model based on a self-attention neural network, etc.).
  • the embodiment of the present application also provides a possible implementation of the "image language recognition model", which may specifically include: L4 second encoding networks, a second linear processing layer, and a recognition layer.
  • L4 is a positive integer, and the embodiment of the present application does not limit L4; for example, as shown in FIG. 7, L4 may specifically be 6.
  • the second linear processing layer is used to perform linear processing on the input data of the second linear processing layer; and the embodiment of the present application does not limit the implementation of the "second linear processing layer"; for example, it can be implemented using any linear processing method (for example, the linear module in the transformer model).
  • the above "recognition layer" is used to perform language classification processing on the input data of the recognition layer; and the embodiment of the present application does not limit the implementation of the "recognition layer"; for example, it can be implemented using any classification method (for example, the softmax module in the transformer model).
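  • A hedged PyTorch sketch of such an "image language recognition model", following FIG. 7 with L4 = 6 second encoding networks; the feature dimension, pooling, and the size of the language set are assumptions:

```python
import torch
import torch.nn as nn

class ImageLanguageRecognizer(nn.Module):
    """L4 second encoding networks, a second linear processing layer, and a
    softmax recognition layer, as in FIG. 7."""

    def __init__(self, d_model: int = 512, num_languages: int = 10, l4: int = 6):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=l4)
        self.linear = nn.Linear(d_model, num_languages)

    def forward(self, language_representation: torch.Tensor) -> torch.Tensor:
        # language_representation: (B, N * 4, 512), i.e. the N x 4 x 512
        # language representation data flattened along the first two axes.
        encoded = self.encoder(language_representation)
        logits = self.linear(encoded.mean(dim=1))
        return torch.softmax(logits, dim=-1)  # per-language probabilities
```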
  • the embodiment of the present application does not limit the construction process of the above "image language recognition model".
  • the "image language recognition model” can be constructed according to the language representation data of the sample image to be used and the actual language of the sample image to be used.
  • the determination process of the "language representation data of the sample image to be used” is similar to the determination process of the above-mentioned "language representation data of the image to be processed".
  • the "actual language of the sample image to be used” is used to indicate the actual language of the sample image to be used.
  • it can be implemented by using the model building process shown in the second method embodiment.
  • the image extraction features of the N text images to be used can be spliced first to obtain the language representation data of the image to be processed; then language recognition processing is performed on the language representation data to obtain the language recognition result of the image to be processed, so that the language recognition result can represent the language to which most of the character information in the image to be processed belongs (for example, Vietnamese as shown in FIG. 1).
  • Based on the above, after the image to be processed is acquired, N text images to be used are first extracted from the image to be processed according to the text detection result of the image to be processed; then the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used are determined, where n is a positive integer, n ≤ N, and N is a positive integer; the image extraction feature of the nth text image to be used is then determined according to the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used; finally, the language recognition result of the image to be processed is determined according to the image extraction features of the N text images to be used, so that the language recognition result can accurately indicate the language to which the image to be processed belongs.
  • the embodiment of the present application also provides a model construction method, which may specifically include steps 11 to 16:
  • Step 11: Obtain the sample image to be used and the actual language of the sample image to be used.
  • sample image to be used refers to the image data required for the model building process; and the “sample image to be used” may include character information in at least one language.
  • the "actual language of the sample image to be used" is used to indicate the actual language of the sample image to be used; and the embodiment of the present application does not limit the acquisition method of the "actual language of the sample image to be used"; for example, it can be obtained by manual labeling.
  • Step 12: Determine at least one sample text image and the position description information of the at least one sample text image according to the text detection result of the sample image to be used.
  • the "text detection result of the sample image to be used" is used to indicate the location of at least one text region in the sample image to be used; and the determination process of the "text detection result of the sample image to be used" is similar to the determination process of the above "text detection result of the image to be processed".
  • the determination process of a "sample text image" is similar to the determination process of the above "text image to be used"; and the determination process of the "position description information of the sample text image" is similar to the determination process of the above "position description information of the text image to be used".
  • Step 13: Input at least one sample text image and the location description information of the at least one sample text image into the model to be trained, and obtain the language recognition result of the sample image to be used output by the model to be trained.
  • model to be trained is used to perform image language recognition processing on the input data of the model to be trained.
  • the embodiment of the present application does not limit the "model to be trained", for example, it may specifically include: language feature extraction network, density feature extraction network, color feature extraction network, location feature extraction network, feature splicing network, and image language recognition network.
  • the input data of the image language recognition network includes the output data of the feature splicing network;
  • the input data of the feature splicing network includes the output data of the language feature extraction network, the output data of the density feature extraction network, the output data of the color feature extraction network, and the output data of the position feature extraction network.
  • the language feature extraction network is used to perform language feature extraction processing for text image data (for example, each sample text image); and the embodiment of the present application does not limit the network structure of the "language feature extraction network"; for example, it can be implemented using the model structure of the above "language feature extraction model".
  • the above-mentioned "density feature extraction network" is used to perform character density feature extraction processing for text image data (for example, each sample text image); and the embodiment of the present application does not limit the network structure of the "density feature extraction network"; for example, it can be implemented using the model structure of the above "density feature extraction model".
  • the color feature extraction network is used to perform color distribution feature extraction processing for text image data (for example, each sample text image); and the embodiment of the present application does not limit the network structure of the "color feature extraction network"; for example, it can be implemented using the model structure of the above "color feature extraction model".
  • the position feature extraction network is used to perform image position feature extraction processing on the position description information of text image data (for example, each sample text image); and the embodiment of the present application does not limit the network structure of the "position feature extraction network"; for example, it can be implemented using the model structure of the above "location feature extraction model".
  • feature splicing network is used to concatenate the input data of the feature splicing network; and the embodiment of the present application does not limit the working principle of the "feature splicing network”.
  • the working principle of the "feature splicing network" may specifically include: first, the language extraction feature of the kth sample text image, the character density feature of the kth sample text image, the color distribution feature of the kth sample text image, and the image position feature of the kth sample text image are spliced to obtain the image extraction feature of the kth sample text image, where k is a positive integer, k ≤ K, and K is a positive integer; then the image extraction features of the first sample text image to the image extraction features of the Kth sample text image are spliced to obtain the language representation data of the sample image to be used (that is, the output of the above "feature splicing network").
  • the image language recognition network is used to perform language recognition processing on the input data of the image language recognition network; and the embodiment of the present application does not limit the network structure of the "image language recognition network"; for example, it can be implemented using the model structure of the above "image language recognition model".
  • In this way, the at least one sample text image and its position description information can be input into the model to be trained, so that the model to be trained refers to the at least one sample text image and its position description information to perform image language recognition processing, and obtains and outputs a language recognition result of the sample image to be used.
  • Step 14: Judge whether the preset stop condition is met; if yes, execute step 16; if not, execute step 15.
  • the "preset stop condition" can be preset; and the embodiment of the present application does not limit the "preset stop condition"; for example, it can specifically be that the loss value of the model to be trained is lower than a first threshold, that the rate of change of the loss value of the model to be trained is lower than a second threshold (that is, the image language recognition performance of the model to be trained reaches convergence), or that the number of updates of the model to be trained reaches a third threshold.
  • the above "loss value of the model to be trained" is used to represent the image language recognition performance of the model to be trained; and the embodiment of the present application does not limit the determination process of the "loss value of the model to be trained"; it can be implemented using any existing or future method for determining a model loss value.
  • Step 15: Update the model to be trained according to the language recognition result of the sample image to be used and the actual language of the sample image to be used, and return to step 13.
  • If the preset stop condition is not met, it means that the image language recognition performance of the model to be trained is still relatively poor, so the model to be trained can be updated based on the difference between the language recognition result of the sample image to be used and the actual language of the sample image to be used, so that the updated model to be trained has better image language recognition performance; then step 13 and its subsequent steps are performed again based on the updated model to be trained, to implement a new round of the training process for the model to be trained.
  • Step 16: Determine the image language recognition model according to the model to be trained.
  • If the preset stop condition is met, the image language recognition model can be determined according to the model to be trained (for example, the image language recognition network in the model to be trained can be directly determined as the image language recognition model).
  • step 16 may specifically include: determining the language feature extraction network in the model to be trained as the language feature extraction model; determining the density feature extraction network in the model to be trained as the density feature extraction model; determining the color feature extraction network in the model to be trained as the color feature extraction model; determining the position feature extraction network in the model to be trained as the position feature extraction model; and determining the image language recognition network in the model to be trained as the image language recognition model.
  • In this way, a language feature extraction model, a density feature extraction model, a color feature extraction model, a location feature extraction model, and an image language recognition model can be constructed by means of one model training process, so that the image language recognition method implemented based on these five models has a better image language recognition effect; a sketch of such a training loop follows.
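  • Steps 11 to 16 amount to a standard supervised training loop; the sketch below is one possible reading, in which the loss function (cross entropy), the thresholds, and the call signature of the model to be trained are assumptions rather than details fixed by the text:

```python
import torch
import torch.nn as nn

def train_model_to_be_trained(model: nn.Module,
                              samples: list,
                              optimizer: torch.optim.Optimizer,
                              loss_threshold: float = 1e-3,
                              max_updates: int = 100) -> nn.Module:
    """`samples` is assumed to yield (sample_text_images, position_info,
    actual_language_label) tuples of batched tensors."""
    criterion = nn.CrossEntropyLoss()
    for _ in range(max_updates):  # stop condition: number of updates reaches a threshold
        total_loss = 0.0
        for text_images, position_info, label in samples:
            logits = model(text_images, position_info)   # step 13: recognition result
            loss = criterion(logits, label)
            optimizer.zero_grad()
            loss.backward()                               # step 15: update the model
            optimizer.step()
            total_loss += loss.item()
        if total_loss / len(samples) < loss_threshold:    # step 14: loss below threshold
            break
    return model  # step 16: the image language recognition model is determined from it
```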
  • the embodiment of the present application also provides another possible implementation of the model building method.
  • the model building method may further include steps 17 to 21:
  • Step 17: Use the first text image and the actual language features of the first text image to train the first model, so that the trained first model has a better language feature extraction effect.
  • Step 18: Use the second text image and the actual density features of the second text image to train the second model, so that the trained second model has a better character density feature extraction effect.
  • Step 19: Use the third text image and the actual color features of the third text image to train the third model, so that the trained third model has a better color distribution feature extraction effect.
  • Step 20: Use the position description information of the fourth text image and the actual position feature of the fourth text image to train the fourth model, so that the trained fourth model has a better image position feature extraction effect.
  • Step 21: Use the trained first model, the trained second model, the trained third model, and the trained fourth model to respectively initialize the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network in the model to be trained.
  • step 21 may specifically include: determining the trained first model as the initialization processing result of the language feature extraction network in the model to be trained; determining the trained second model as the initialization processing result of the density feature extraction network in the model to be trained; determining the trained third model as the initialization processing result of the color feature extraction network in the model to be trained; and determining the trained fourth model as the initialization processing result of the position feature extraction network in the model to be trained.
  • Based on this, the first model to the fourth model can be trained respectively; the trained first to fourth models are then used to initialize the language feature extraction network, density feature extraction network, color feature extraction network, and position feature extraction network in the model to be trained, to obtain an initialized model to be trained; then, the initialized model to be trained is trained using the above steps 11 to 15 to obtain a trained model to be trained; finally, a language feature extraction model, a density feature extraction model, a color feature extraction model, a location feature extraction model, and an image language recognition model are determined from the trained model to be trained, as in the sketch below.
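  • A minimal sketch of the initialization in step 21, assuming the model to be trained exposes its four sub-networks under the hypothetical attribute names used below:

```python
import torch.nn as nn

def initialize_model_to_be_trained(model_to_be_trained: nn.Module,
                                   first_model: nn.Module,
                                   second_model: nn.Module,
                                   third_model: nn.Module,
                                   fourth_model: nn.Module) -> nn.Module:
    """Copy the trained first to fourth models into the corresponding
    sub-networks of the model to be trained (attribute names are assumed)."""
    model_to_be_trained.language_feature_network.load_state_dict(first_model.state_dict())
    model_to_be_trained.density_feature_network.load_state_dict(second_model.state_dict())
    model_to_be_trained.color_feature_network.load_state_dict(third_model.state_dict())
    model_to_be_trained.position_feature_network.load_state_dict(fourth_model.state_dict())
    return model_to_be_trained
```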
  • the embodiment of the present application also provides an image language recognition device, which will be explained and described below with reference to the accompanying drawings.
  • FIG. 8 is a schematic structural diagram of an image language recognition device provided by an embodiment of the present application.
  • the image language recognition device 800 provided in the embodiment of the present application includes:
  • An image extraction unit 801 configured to extract N text images to be used from the image to be processed according to the text detection result of the image to be processed after acquiring the image to be processed; wherein, N is a positive integer;
  • a feature determining unit 802 configured to determine the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used; wherein, n is a positive integer, n ≤ N;
  • a feature processing unit 803, configured to determine the image extraction feature of the nth text image to be used according to the language extraction feature of the nth text image to be used and the visual extraction feature of the nth text image to be used; wherein, n is a positive integer, n ≤ N;
  • the language recognition unit 804 is configured to determine the language recognition result of the image to be processed according to the image extraction features of the N text images to be used.
  • the visual extraction features include at least one of character density features, color distribution features, and image position features.
  • the feature determining unit 802 includes:
  • the first determination subunit is configured to input the nth text image to be used into a pre-built density feature extraction model, and obtain the character density feature of the nth text image to be used output by the density feature extraction model;
  • the second determination subunit is configured to input the nth text image to be used into a pre-built color feature extraction model, and obtain the color distribution characteristics of the nth text image to be used output by the color feature extraction model;
  • the third determination subunit is configured to input the position description information of the nth text image to be used into a pre-built position feature extraction model, and obtain the image position feature of the nth text image to be used output by the position feature extraction model.
  • the feature determining unit 802 includes:
  • the fourth determining subunit is configured to input the nth text image to be used into a pre-built language feature extraction model, and obtain the language extraction features of the nth text image to be used output by the language feature extraction model.
  • the visual extraction features include character density features, color distribution features, and image position features
  • the feature processing unit 803 is specifically configured to: splice the language extraction feature of the nth text image to be used, the character density feature of the nth text image to be used, the color distribution feature of the nth text image to be used, and the image position feature of the nth text image to be used to obtain the image extraction feature of the nth text image to be used.
  • the language identification unit 804 is specifically configured to: concatenate the image extraction features of the N text images to be used to obtain language representation data of the image to be processed;
  • the language representation data is input into the pre-built image language recognition model, and the language recognition result of the image to be processed outputted by the image language recognition model is obtained.
  • the image language recognition device 800 further includes:
  • a model training unit configured to acquire the sample image to be used and the actual language of the sample image to be used; determine at least one sample text image and the position description information of the at least one sample text image according to the text detection result of the sample image to be used; input the at least one sample text image and the position description information of the at least one sample text image into the model to be trained, and obtain the language recognition result of the sample image to be used output by the model to be trained; and update the model to be trained according to the language recognition result of the sample image to be used and the actual language of the sample image to be used, continuing the step of inputting the at least one sample text image and the position description information of the at least one sample text image into the model to be trained.
  • the model to be trained includes a language feature extraction network, a density feature extraction network, a color feature extraction network, a location feature extraction network, a feature splicing network, and an image language recognition network; wherein, the image The input data of the language recognition network includes the output data of the feature splicing network; the input data of the feature splicing network includes the output data of the language feature extraction network, the output data of the density feature extraction network, the color feature extraction The output data of the network, and the output data of the location feature extraction network;
  • the process of determining the image language recognition model includes: determining the image language recognition network in the model to be trained as the image language recognition model.
  • the image language recognition device 800 further includes:
  • a model initialization unit configured to use the first text image and the actual language features of the first text image to train the first model; use the second text image and the actual density features of the second text image to train the second model; use the third text image and the actual color features of the third text image to train the third model; use the position description information of the fourth text image and the actual position feature of the fourth text image to train the fourth model; and use the trained first model, the trained second model, the trained third model, and the trained fourth model to respectively initialize the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network in the model to be trained.
  • the image language recognition device 800, after acquiring the image to be processed, first extracts N text images to be used from the image to be processed according to the text detection result of the image to be processed; it then determines the language extraction feature and the visual extraction feature of the nth text image to be used, determines the image extraction feature of the nth text image to be used from them, and finally determines the language recognition result of the image to be processed according to the image extraction features of the N text images to be used, so that the language recognition result can accurately represent the language to which the image to be processed belongs.
  • the embodiment of the present application also provides a device, the device includes a processor and a memory:
  • the memory is used to store computer programs
  • the processor is configured to execute any implementation of the image language recognition method provided in the embodiments of the present application according to the computer program.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium is used to store a computer program, and the computer program is used to execute any of the image language recognition methods provided in the embodiment of the present application.
  • the embodiment of the present application also provides a computer program product, which, when running on the terminal device, enables the terminal device to execute any implementation manner of the image language recognition method provided in the embodiment of the present application.
  • "At least one (item)" means one or more, and "multiple" means two or more.
  • "And/or" is used to describe the association relationship of associated objects, indicating that three kinds of relationships can exist; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can be singular or plural.
  • The character "/" generally indicates that the associated objects are in an "or" relationship.
  • "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • "At least one item (piece) of a, b, or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.


Abstract

An image language identification method and a related device thereof. The method comprises: after an image to be processed is obtained, extracting N text images to be used from the image according to a text detection result of the image; then determining a language extraction feature and a visual extraction feature of the n-th text image; then determining an image extraction feature of the n-th text image according to the language extraction feature and the visual extraction feature of the n-th text image, wherein n is a positive integer, n ≤ N, and N is a positive integer; and finally determining a language identification result of the image according to the image extraction features of the N text images, so that the language identification result can accurately represent a language to which the image belongs.

Description

Image language recognition method and related equipment
This application claims priority to the Chinese patent application No. 202111138638.8, entitled "Image language recognition method and related equipment" and filed with the China Patent Office on September 27, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and in particular to an image language recognition method and related equipment.
Background
In some application scenarios, it is necessary to determine the language to which image data carrying character information belongs. For example, if image data carries a large number of Chinese characters, the language of that image data is Chinese; if image data carries a large number of English words, the language of that image data is English; and so on.
However, how to identify the language to which image data belongs is a technical problem that urgently needs to be solved.
Summary
In order to solve the above technical problem, the present application provides an image language recognition method and related equipment, which can accurately identify the language to which image data belongs.
In order to achieve the above object, the technical solutions provided in the embodiments of the present application are as follows:
An embodiment of the present application provides an image language recognition method, the method including:
after an image to be processed is acquired, extracting N text images to be used from the image to be processed according to a text detection result of the image to be processed, where N is a positive integer;
determining a language extraction feature of the n-th text image to be used and a visual extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N;
determining an image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N;
determining a language recognition result of the image to be processed according to the image extraction features of the N text images to be used.
In a possible implementation, the visual extraction feature includes at least one of a character density feature, a color distribution feature, and an image position feature.
In a possible implementation, the process of determining the character density feature of the n-th text image to be used includes:
inputting the n-th text image to be used into a pre-built density feature extraction model to obtain the character density feature of the n-th text image to be used output by the density feature extraction model;
the process of determining the color distribution feature of the n-th text image to be used includes:
inputting the n-th text image to be used into a pre-built color feature extraction model to obtain the color distribution feature of the n-th text image to be used output by the color feature extraction model;
the process of determining the image position feature of the n-th text image to be used includes:
inputting the position description information of the n-th text image to be used into a pre-built position feature extraction model to obtain the image position feature of the n-th text image to be used output by the position feature extraction model.
In a possible implementation, the process of determining the language extraction feature of the n-th text image to be used includes:
inputting the n-th text image to be used into a pre-built language feature extraction model to obtain the language extraction feature of the n-th text image to be used output by the language feature extraction model.
In a possible implementation, the visual extraction feature includes a character density feature, a color distribution feature, and an image position feature;
the determining the image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used includes:
splicing the language extraction feature of the n-th text image to be used, the character density feature of the n-th text image to be used, the color distribution feature of the n-th text image to be used, and the image position feature of the n-th text image to be used to obtain the image extraction feature of the n-th text image to be used.
In a possible implementation, the determining the language recognition result of the image to be processed according to the image extraction features of the N text images to be used includes:
splicing the image extraction features of the N text images to be used to obtain language representation data of the image to be processed;
inputting the language representation data into a pre-built image language recognition model to obtain the language recognition result of the image to be processed output by the image language recognition model.
In a possible implementation, the construction process of the image language recognition model includes:
acquiring a sample image to be used and the actual language of the sample image to be used;
determining at least one sample text image and position description information of the at least one sample text image according to the text detection result of the sample image to be used;
inputting the at least one sample text image and the position description information of the at least one sample text image into a model to be trained to obtain the language recognition result of the sample image to be used output by the model to be trained;
updating the model to be trained according to the language recognition result of the sample image to be used and the actual language of the sample image to be used, and continuing to execute the step of inputting the at least one sample text image and the position description information of the at least one sample text image into the model to be trained until, after a preset stop condition is reached, the image language recognition model is determined according to the model to be trained.
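As an illustration of this construction process, the following is a minimal PyTorch sketch under stated assumptions, not the patent's prescribed implementation: the joint model `model_to_train`, the data loader interface, the Adam optimizer, and the cross-entropy criterion are all illustrative choices.

    import torch
    import torch.nn as nn

    def build_image_language_recognition_model(model_to_train, dataloader, max_steps=10000):
        # dataloader is assumed to yield, per sample image to be used:
        # (sample_text_images, position_info, actual_language_id).
        optimizer = torch.optim.Adam(model_to_train.parameters(), lr=1e-4)
        criterion = nn.CrossEntropyLoss()  # stands in for the unspecified update criterion
        step = 0
        for text_images, position_info, actual_language in dataloader:
            logits = model_to_train(text_images, position_info)  # language recognition result
            loss = criterion(logits, actual_language)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= max_steps:  # one possible preset stop condition
                break
        return model_to_train  # the image language recognition model is derived from this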
In a possible implementation, the model to be trained includes a language feature extraction network, a density feature extraction network, a color feature extraction network, a position feature extraction network, a feature splicing network, and an image language recognition network, where the input data of the image language recognition network includes the output data of the feature splicing network, and the input data of the feature splicing network includes the output data of the language feature extraction network, the output data of the density feature extraction network, the output data of the color feature extraction network, and the output data of the position feature extraction network;
the determining the image language recognition model according to the model to be trained includes:
determining the image language recognition network in the model to be trained as the image language recognition model.
In a possible implementation, before the inputting the at least one sample text image and the position description information of the at least one sample text image into the model to be trained, the construction process of the image language recognition model further includes:
training a first model using a first text image and actual language features of the first text image;
training a second model using a second text image and actual density features of the second text image;
training a third model using a third text image and actual color features of the third text image;
training a fourth model using position description information of a fourth text image and actual position features of the fourth text image;
using the trained first model, the trained second model, the trained third model, and the trained fourth model to respectively initialize the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network in the model to be trained.
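A minimal sketch of this initialization scheme, assuming the model to be trained exposes its four sub-networks as attributes (the attribute names below are hypothetical; the patent does not prescribe an API) and that each sub-network shares its architecture with the corresponding pre-trained model:

    def init_model_to_train(model_to_train, lang_model, density_model, color_model, position_model):
        # Copy the weights of the four separately trained models into the
        # corresponding feature extraction networks of the joint model.
        model_to_train.language_net.load_state_dict(lang_model.state_dict())
        model_to_train.density_net.load_state_dict(density_model.state_dict())
        model_to_train.color_net.load_state_dict(color_model.state_dict())
        model_to_train.position_net.load_state_dict(position_model.state_dict())
        return model_to_train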
An embodiment of the present application further provides an image language recognition device, including:
an image extraction unit, configured to extract, after an image to be processed is acquired, N text images to be used from the image to be processed according to the text detection result of the image to be processed, where N is a positive integer;
a feature determination unit, configured to determine the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N;
a feature processing unit, configured to determine the image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N;
a language recognition unit, configured to determine the language recognition result of the image to be processed according to the image extraction features of the N text images to be used.
An embodiment of the present application further provides a device, the device including a processor and a memory:
the memory is configured to store a computer program;
the processor is configured to execute, according to the computer program, any implementation of the image language recognition method provided in the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium, the computer-readable storage medium being configured to store a computer program, and the computer program being used to execute any implementation of the image language recognition method provided in the embodiments of the present application.
An embodiment of the present application further provides a computer program product which, when run on a terminal device, enables the terminal device to execute any implementation of the image language recognition method provided in the embodiments of the present application.
Compared with the prior art, the embodiments of the present application have at least the following advantages:
In the technical solution provided by the embodiments of the present application, after the image to be processed is acquired, N text images to be used are first extracted from the image to be processed according to the text detection result of the image to be processed; the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used are then determined, where n is a positive integer, n ≤ N, and N is a positive integer; next, the image extraction feature of the n-th text image to be used is determined according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used; finally, the language recognition result of the image to be processed is determined according to the image extraction features of the N text images to be used, so that the language recognition result can accurately represent the language to which the image to be processed belongs, thereby accurately identifying the language to which image data belongs.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings to be used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
Fig. 1 is a schematic diagram of image data provided by an embodiment of the present application;
Fig. 2 is a flow chart of an image language recognition method provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of text regions provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a language feature extraction model provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a density feature extraction model provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a color feature extraction model provided by an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an image language recognition model provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an image language recognition device provided by an embodiment of the present application.
Detailed Description
In the research on the above "language to which image data belongs", the inventors found that, for a piece of image data (such as the image data shown in Fig. 1), the language to which the image data belongs can be determined according to the language of the large amount of character information carried by the image data. For ease of understanding, the following description is given with an example.
As an example, although the image data shown in Fig. 1 carries both character information belonging to Vietnamese and character information belonging to English, since the amount of character information belonging to Vietnamese is far greater than the amount of character information belonging to English, it can be determined that the language to which the image data belongs is Vietnamese.
Based on the above findings, in order to solve the technical problem in the background section, an embodiment of the present application provides an image language recognition method. The method includes: after an image to be processed is acquired, first extracting N text images to be used from the image to be processed according to the text detection result of the image to be processed; then determining the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used, where n is a positive integer, n ≤ N, and N is a positive integer; then determining the image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used; and finally determining the language recognition result of the image to be processed according to the image extraction features of the N text images to be used, so that the language recognition result can accurately represent the language to which the image to be processed belongs, thereby accurately identifying the language to which image data belongs.
In addition, the embodiments of the present application do not limit the execution subject of the image language recognition method. For example, the image language recognition method provided in the embodiments of the present application can be applied to a data processing device such as a terminal device or a server. The terminal device may be a smart phone, a computer, a personal digital assistant (PDA), a tablet computer, or the like. The server may be an independent server, a cluster server, or a cloud server.
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
Method Embodiment 1
Referring to Fig. 2, this figure is a flow chart of an image language recognition method provided by an embodiment of the present application.
The image language recognition method provided by the embodiment of the present application includes S1-S5:
S1: after the image to be processed is acquired, extract N text images to be used from the image to be processed according to the text detection result of the image to be processed, where N is a positive integer.
The "image to be processed" refers to image data on which image language recognition processing needs to be performed (for example, the image data shown in Fig. 1), and the "image to be processed" includes character information in at least one language.
The "text detection result of the image to be processed" is used to indicate the position of at least one text region in the image to be processed. For example, as shown in Fig. 3, when the image to be processed is the image data shown in Fig. 1, the text detection result of the image to be processed may include the position description data of the first text region, the position description data of the second text region, ..., and the position description data of the fifth text region, where the "position description data of the first text region" indicates the position of the first text region in the image data shown in Fig. 1, the "position description data of the second text region" indicates the position of the second text region in the image data shown in Fig. 1, and so on, with the "position description data of the fifth text region" indicating the position of the fifth text region in the image data shown in Fig. 1.
It should be noted that the embodiments of the present application do not limit the representation of the above "position description data"; for example, it can be represented by the coordinates of the four vertices of a text region.
In addition, the embodiments of the present application do not limit the determination process of the above "text detection result of the image to be processed". For example, it may specifically be: inputting the image to be processed into a pre-built text detection model to obtain the text detection result of the image to be processed output by the text detection model.
The "text detection model" is used to perform text position detection processing on the input data of the text detection model. The embodiments of the present application do not limit the "text detection model", which can be implemented by any machine learning model (for example, a deep learning model based on a convolutional neural network).
The above "text detection model" can be constructed according to a first sample image and the actual text positions of the first sample image, where the "actual text positions of the first sample image" indicate the positions at which all text regions in the first sample image are actually located in the first sample image. The embodiments of the present application do not limit the manner of acquiring the "actual text positions of the first sample image"; for example, it can be obtained by manual annotation.
The n-th text image to be used represents the image information carried by the n-th text region in the image to be processed. The embodiments of the present application do not limit the determination process of the "n-th text image to be used". For example, when the above "text detection result of the image to be processed" includes the position description data of the n-th text region, image cropping processing can be performed on the image to be processed according to the position description data of the n-th text region to obtain the n-th text image to be used, so that the n-th text image to be used includes the n-th text region, where n is a positive integer and n ≤ N.
Based on the above content related to S1, after the image to be processed is acquired, text detection processing can first be performed on the image to be processed to obtain the text detection result of the image to be processed, so that the text detection result can indicate the position of at least one text region in the image to be processed; then at least one text image to be used is extracted from the image to be processed according to the text detection result, so that each text image to be used includes a respective text region, and thus each text image to be used can represent the image information carried by its text region.
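As a hedged illustration of S1, the following sketch crops the N text images to be used from the image to be processed, assuming the text detection result is given as four vertex coordinates per text region (the `text_detector` callable is a hypothetical stand-in for the pre-built text detection model):

    from PIL import Image

    def extract_text_images(image_path, text_detector):
        image = Image.open(image_path)
        # Each detection is assumed to be the four vertex coordinates of a
        # text region: [(x1, y1), (x2, y2), (x3, y3), (x4, y4)].
        regions = text_detector(image)
        text_images = []
        for vertices in regions:
            xs = [x for x, _ in vertices]
            ys = [y for _, y in vertices]
            # Crop the axis-aligned bounding rectangle of the text region.
            text_images.append(image.crop((min(xs), min(ys), max(xs), max(ys))))
        return text_images  # the N text images to be used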
S2: determine the language extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N.
The "language extraction feature of the n-th text image to be used" represents the language information carried by the n-th text region in the image to be processed.
In addition, the embodiments of the present application do not limit the implementation of S2. For example, it may specifically include: inputting the n-th text image to be used into a pre-built language feature extraction model to obtain the language extraction feature of the n-th text image to be used output by the language feature extraction model.
The above "language feature extraction model" is used to perform language feature extraction processing on the input data of the language feature extraction model. The embodiments of the present application do not limit the "language feature extraction model"; for example, any machine learning model (for example, a deep learning model based on a self-attention neural network) can be used for implementation.
In addition, the above "language feature extraction model" can be constructed according to a first text image and the actual language features of the first text image, where the "actual language features of the first text image" represent the language information actually carried by the first text image. The embodiments of the present application do not limit the manner of acquiring the "actual language features of the first text image"; for example, it can be obtained by manual annotation.
It should be noted that the embodiments of the present application do not limit the construction process of the above "language feature extraction model". For example, it can be implemented by any existing or future machine learning model construction method; as another example, it can be implemented by the model construction method shown in Method Embodiment 2.
In addition, in order to improve the extraction effect of language features, an embodiment of the present application further provides a possible implementation of the "language feature extraction model", which may specifically include: an image feature extraction layer, a position encoding layer, a first feature fusion layer, a first feature encoding layer, and a first linear processing layer, where the input data of the first linear processing layer includes the output data of the first feature encoding layer, the input data of the first feature encoding layer includes the output data of the first feature fusion layer, and the input data of the first feature fusion layer includes the output data of the image feature extraction layer and the output data of the position encoding layer.
The above "image feature extraction layer" is used to perform image feature extraction processing on a text image (for example, the n-th text image to be used). The embodiments of the present application do not limit the implementation of the "image feature extraction layer"; for example, it can be implemented by the convolutional neural network (Convolutional Neural Network, CNN) shown in Fig. 4.
The above "position encoding layer" is used to perform position encoding processing on a text image. The embodiments of the present application do not limit the implementation of the "position encoding layer"; for example, any position encoding method (for example, the Positional Encoding module in the transformer model) can be used.
The above "first feature fusion layer" is used to perform feature fusion processing on the input data of the first feature fusion layer (for example, the addition processing shown in Fig. 4). The embodiments of the present application do not limit the implementation of the "first feature fusion layer"; for example, any feature fusion processing method (for example, the feature fusion processing involved in the transformer model) can be used.
The above "first feature encoding layer" is used to perform encoding processing on the input data of the first feature encoding layer. The embodiments of the present application do not limit the "first feature encoding layer"; for example, it can be implemented by the L1 first encoding networks shown in Fig. 4 (e.g., the Encoder module in the transformer model), where L1 is a positive integer.
The above "first linear processing layer" is used to perform linear processing on the input data of the first linear processing layer. The embodiments of the present application do not limit the implementation of the "first linear processing layer"; for example, any linear processing method (for example, the linear module in the transformer model) can be used.
It should be noted that, for the language feature extraction model shown in Fig. 4, "CNN" represents the above "image feature extraction layer"; "Positional Encoding" represents the above "position encoding layer"; "+" represents the above "first feature fusion layer"; "Multi-head attention" refers to a multi-head self-attention network; "Add & norm" refers to feature addition processing and feature normalization processing; "Feed forward" refers to a feed-forward neural network; and "L1" indicates the number of first encoding networks.
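The following is a minimal PyTorch sketch of the Fig. 4 structure, under illustrative assumptions: a small CNN stands in for the image feature extraction layer, a learned positional embedding stands in for the Positional Encoding module, L1 = 2 encoder blocks are chosen arbitrarily, and mean pooling plus a linear layer produce one 1×512 language extraction feature per text image; none of these sizes are prescribed by the patent.

    import torch
    import torch.nn as nn

    class LanguageFeatureModel(nn.Module):
        def __init__(self, d_model=512, num_layers=2, num_heads=8, max_positions=1024):
            super().__init__()
            # Image feature extraction layer (illustrative CNN backbone).
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
            )
            # Position encoding layer (learned embedding over flattened positions).
            self.pos_embed = nn.Parameter(torch.zeros(1, max_positions, d_model))
            # First feature encoding layer: L1 stacked encoder blocks.
            layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers)
            # First linear processing layer.
            self.linear = nn.Linear(d_model, d_model)

        def forward(self, x):                                  # x: (B, 3, H, W)
            feats = self.cnn(x)                                # (B, d_model, H', W')
            feats = feats.flatten(2).transpose(1, 2)           # (B, H'*W', d_model)
            feats = feats + self.pos_embed[:, :feats.size(1)]  # first feature fusion (addition)
            feats = self.encoder(feats)
            return self.linear(feats.mean(dim=1))              # one 1x512 language extraction feature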
Based on the above content related to S2, after the n-th text image to be used is acquired, language feature extraction processing can be performed on the n-th text image to be used to obtain the language extraction feature of the n-th text image to be used, so that the language extraction feature can represent the language information carried by the n-th text region in the image to be processed, where n is a positive integer and n ≤ N.
S3: determine the visual extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N.
The "visual extraction feature of the n-th text image to be used" represents the image feature information carried by the n-th text region in the image to be processed (for example, character density, color distribution, and position distribution in the image to be processed).
In addition, the embodiments of the present application do not limit the above "visual extraction feature of the n-th text image to be used". For example, it may specifically include at least one of the character density feature of the n-th text image to be used, the color distribution feature of the n-th text image to be used, and the image position feature of the n-th text image to be used.
The "character density feature of the n-th text image to be used" represents the character distribution density in the n-th text image to be used. The embodiments of the present application do not limit the determination process of the "character density feature of the n-th text image to be used". For example, it may specifically include: inputting the n-th text image to be used into a pre-built density feature extraction model to obtain the character density feature of the n-th text image to be used output by the density feature extraction model.
The above "density feature extraction model" is used to perform character density feature extraction processing on the input data of the density feature extraction model. The embodiments of the present application do not limit the "density feature extraction model"; for example, any machine learning model (for example, a deep learning model based on a self-attention neural network) can be used.
In addition, the above "density feature extraction model" can be constructed according to a second text image and the actual density features of the second text image, where the "actual density features of the second text image" represent the actual character distribution density in the second text image. The embodiments of the present application do not limit the manner of acquiring the "actual density features of the second text image"; for example, it can be obtained by manual annotation.
It should be noted that the embodiments of the present application do not limit the construction process of the above "density feature extraction model". For example, it can be implemented by any existing or future machine learning model construction method; as another example, it can be implemented by the model construction method shown in Method Embodiment 2. In addition, the embodiments of the present application do not limit the relationship between the above "second text image" and the above "first text image"; for example, the two may refer to the same text image data or different text image data.
In addition, in order to improve the extraction effect of character density features, an embodiment of the present application further provides a possible implementation of the "density feature extraction model", which may specifically include: an image feature extraction layer and L2 second encoding networks, where L2 is a positive integer. The embodiments of the present application do not limit L2; for example, as shown in Fig. 5, L2 may specifically be 4.
It should be noted that, for the relevant content of the "image feature extraction layer", please refer to the relevant content of the "image feature extraction layer" above. In addition, the embodiments of the present application do not limit the above "second encoding network", and any existing or future encoding network (for example, the Encoder module in the transformer model or the Encoder module in the conformer model) can be used for implementation.
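As a hedged sketch of this structure (an image feature extraction layer followed by a stack of second encoding networks), the following PyTorch module parameterizes the number of encoder blocks so it can also serve for Fig. 6 below; the feature sizes and the mean pooling to a 1×512 vector are illustrative assumptions:

    import torch.nn as nn

    class EncoderFeatureModel(nn.Module):
        """Image feature extraction layer + a stack of 'second encoding networks'."""
        def __init__(self, num_layers, d_model=512, num_heads=8):
            super().__init__()
            # Image feature extraction layer (illustrative CNN backbone).
            self.cnn = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
            )
            # num_layers stacked encoder blocks (transformer Encoder modules).
            layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers)

        def forward(self, x):                               # x: (B, 3, H, W)
            feats = self.cnn(x).flatten(2).transpose(1, 2)  # (B, H'*W', d_model)
            return self.encoder(feats).mean(dim=1)          # one 1x512 feature per image

    density_model = EncoderFeatureModel(num_layers=4)  # L2 = 4, as in Fig. 5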
The above "color distribution feature of the n-th text image to be used" represents the color distribution state in the n-th text image to be used (in particular, the difference between the character color and the background color). The embodiments of the present application do not limit the determination process of the "color distribution feature of the n-th text image to be used". For example, it may specifically include: inputting the n-th text image to be used into a pre-built color feature extraction model to obtain the color distribution feature of the n-th text image to be used output by the color feature extraction model.
The above "color feature extraction model" is used to perform color distribution feature extraction processing on the input data of the color feature extraction model. The embodiments of the present application do not limit the "color feature extraction model"; for example, any machine learning model (for example, a deep learning model based on a convolutional neural network) can be used.
In addition, the above "color feature extraction model" can be constructed according to a third text image and the actual color features of the third text image, where the "actual color features of the third text image" represent the actual color distribution state in the third text image. The embodiments of the present application do not limit the manner of acquiring the "actual color features of the third text image"; for example, it can be obtained by manual annotation.
It should be noted that the embodiments of the present application do not limit the construction process of the above "color feature extraction model". For example, it can be implemented by any existing or future machine learning model construction method; as another example, it can be implemented by the model construction method shown in Method Embodiment 2. In addition, the embodiments of the present application do not limit the relationship among the above "third text image", "second text image", and "first text image"; for example, the three may refer to the same text image data or different text image data.
In addition, in order to improve the extraction effect of color distribution features, an embodiment of the present application further provides a possible implementation of the "color feature extraction model", which may specifically include: an image feature extraction layer and L3 second encoding networks, where L3 is a positive integer. The embodiments of the present application do not limit L3; for example, as shown in Fig. 6, L3 may specifically be 2.
It should be noted that, for the relevant content of the "image feature extraction layer" and the "second encoding network", please refer to the relevant content above.
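Under the same assumptions, the Fig. 6 structure differs from Fig. 5 only in the number of encoder blocks, so the `EncoderFeatureModel` sketch given above can be reused:

    color_model = EncoderFeatureModel(num_layers=2)  # L3 = 2, as in Fig. 6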
The above "image position feature of the n-th text image to be used" refers to the position distribution state of the character information in the n-th text image to be used within the above "image to be processed". The embodiments of the present application do not limit the determination process of the "image position feature of the n-th text image to be used". For example, it may specifically include: inputting the position description information of the n-th text image to be used into a pre-built position feature extraction model to obtain the image position feature of the n-th text image to be used output by the position feature extraction model.
The "position description information of the n-th text image to be used" describes the position of the character information in the n-th text image to be used within the above "image to be processed". The embodiments of the present application do not limit the determination process of the "position description information of the n-th text image to be used". For example, when the above "n-th text image to be used" represents the image information carried by the n-th text region in the image to be processed, the position description information of the n-th text region in the image to be processed can be determined as the position description information of the n-th text image to be used.
The above "position feature extraction model" is used to perform image position feature extraction processing on the input data of the position feature extraction model. The embodiments of the present application do not limit the "position feature extraction model"; for example, any machine learning model (for example, a machine learning model based on fully connected layers) can be used. As another example, the "position feature extraction model" may include two fully connected layers.
In addition, the above "position feature extraction model" can be constructed according to the position description information of a fourth text image and the actual position features of the fourth text image.
The "position description information of the fourth text image" indicates the position of the character information in the fourth text image within the sample image to be processed, and the fourth text image is obtained by performing image cropping processing on the sample image to be processed.
The "actual position features of the fourth text image" represent the actual position distribution state of the character information in the fourth text image within the sample image to be processed. The embodiments of the present application do not limit the manner of acquiring the "actual position features of the fourth text image"; for example, it can be obtained by manual annotation.
It should be noted that the embodiments of the present application do not limit the construction process of the above "position feature extraction model". For example, it can be implemented by any existing or future machine learning model construction method; as another example, it can be implemented by the model construction method shown in Method Embodiment 2. In addition, the embodiments of the present application do not limit the relationship among the above "fourth text image", "third text image", "second text image", and "first text image"; for example, the four may refer to the same text image data or different text image data.
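A minimal sketch of a position feature extraction model with two fully connected layers, assuming the position description information is the four vertex coordinates (eight numbers) of a text region and that the output is a 1×512 image position feature:

    import torch.nn as nn

    class PositionFeatureModel(nn.Module):
        def __init__(self, d_model=512):
            super().__init__()
            # Two fully connected layers mapping 8 coordinates to a 1x512 feature.
            self.fc = nn.Sequential(
                nn.Linear(8, d_model), nn.ReLU(),
                nn.Linear(d_model, d_model),
            )

        def forward(self, position_info):  # position_info: (B, 8), four (x, y) vertices
            return self.fc(position_info)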
Based on the above content related to S3, after the n-th text image to be used is acquired, preset visual feature extraction processing (for example, character density feature extraction processing, color distribution feature extraction processing, and image position feature extraction processing) can be performed on the n-th text image to be used to obtain the visual extraction feature of the n-th text image to be used, so that the visual extraction feature can represent the carried image feature information (for example, character density, color distribution, and position distribution in the image to be processed), where n is a positive integer and n ≤ N.
S4: determine the image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N.
The "image extraction feature of the n-th text image to be used" represents the image information carried by the n-th text image to be used (for example, character language, character density, color distribution, and position distribution in the image to be processed), so that the "image extraction feature of the n-th text image to be used" can accurately represent the image information carried by the n-th text region in the image to be processed.
In addition, the embodiments of the present application do not limit the implementation of S4. For example, when the above "visual extraction feature" includes a character density feature, a color distribution feature, and an image position feature, S4 may specifically include: splicing the language extraction feature of the n-th text image to be used, the character density feature of the n-th text image to be used, the color distribution feature of the n-th text image to be used, and the image position feature of the n-th text image to be used to obtain the image extraction feature of the n-th text image to be used.
It should be noted that the embodiments of the present application do not limit the implementation of the above "splicing". For example, when the language extraction feature, the character density feature, the color distribution feature, and the image position feature of the n-th text image to be used are all 1×512 feature vectors, the image extraction feature of the n-th text image to be used may be a 4×512 feature vector.
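Under the 1×512 assumption in the preceding paragraph, the splicing can be illustrated as follows (the random tensors merely stand in for real features):

    import torch

    lang_f = torch.randn(512)     # language extraction feature, 1x512
    density_f = torch.randn(512)  # character density feature, 1x512
    color_f = torch.randn(512)    # color distribution feature, 1x512
    pos_f = torch.randn(512)      # image position feature, 1x512

    image_feature = torch.stack([lang_f, density_f, color_f, pos_f])
    print(image_feature.shape)    # torch.Size([4, 512])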
Based on the above content related to S4, after the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used are acquired, the image extraction feature of the n-th text image to be used can be determined with reference to these two extracted features, so that the image extraction feature can accurately represent the image information carried by the n-th text region in the image to be processed (for example, character language, character density, color distribution, and position distribution in the image to be processed).
S5: determine the language recognition result of the image to be processed according to the image extraction features of the N text images to be used.
The "language recognition result of the image to be processed" indicates the language to which the image to be processed belongs, so that the "language recognition result of the image to be processed" can accurately represent the language to which most of the character information in the image to be processed belongs (for example, Vietnamese as shown in Fig. 1).
In addition, the embodiments of the present application do not limit the implementation of S5. For example, it may specifically include S51-S52:
S51: splice the image extraction features of the N text images to be used to obtain the language representation data of the image to be processed.
The "language representation data of the image to be processed" represents the distribution characteristics of at least one language in the image to be processed (for example, distribution range and distribution position).
In addition, the embodiments of the present application do not limit the implementation of "splicing" in S51. For example, when the image extraction feature of the n-th text image to be used is a 4×512 feature vector, the language representation data of the image to be processed may be an N×4×512 feature vector.
It should be noted that the above "1×512", "4×512", and "N×4×512" all refer to the data dimensions of a feature vector.
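Continuing the dimensional example, the language representation data can be obtained by stacking the N per-image features (N = 5 is illustrative):

    import torch

    N = 5
    per_image_features = [torch.randn(4, 512) for _ in range(N)]  # one 4x512 feature per text image
    language_repr = torch.stack(per_image_features)
    print(language_repr.shape)  # torch.Size([5, 4, 512])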
S52: input the language representation data of the image to be processed into a pre-built image language recognition model to obtain the language identification result of the image to be processed output by the image language recognition model.

Here, the "image language recognition model" performs language identification on its input data. The embodiments of the present application do not limit this model; for example, it may be implemented with any machine learning model (for example, a deep learning model based on a self-attention neural network).

In addition, to improve the effect of image language identification, the embodiments of the present application further provide a possible implementation of the "image language recognition model", which may include: L4 second encoding networks, a second linear processing layer, and a recognition layer, where L4 is a positive integer. The embodiments of the present application do not limit L4; for example, as shown in FIG. 7, L4 may be 6.

The "second linear processing layer" performs linear processing on its input data. The embodiments of the present application do not limit its implementation; for example, any linear processing method may be used (for example, the linear module in a Transformer model).

The "recognition layer" performs language classification on its input data. The embodiments of the present application do not limit its implementation; for example, any classification method may be used (for example, the softmax module in a Transformer model).

It should be noted that, for the "second encoding network", reference may be made to the description of the "second encoding network" above.
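A minimal PyTorch sketch of such a model, under the configuration of six Transformer-style encoder layers followed by a linear layer and a softmax classifier, is shown below. The class name, the mean-pooling step, and the number of candidate languages are illustrative assumptions, not values fixed by the application.

```python
import torch
import torch.nn as nn

class ImageLanguageRecognitionModel(nn.Module):
    """L4 = 6 encoder layers -> linear projection -> softmax over candidate languages."""

    def __init__(self, dim=512, num_layers=6, num_languages=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)  # "second encoding networks"
        self.linear = nn.Linear(dim, num_languages)  # "second linear processing layer"
        self.softmax = nn.Softmax(dim=-1)            # "recognition layer"

    def forward(self, representation):
        # representation: (N, 4, 512) language representation data, flattened into one token sequence.
        tokens = representation.reshape(1, -1, representation.shape[-1])  # (1, N*4, 512)
        pooled = self.encoder(tokens).mean(dim=1)  # aggregate over all tokens
        return self.softmax(self.linear(pooled))   # (1, num_languages)

model = ImageLanguageRecognitionModel()
print(model(torch.randn(3, 4, 512)).shape)  # torch.Size([1, 10])
```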
In addition, the embodiments of the present application do not limit the construction process of the above "image language recognition model". For example, the model may be built according to the language representation data of a sample image to be used and the actual language of that sample image, where the "language representation data of the sample image to be used" is determined similarly to the "language representation data of the image to be processed" above, and the "actual language of the sample image to be used" indicates the language to which the sample image actually belongs. As another example, the model may be built using the model construction process shown in Method Embodiment 2.

Based on the content of S51-S52 above, after the image extraction features of the first through N-th text images to be used are obtained, these N image extraction features may first be concatenated to obtain the language representation data of the image to be processed; language identification is then performed on the language representation data to obtain the language identification result of the image to be processed, so that the result represents the language of most of the character information in the image to be processed (for example, the Vietnamese shown in FIG. 1).

Based on the content of S1-S5 above, in the image language identification method provided by the embodiments of the present application, after the image to be processed is acquired, N text images to be used are first extracted from the image to be processed according to the text detection result of the image to be processed; the language extraction feature and the visual extraction feature of the n-th text image to be used are then determined, where n is a positive integer, n ≤ N, and N is a positive integer; next, the image extraction feature of the n-th text image to be used is determined according to these two extraction features; finally, the language identification result of the image to be processed is determined according to the image extraction features of the N text images to be used, so that the result accurately represents the language to which the image to be processed belongs. In this way, the language of a piece of image data can be identified accurately.
Method Embodiment 2
To improve the effect of image language identification, the embodiments of the present application further provide a model construction method, which may include steps 11-16:

Step 11: acquire a sample image to be used and the actual language of the sample image to be used.

Here, the "sample image to be used" refers to the image data needed for the model construction process, and it may include character information in at least one language.

The "actual language of the sample image to be used" indicates the language to which the sample image actually belongs. The embodiments of the present application do not limit how it is obtained; for example, it may be obtained by manual annotation.

Step 12: determine at least one sample text image and position description information of the at least one sample text image according to a text detection result of the sample image to be used.

Here, the "text detection result of the sample image to be used" indicates the position of at least one text region in the sample image to be used; it is determined similarly to the "text detection result of the image to be processed" above.

In addition, the "sample text image" is determined similarly to the "text image to be used" above, and the "position description information of the sample text image" is determined similarly to the "position description information of the text image to be used" above.
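As a concrete illustration, the sample text images and their position description can be derived from detection boxes roughly as follows; the box format (x, y, width, height) and the normalization scheme are assumptions made for the sketch, not requirements of the application.

```python
from PIL import Image

def crop_text_regions(sample_image, boxes):
    """Crop each detected text region and record its normalized position."""
    width, height = sample_image.size
    crops, positions = [], []
    for x, y, w, h in boxes:  # assumed (x, y, width, height) pixel boxes
        crops.append(sample_image.crop((x, y, x + w, y + h)))
        positions.append((x / width, y / height, w / width, h / height))
    return crops, positions

image = Image.new("RGB", (640, 480))  # stand-in for a real sample image
crops, positions = crop_text_regions(image, [(10, 20, 200, 40), (15, 80, 180, 35)])
print(len(crops), positions[0])
```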
Step 13: input the at least one sample text image and the position description information of the at least one sample text image into a model to be trained, to obtain a language identification result of the sample image to be used output by the model to be trained.

Here, the "model to be trained" performs image language identification on its input data.

In addition, the embodiments of the present application do not limit the "model to be trained". For example, it may include: a language feature extraction network, a density feature extraction network, a color feature extraction network, a position feature extraction network, a feature concatenation network, and an image language recognition network, where the input data of the image language recognition network includes the output data of the feature concatenation network, and the input data of the feature concatenation network includes the output data of the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network.

The "language feature extraction network" performs language feature extraction on text image data (for example, each sample text image). The embodiments of the present application do not limit its network structure; for example, it may follow the model structure of the "language feature extraction model" above.

The "density feature extraction network" performs character density feature extraction on text image data (for example, each sample text image). The embodiments of the present application do not limit its network structure; for example, it may follow the model structure of the "density feature extraction model" above.

The "color feature extraction network" performs color distribution feature extraction on text image data (for example, each sample text image). The embodiments of the present application do not limit its network structure; for example, it may follow the model structure of the "color feature extraction model" above.

The "position feature extraction network" performs image position feature extraction on the position description information of text image data (for example, each sample text image). The embodiments of the present application do not limit its network structure; for example, it may follow the model structure of the "position feature extraction model" above.

The "feature concatenation network" concatenates its input data. The embodiments of the present application do not limit its working principle; for ease of understanding, it is described below with an example.

As an example, when the above "at least one sample text image" includes K sample text images, the feature concatenation network may work as follows: first, the language extraction feature of the k-th sample text image, the character density feature of the k-th sample text image, the color distribution feature of the k-th sample text image, and the image position feature of the k-th sample text image are concatenated to obtain the image extraction feature of the k-th sample text image, where k is a positive integer, k ≤ K, and K is a positive integer; then, the image extraction features of the first through K-th sample text images are concatenated to obtain the language representation data of the sample image to be used (that is, the output of the "feature concatenation network").

The "image language recognition network" performs language identification on its input data. The embodiments of the present application do not limit its network structure; for example, it may follow the model structure of the "image language recognition model" above.
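To make the wiring of these six networks concrete, here is a compact sketch of one possible composite model. Every sub-network is deliberately a stand-in (simple MLPs over flattened crops, a mean pool over text images instead of the full encoder stack), and all names and dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim=512):
    return nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim))

class ModelToBeTrained(nn.Module):
    """Four feature extractors -> feature concatenation -> image language recognition."""

    def __init__(self, img_dim=1024, pos_dim=4, num_languages=10):
        super().__init__()
        self.language_net = mlp(img_dim)   # language feature extraction network
        self.density_net = mlp(img_dim)    # density feature extraction network
        self.color_net = mlp(img_dim)      # color feature extraction network
        self.position_net = mlp(pos_dim)   # position feature extraction network
        self.recognition_net = nn.Linear(4 * 512, num_languages)  # image language recognition network

    def forward(self, text_images, positions):
        # text_images: (K, img_dim) flattened crops; positions: (K, pos_dim) box descriptions.
        per_image = torch.cat([self.language_net(text_images),
                               self.density_net(text_images),
                               self.color_net(text_images),
                               self.position_net(positions)], dim=-1)  # feature concatenation network
        return self.recognition_net(per_image.mean(dim=0, keepdim=True))  # logits, shape (1, num_languages)

model = ModelToBeTrained()
print(model(torch.randn(5, 1024), torch.randn(5, 4)).shape)  # torch.Size([1, 10])
```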
Based on the content of step 13 above, after at least one sample text image and its position description information are obtained, the at least one sample text image and its position description information may be input into the model to be trained, so that the model performs image language identification with reference to them and outputs the language identification result of the sample image to be used.

Step 14: judge whether a preset stop condition is met; if so, execute step 16; if not, execute step 15.

Here, the "preset stop condition" may be set in advance, and the embodiments of the present application do not limit it. For example, it may be that the loss value of the model to be trained is below a first threshold; it may also be that the rate of change of the loss value of the model to be trained is below a second threshold (that is, the image language identification performance of the model to be trained has converged); it may also be that the number of updates of the model to be trained has reached a third threshold.

The "loss value of the model to be trained" represents the image language identification performance of the model to be trained. The embodiments of the present application do not limit how the loss value is determined; any existing or future method for determining a model loss value may be used.
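The three example stop conditions can be checked with a few lines of code; the thresholds below are arbitrary placeholders, not values from the application.

```python
def should_stop(losses, updates, loss_thresh=0.01, delta_thresh=1e-4, max_updates=10000):
    """Return True once any of the three example stop conditions holds."""
    if losses and losses[-1] < loss_thresh:                               # loss below the first threshold
        return True
    if len(losses) >= 2 and abs(losses[-1] - losses[-2]) < delta_thresh:  # change rate below the second threshold
        return True
    return updates >= max_updates                                         # update count reached the third threshold
```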
Step 15: update the model to be trained according to the language identification result of the sample image to be used and the actual language of the sample image to be used, and return to step 13.

In the embodiments of the present application, after it is determined that the current round of the model to be trained does not meet the preset stop condition, it can be concluded that the image language identification performance of the model is still poor. The model to be trained may therefore be updated according to the difference between the language identification result of the sample image to be used and the actual language of the sample image to be used, so that the updated model has better image language identification performance; step 13 and its subsequent steps are then executed again based on the updated model, implementing a new round of training of the model to be trained.

It should be noted that the embodiments of the present application do not limit the update process of the model to be trained; any existing or future model update method may be used.
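Putting steps 13-15 together, one possible realization is a standard supervised loop; cross-entropy loss and the Adam optimizer are ordinary choices rather than requirements of the application, and `ModelToBeTrained` and `should_stop` refer to the hypothetical helpers sketched above.

```python
import torch
import torch.nn as nn

def train(model, batches, stop_fn, lr=1e-4):
    """batches yields (text_images, positions, label); label is a LongTensor of shape (1,).
    stop_fn is a step-14 style predicate, e.g. the should_stop helper sketched above."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    losses, updates = [], 0
    for text_images, positions, label in batches:
        if stop_fn(losses, updates):               # step 14: check the stop condition
            break
        logits = model(text_images, positions)     # step 13: forward pass
        loss = criterion(logits, label)            # difference from the actual language
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                           # step 15: update the model to be trained
        losses.append(loss.item())
        updates += 1
    return model
```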
Step 16: determine the image language recognition model according to the model to be trained.

In the embodiments of the present application, after it is determined that the current round of the model to be trained has met the preset stop condition, it can be concluded that the model has good image language identification performance; the image language recognition model may therefore be determined according to the model to be trained (for example, the image language recognition network in the model to be trained may be directly determined as the image language recognition model).

In addition, in some application scenarios, the trained model may also be used to determine other models (for example, the language feature extraction model, the density feature extraction model, the color feature extraction model, and the position feature extraction model). On this basis, step 16 may specifically include: determining the language feature extraction network in the model to be trained as the language feature extraction model; determining the density feature extraction network in the model to be trained as the density feature extraction model; determining the color feature extraction network in the model to be trained as the color feature extraction model; determining the position feature extraction network in the model to be trained as the position feature extraction model; and determining the image language recognition network in the model to be trained as the image language recognition model.

Based on the content of steps 11-16 above, in some cases the language feature extraction model, the density feature extraction model, the color feature extraction model, the position feature extraction model, and the image language recognition model can all be constructed through the training process of a single model, so that an image language identification method implemented on the basis of these five models achieves a better image language identification effect.
In addition, to further improve the model construction effect, the embodiments of the present application further provide another possible implementation of the model construction method, in which, besides steps 11-16 above, the method may further include steps 17-21:

Step 17: train a first model using a first text image and the actual language feature of the first text image, so that the trained first model has a good language feature extraction effect.

Step 18: train a second model using a second text image and the actual density feature of the second text image, so that the trained second model has a good character density feature extraction effect.

Step 19: train a third model using a third text image and the actual color feature of the third text image, so that the trained third model has a good color distribution feature extraction effect.

Step 20: train a fourth model using the position description information of a fourth text image and the actual position feature of the fourth text image, so that the trained fourth model has a good image position feature extraction effect.

Step 21: use the trained first model, the trained second model, the trained third model, and the trained fourth model to initialize the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network in the model to be trained, respectively.

It should be noted that the embodiments of the present application do not limit the specific implementation of step 21. For example, it may include: determining the trained first model as the initialization result of the language feature extraction network in the model to be trained; determining the trained second model as the initialization result of the density feature extraction network in the model to be trained; determining the trained third model as the initialization result of the color feature extraction network in the model to be trained; and determining the trained fourth model as the initialization result of the position feature extraction network in the model to be trained.
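In PyTorch terms, this initialization amounts to copying the pretrained weights into the corresponding sub-networks, assuming matching architectures; `ModelToBeTrained` is the hypothetical composite model from the earlier sketch.

```python
def initialize_from_pretrained(model, first, second, third, fourth):
    """Copy pretrained weights into the four feature extraction networks (architectures must match)."""
    model.language_net.load_state_dict(first.state_dict())   # step 17's model
    model.density_net.load_state_dict(second.state_dict())   # step 18's model
    model.color_net.load_state_dict(third.state_dict())      # step 19's model
    model.position_net.load_state_dict(fourth.state_dict())  # step 20's model
    return model
```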
Based on the content of steps 17-21 above, in some cases the first through fourth models may first be trained separately; the trained first through fourth models are then used to initialize the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network in the model to be trained, to obtain an initialized model to be trained; next, the initialized model to be trained is trained using steps 11-15 above, to obtain a trained model; finally, the language feature extraction model, the density feature extraction model, the color feature extraction model, the position feature extraction model, and the image language recognition model are determined from the trained model.
Based on the image language identification method provided by the above method embodiments, the embodiments of the present application further provide an image language identification apparatus, which is explained and described below with reference to the accompanying drawings.

Device Embodiment

For the technical details of the image language identification apparatus provided by the device embodiment, please refer to the above method embodiments.

Refer to FIG. 8, which is a schematic structural diagram of an image language identification apparatus provided by an embodiment of the present application.

The image language identification apparatus 800 provided by the embodiment of the present application includes:

an image extraction unit 801, configured to, after an image to be processed is acquired, extract N text images to be used from the image to be processed according to a text detection result of the image to be processed, where N is a positive integer;

a feature determination unit 802, configured to determine a language extraction feature of an n-th text image to be used and a visual extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N;

a feature processing unit 803, configured to determine an image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used, where n is a positive integer and n ≤ N;

a language identification unit 804, configured to determine a language identification result of the image to be processed according to the image extraction features of the N text images to be used.
In a possible implementation, the visual extraction feature includes at least one of a character density feature, a color distribution feature, and an image position feature.

In a possible implementation, the feature determination unit 802 includes:

a first determination subunit, configured to input the n-th text image to be used into a pre-built density feature extraction model to obtain the character density feature of the n-th text image to be used output by the density feature extraction model;

a second determination subunit, configured to input the n-th text image to be used into a pre-built color feature extraction model to obtain the color distribution feature of the n-th text image to be used output by the color feature extraction model;

a third determination subunit, configured to input the position description information of the n-th text image to be used into a pre-built position feature extraction model to obtain the image position feature of the n-th text image to be used output by the position feature extraction model.

In a possible implementation, the feature determination unit 802 includes:

a fourth determination subunit, configured to input the n-th text image to be used into a pre-built language feature extraction model to obtain the language extraction feature of the n-th text image to be used output by the language feature extraction model.

In a possible implementation, the visual extraction feature includes a character density feature, a color distribution feature, and an image position feature;

the feature processing unit 803 is specifically configured to concatenate the language extraction feature of the n-th text image to be used, the character density feature of the n-th text image to be used, the color distribution feature of the n-th text image to be used, and the image position feature of the n-th text image to be used, to obtain the image extraction feature of the n-th text image to be used.

In a possible implementation, the language identification unit 804 is specifically configured to: concatenate the image extraction features of the N text images to be used to obtain the language representation data of the image to be processed; and input the language representation data into a pre-built image language recognition model to obtain the language identification result of the image to be processed output by the image language recognition model.

In a possible implementation, the image language identification apparatus 800 further includes:

a model training unit, configured to: acquire a sample image to be used and the actual language of the sample image to be used; determine at least one sample text image and position description information of the at least one sample text image according to a text detection result of the sample image to be used; input the at least one sample text image and the position description information of the at least one sample text image into a model to be trained, to obtain a language identification result of the sample image to be used output by the model to be trained; and update the model to be trained according to the language identification result of the sample image to be used and the actual language of the sample image to be used, and continue to execute the step of inputting the at least one sample text image and the position description information of the at least one sample text image into the model to be trained, until, after a preset stop condition is met, the image language recognition model is determined according to the model to be trained.

In a possible implementation, the model to be trained includes a language feature extraction network, a density feature extraction network, a color feature extraction network, a position feature extraction network, a feature concatenation network, and an image language recognition network, where the input data of the image language recognition network includes the output data of the feature concatenation network, and the input data of the feature concatenation network includes the output data of the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network;

the determination process of the image language recognition model includes: determining the image language recognition network in the model to be trained as the image language recognition model.

In a possible implementation, the image language identification apparatus 800 further includes:

a model initialization unit, configured to: train a first model using a first text image and the actual language feature of the first text image; train a second model using a second text image and the actual density feature of the second text image; train a third model using a third text image and the actual color feature of the third text image; train a fourth model using the position description information of a fourth text image and the actual position feature of the fourth text image; and use the trained first model, the trained second model, the trained third model, and the trained fourth model to initialize the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network in the model to be trained, respectively.
Based on the above description of the image language identification apparatus 800, after the image to be processed is acquired, the apparatus first extracts N text images to be used from the image to be processed according to the text detection result of the image to be processed; it then determines the language extraction feature and the visual extraction feature of the n-th text image to be used, where n is a positive integer, n ≤ N, and N is a positive integer; next, it determines the image extraction feature of the n-th text image to be used according to these two extraction features; finally, it determines the language identification result of the image to be processed according to the image extraction features of the N text images to be used, so that the result accurately represents the language to which the image to be processed belongs. In this way, the language of a piece of image data can be identified accurately.
Further, an embodiment of the present application also provides a device, which includes a processor and a memory:

the memory is configured to store a computer program;

the processor is configured to execute, according to the computer program, any implementation of the image language identification method provided by the embodiments of the present application.

Further, an embodiment of the present application also provides a computer-readable storage medium configured to store a computer program, where the computer program is used to execute any implementation of the image language identification method provided by the embodiments of the present application.

Further, an embodiment of the present application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the image language identification method provided by the embodiments of the present application.

It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: only A exists, only B exists, and both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.

The above are merely preferred embodiments of the present invention and do not limit the present invention in any way. Although the present invention has been disclosed above in terms of preferred embodiments, these embodiments are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution of the present invention, or modify it into equivalent embodiments. Therefore, any simple modifications, equivalent changes, and modifications made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still fall within the protection scope of the technical solution of the present invention.

Claims (13)

  1. An image language identification method, comprising:
    after acquiring an image to be processed, extracting N text images to be used from the image to be processed according to a text detection result of the image to be processed, wherein N is a positive integer;
    determining a language extraction feature of an n-th text image to be used and a visual extraction feature of the n-th text image to be used, wherein n is a positive integer and n ≤ N;
    determining an image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used, wherein n is a positive integer and n ≤ N;
    determining a language identification result of the image to be processed according to the image extraction features of the N text images to be used.
  2. The method according to claim 1, wherein the visual extraction feature comprises at least one of a character density feature, a color distribution feature, and an image position feature.
  3. The method according to claim 2, wherein the determination process of the character density feature of the n-th text image to be used comprises:
    inputting the n-th text image to be used into a pre-built density feature extraction model to obtain the character density feature of the n-th text image to be used output by the density feature extraction model;
    the determination process of the color distribution feature of the n-th text image to be used comprises:
    inputting the n-th text image to be used into a pre-built color feature extraction model to obtain the color distribution feature of the n-th text image to be used output by the color feature extraction model;
    the determination process of the image position feature of the n-th text image to be used comprises:
    inputting position description information of the n-th text image to be used into a pre-built position feature extraction model to obtain the image position feature of the n-th text image to be used output by the position feature extraction model.
  4. The method according to claim 1, wherein the determination process of the language extraction feature of the n-th text image to be used comprises:
    inputting the n-th text image to be used into a pre-built language feature extraction model to obtain the language extraction feature of the n-th text image to be used output by the language feature extraction model.
  5. The method according to claim 1, wherein the visual extraction feature comprises a character density feature, a color distribution feature, and an image position feature;
    the determining an image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used comprises:
    concatenating the language extraction feature of the n-th text image to be used, the character density feature of the n-th text image to be used, the color distribution feature of the n-th text image to be used, and the image position feature of the n-th text image to be used, to obtain the image extraction feature of the n-th text image to be used.
  6. The method according to claim 1, wherein the determining a language identification result of the image to be processed according to the image extraction features of the N text images to be used comprises:
    concatenating the image extraction features of the N text images to be used to obtain language representation data of the image to be processed;
    inputting the language representation data into a pre-built image language recognition model to obtain the language identification result of the image to be processed output by the image language recognition model.
  7. The method according to claim 6, wherein the construction process of the image language recognition model comprises:
    acquiring a sample image to be used and an actual language of the sample image to be used;
    determining at least one sample text image and position description information of the at least one sample text image according to a text detection result of the sample image to be used;
    inputting the at least one sample text image and the position description information of the at least one sample text image into a model to be trained, to obtain a language identification result of the sample image to be used output by the model to be trained;
    updating the model to be trained according to the language identification result of the sample image to be used and the actual language of the sample image to be used, and continuing to execute the step of inputting the at least one sample text image and the position description information of the at least one sample text image into the model to be trained, until, after a preset stop condition is met, the image language recognition model is determined according to the model to be trained.
  8. The method according to claim 7, wherein the model to be trained comprises a language feature extraction network, a density feature extraction network, a color feature extraction network, a position feature extraction network, a feature concatenation network, and an image language recognition network, wherein input data of the image language recognition network comprises output data of the feature concatenation network, and input data of the feature concatenation network comprises output data of the language feature extraction network, output data of the density feature extraction network, output data of the color feature extraction network, and output data of the position feature extraction network;
    the determining the image language recognition model according to the model to be trained comprises:
    determining the image language recognition network in the model to be trained as the image language recognition model.
  9. The method according to claim 8, wherein, before the inputting the at least one sample text image and the position description information of the at least one sample text image into the model to be trained, the construction process of the image language recognition model further comprises:
    training a first model using a first text image and an actual language feature of the first text image;
    training a second model using a second text image and an actual density feature of the second text image;
    training a third model using a third text image and an actual color feature of the third text image;
    training a fourth model using position description information of a fourth text image and an actual position feature of the fourth text image;
    using the trained first model, the trained second model, the trained third model, and the trained fourth model to initialize the language feature extraction network, the density feature extraction network, the color feature extraction network, and the position feature extraction network in the model to be trained, respectively.
  10. An image language identification apparatus, comprising:
    an image extraction unit, configured to, after an image to be processed is acquired, extract N text images to be used from the image to be processed according to a text detection result of the image to be processed, wherein N is a positive integer;
    a feature determination unit, configured to determine a language extraction feature of an n-th text image to be used and a visual extraction feature of the n-th text image to be used, wherein n is a positive integer and n ≤ N;
    a feature processing unit, configured to determine an image extraction feature of the n-th text image to be used according to the language extraction feature of the n-th text image to be used and the visual extraction feature of the n-th text image to be used, wherein n is a positive integer and n ≤ N;
    a language identification unit, configured to determine a language identification result of the image to be processed according to the image extraction features of the N text images to be used.
  11. A device, comprising a processor and a memory:
    the memory is configured to store a computer program;
    the processor is configured to execute, according to the computer program, the method according to any one of claims 1-9.
  12. A computer-readable storage medium, configured to store a computer program, wherein the computer program is used to execute the method according to any one of claims 1-9.
  13. A computer program product which, when run on a terminal device, causes the terminal device to execute the method according to any one of claims 1-9.