CN115238042A - Text coverage rate calculation method, device, equipment and storage medium - Google Patents

Text coverage rate calculation method, device, equipment and storage medium

Info

Publication number
CN115238042A
Authority
CN
China
Prior art keywords
text
full
entry
library
modules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210927789.XA
Other languages
Chinese (zh)
Inventor
梁晓云
高永强
严培
李斯敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202210927789.XA
Publication of CN115238042A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure provides a method, apparatus, device, storage medium, and program product for determining text coverage. The method comprises: acquiring, through a traversal tool, page screenshots of one or more modules of an application program while the application program is running; performing text recognition on the page screenshots to obtain the text content they contain; matching the text content against a full-scale text library corresponding to the one or more modules to obtain the number of entries successfully matched between the text content and the full-scale text library; and determining the text coverage of the one or more modules based on the number of successfully matched entries and the number of entries in the full-scale text library. The text coverage of the one or more modules provides a quantitative measure of the traversal capability of the traversal tool and reflects the depth and breadth of its traversal within the one or more modules.

Description

Text coverage rate calculation method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for calculating text coverage.
Background
Games released overseas usually contain non-Chinese text. To check whether the non-Chinese text displayed on a game page is rendered correctly, an automated traversal tool is used to extract the text content displayed while the game is actually running. The completeness of the tool's traversal capability therefore directly determines the completeness of the text content it extracts.
Currently, there is no suitable parameter for quantitatively evaluating the traversal capability of an automated traversal tool.
Disclosure of Invention
According to a first aspect of the present disclosure, a method for calculating a text coverage is provided, including:
acquiring a page screenshot of one or more modules in an application program in the running process through a traversal tool;
performing text recognition on the page screenshot to obtain text content in the page screenshot;
matching the text content with a full-scale text library corresponding to one or more modules in the application program to obtain the number of successfully matched entries between the text content and the full-scale text library;
and determining the text coverage of the one or more modules in the application program based on the number of successfully matched entries and the number of entries in the full-scale text library.
According to a second aspect of the present disclosure, there is provided a device for determining a text coverage, including:
an acquisition module configured to acquire, through a traversal tool, a page screenshot of one or more modules in an application program in the running process;
a text recognition module configured to perform text recognition on the page screenshot to obtain text content in the page screenshot;
a matching module configured to match the text content with a full-scale text library corresponding to the one or more modules in the application program to obtain the number of successfully matched entries between the text content and the full-scale text library;
and a determining module configured to determine the text coverage of the one or more modules in the application program based on the number of successfully matched entries and the number of entries in the full-scale text library.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory storing a program, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to the first aspect of the present disclosure.
According to one or more technical solutions provided by the embodiments of the present disclosure, the text coverage of one or more modules in an application program can be calculated. This text coverage provides a quantitative measure of the traversal capability of the traversal tool and reflects the traversal depth and traversal breadth of the tool in the one or more modules.
Drawings
Further details, features and advantages of the disclosure are disclosed in the following description of exemplary embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a first flowchart of a text coverage determination method provided in an exemplary embodiment of the present disclosure;
Fig. 2 is a schematic diagram of text recognition provided in an exemplary embodiment of the present disclosure;
Fig. 3 is a second flowchart of a text coverage determination method provided in an exemplary embodiment of the present disclosure;
Fig. 4 is a third flowchart of a text coverage determination method provided in an exemplary embodiment of the present disclosure;
Fig. 5 is a schematic block diagram of a text coverage determination apparatus provided in an exemplary embodiment of the present disclosure;
Fig. 6 is a schematic block diagram of a chip provided in an exemplary embodiment of the present disclosure;
Fig. 7 is a schematic block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules or units.
It is noted that references to "a" or "an" in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will appreciate that references to "one or more" are intended to be exemplary and not limiting unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The embodiments of the present disclosure provide a method for determining text coverage, which may be executed by a terminal device or a server. The terminal device may be a mobile phone, a tablet computer, a desktop computer, a notebook computer, or a Personal Digital Assistant (PDA), and the server may be a server used for determining text coverage. The terminal device and the server may also interact to implement the text coverage determination function, for example through a software application (app) on the terminal device that communicates with the server. The terminal device and the user may perform human-computer interaction through one or more of a keyboard, a touch screen, voice interaction, or handwriting, which is not limited by the present disclosure.
The technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Fig. 1 is a first flowchart of a method for determining text coverage according to an exemplary embodiment of the present disclosure, as shown in fig. 1, including the following steps:
S101, acquiring a page screenshot of one or more modules in the application program in the running process through a traversal tool.
The embodiments of the present disclosure do not limit the specific type of the application program, which may be, for example, a game program or a training program. An application program can generally be divided into modules according to the content attributes of its scenes. Taking a game program as an example, the game can be roughly divided into a task module, an item module, a mall module, an application module, a skill module, an activity module, and so on. Page screenshots may be acquired for one module, for several modules, or for all modules in the application program, and the text coverage that is finally determined corresponds to whichever modules were captured: a single module, the several modules together, or all modules of the application program. For example, if the traversal tool acquires page screenshots of the task module of a game in step S101, the text coverage that is finally determined is that of the task module, and it quantitatively evaluates the automatic traversal capability of the traversal tool within the task module.
As another example, if the traversal tool acquires page screenshots of the item module of the game in step S101, the text coverage that is finally determined is that of the item module, and it quantitatively evaluates the automatic traversal capability of the traversal tool within the item module. Because the traversal tool is based on image recognition, and the screen structures and control types differ between modules, its traversal capability, and hence the text coverage obtained, differs from module to module. Knowing the text coverage of each module makes it possible to optimize the traversal tool in a targeted way.
The embodiments of the present disclosure do not limit the specific type of the traversal tool. The traversal tool may be fully automatic: a traversal tool based on an AI algorithm needs no manual participation and can, during traversal, automatically find the clickable controls in a page and click them. During traversal, the tool acquires page screenshots while the application program is running. The interval between screenshots may be set according to the time interval of page switching in the application program; for example, the traversal tool may be set to acquire one page screenshot every 2 seconds. It should be noted that, owing to the limitations of the AI algorithm, the traversal capability of the tool is limited: neither its traversal depth within each module nor its traversal breadth across the modules of the application program can be controlled. It is therefore important to evaluate the traversal capability of the traversal tool quantitatively through some parameter.
Generally, an application program includes a plurality of modules. To obtain the greatest benefit within a limited time, the traversal priority of each module, that is, the order in which the traversal tool traverses the modules, may be determined first. Traversal priorities can be determined along different dimensions. In one specific embodiment, priorities are assigned according to the word amount contained in each module: a module with a larger word amount is assigned a higher traversal priority, and a module with a smaller word amount a lower one. Modules with more text are then traversed first when time is limited, which maximizes the benefit. For example, in a game program the word amount of the task module is greater than that of the item module, so the task module is assigned a relatively higher traversal priority and the item module a relatively lower one. In another specific embodiment, priorities are assigned according to how frequently the modules are used: a module with a higher use frequency, such as the main interface or a module that can be reached directly from the main interface, is assigned a higher traversal priority.
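The word-amount embodiment can be illustrated with a minimal Python sketch. The function name `traversal_order`, the module names, and the word counts are hypothetical and chosen only for illustration; the disclosure does not prescribe how the ordering is computed.

```python
from typing import Dict, List


def traversal_order(word_counts: Dict[str, int]) -> List[str]:
    """Return module names from highest to lowest traversal priority.

    Modules with a larger word amount come first. Ties are broken by
    module name so that the order is deterministic (an assumption).
    """
    return sorted(word_counts, key=lambda name: (-word_counts[name], name))


# Hypothetical word amounts per module (illustrative numbers only).
word_counts = {"item": 3500, "task": 12000, "mall": 1800}
order = traversal_order(word_counts)
print(order)
```

The same function could rank modules by use frequency by passing frequencies instead of word amounts.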
S102, performing text recognition on the page screenshot to obtain text content in the page screenshot.
In this step, the text content contained in the page screenshot acquired in step S101 is recognized. The embodiments of the present disclosure do not limit the specific way of text recognition; in one implementable way, the text content in the page screenshot is recognized by a pre-trained text recognition model. Fig. 2 is a schematic diagram of text recognition provided in an exemplary embodiment of the present disclosure. As shown in Fig. 2, the text recognition model recognizes the text in page screenshot 21 and obtains text 1, text 2, and text 3.
S103, matching the text content with a full-scale text library corresponding to one or more modules in the application program to obtain the number of successfully matched entries between the text content and the full-scale text library.
In the embodiments of the present disclosure, the full-scale text library corresponds to the modules traversed by the traversal tool. For example, if the traversal tool traverses the task module, the full-scale text library used for matching is the one corresponding to the task module; if the traversal tool traverses all modules of the application program, the full-scale text library used for matching is the one corresponding to all the modules.
In one embodiment, if the full-scale text library is stored as an Excel file, it can be converted to CSV format for storage, which makes it easier to read. Taking a game program as an example, the standard format of a full-scale text library corresponding to one or more modules of an application program is shown in Table 1. The library has four columns: Key (index), Value (translation), Original (original text), and Info (remark), where Info describes the scene and module in which the current entry appears. The embodiments of the present disclosure are concerned with the Value column. This column contains a plurality of entries, each occupying one cell, and each entry contains one or more sentences. For example, when a non-player character (NPC) in a dialog scene speaks several sentences, those sentences count as one entry; likewise, an overlong sentence that is displayed with page turning or scrolling still belongs to a single entry.
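Table 1: Standard format of an exemplary full-scale text library
[Table 1 is an image in the original publication (image reference BDA0003780323860000051); its columns are Key, Value, Original, and Info.]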
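As a rough illustration of reading the Value column once the library is stored as CSV, the following Python sketch uses the standard `csv` module. The column names come from the description above; the function name `load_value_entries`, the keys, and the sample rows are hypothetical.

```python
import csv
import io
from typing import List


def load_value_entries(csv_text: str) -> List[str]:
    """Return the non-empty cells of the Value column, one per entry."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Value"] for row in reader if row["Value"].strip()]


# Hypothetical library content; a quoted cell may hold a multi-sentence entry.
sample_csv = (
    "Key,Value,Original,Info\n"
    'task_001,"Please select a piece of shadow equipment, then continue '
    'the awakening task",(original text),task module\n'
    "item_001,Shadow Blade,(original text),item module\n"
    "item_002,,(original text),item module\n"
)
entries = load_value_entries(sample_csv)
print(len(entries))
```

Each returned string is one entry, regardless of how many sentences it contains.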
The embodiments of the present disclosure do not limit the way in which the text content is matched with the full-scale text library corresponding to one or more modules in the application program. In one implementable way, as shown in Fig. 3, the matching includes the following steps:
S301, for each entry in the text content, obtaining the text similarity between the entry and each entry in the full-scale text library.
The embodiments of the present disclosure do not limit how the text similarity is represented. For example, the text similarity may be represented by an edit distance; in that case, obtaining the text similarity between an entry in the text content and each entry in the full-scale text library means obtaining the edit distance between the entry and each entry in the full-scale text library.
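The disclosure does not say which edit distance is meant. A minimal sketch, assuming the common Levenshtein distance (insertions, deletions, and substitutions each cost 1), might look like this; the function name `edit_distance` is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, using a rolling one-row table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
```

A smaller distance indicates higher similarity between a recognized entry and a library entry.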
Alternatively, in one embodiment, the text similarity is characterized by a Hamming distance. In that case, as shown in Fig. 4, obtaining the text similarity between each entry in the text content and each entry in the full-scale text library includes the following steps:
S401, obtaining the Simhash value corresponding to each entry in the full-scale text library.
In this step, each entry in the full-scale text library is mapped by the Simhash algorithm to a binary string with a preset number of bits, for example 64 bits. Simhash is a locality-sensitive hash algorithm, and the binary string it produces represents the content of the original entry to a certain extent.
The Simhash algorithm is described below using a 6-bit Simhash value as an example. In the embodiments of the present disclosure, the algorithm includes the following five steps:
First, the target entry is segmented into words according to the characteristics of the full-scale text library, noise words are removed, and each word is given a weight that represents its importance in the whole sentence. For example, assume the target entry is "please select a piece of shadow equipment and continue the awakening task". After segmentation it becomes "please (1) select (3) a piece of shadow (5) equipment (4) and (1) continue (2) the awakening (4) task (3)", where the number in parentheses is the weight of the preceding word; for example, the weight of "shadow" is 5, the weight of "awakening" is 4, and the weight of "select" is 3.
Second, each word obtained by segmentation is mapped to a hash string by a hash function; for example, "select" is mapped to "101001" and "shadow" is mapped to "100011".
Third, each hash string obtained in the second step is weighted by the weight of its word to obtain a weighted hash sequence: a bit of 1 contributes the positive weight, and a bit of 0 contributes the negative weight. For example, the hash string of "select" is "101001" and its weight is 3, so its weighted hash sequence is (3, -3, 3, -3, -3, 3); the hash string of "shadow" is "100011" and its weight is 5, so its weighted hash sequence is (5, -5, -5, -5, 5, 5).
Fourth, the weighted hash sequences of all the words obtained in the third step are summed position by position. For example, adding (3, -3, 3, -3, -3, 3) for "select" and (5, -5, -5, -5, 5, 5) for "shadow" gives (8, -8, -2, -8, 2, 8). Only two words are summed here for illustration; in practice, the weighted hash sequences of all the words in the target entry are summed.
Fifth, the sequence accumulated in the fourth step is reduced to a binary string, which is the Simhash value of the target entry. In this reduction, a value greater than 0 is recorded as 1 and a value less than 0 is recorded as 0. For example, reducing (8, -8, -2, -8, 2, 8) gives the binary string "100011".
Following these steps, the Simhash value of each entry in the Value (translation) column of the full-scale text library is obtained.
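The five steps can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: segmentation and weighting (step 1) are assumed to have been done already, the per-word hashes (step 2) are supplied through a hypothetical lookup table `word_hashes` instead of a real hash function, and a sum of exactly 0, which the text does not cover, is mapped to 0 by assumption.

```python
from typing import Dict, List, Tuple


def simhash(weighted_words: List[Tuple[str, int]],
            word_hashes: Dict[str, str],
            bits: int = 6) -> str:
    """Combine per-word hash strings into one Simhash bit string.

    weighted_words: (word, weight) pairs produced by segmentation (step 1).
    word_hashes: word -> bit string of length `bits` (step 2); a hypothetical
        lookup table standing in for a real hash function.
    """
    totals = [0] * bits
    for word, weight in weighted_words:
        for i, bit in enumerate(word_hashes[word]):
            # Step 3: a 1 bit contributes +weight, a 0 bit contributes -weight.
            # Step 4: contributions are accumulated position by position.
            totals[i] += weight if bit == "1" else -weight
    # Step 5: positive sums become 1; all others become 0
    # (treating a sum of exactly 0 as 0 is an assumption).
    return "".join("1" if total > 0 else "0" for total in totals)


# The two words and weights from the worked example in the text.
word_hashes = {"select": "101001", "shadow": "100011"}
value = simhash([("select", 3), ("shadow", 5)], word_hashes)
print(value)  # 100011
```

With only "select" and "shadow" as input, the accumulated sums are (8, -8, -2, -8, 2, 8), which reduce to the same "100011" as in the worked example.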
S402, for each entry in the text content, obtaining the Simhash value corresponding to the entry.
The Simhash value of each entry in the text content is calculated with the Simhash algorithm described in step S401.
S403, determining the Hamming distance between each entry in the text content and each entry in the full-scale text library based on their respective Simhash values.
The Hamming distance is the number of positions at which two binary strings of equal length differ. For example, assume an entry in the text content is "select shadow to continue the awakening task" with Simhash value "100011", and an entry in the full-scale text library is "please select a piece of shadow equipment and continue the awakening task" with Simhash value "101011". The two Simhash values differ only in the third bit, so the Hamming distance between the two entries is 1.
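The Hamming distance between two Simhash values written as bit strings can be computed as in the short Python sketch below; the function name `hamming_distance` is illustrative.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


# The two Simhash values from the example above.
distance = hamming_distance("100011", "101011")
print(distance)  # 1
```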
S302, determining the matching result between the entry and the full-scale text library based on the text similarity between the entry and each entry in the full-scale text library.
The matching result between each entry in the text content and the full-scale text library is determined from the text similarity obtained in step S301. The embodiments of the present disclosure do not limit the specific way of doing so. In one implementable way, determining the matching result includes:
determining a target Hamming distance for the entry, the target Hamming distance being the smallest of the Hamming distances between the entry and the entries in the full-scale text library;
in response to the target Hamming distance being less than or equal to a preset threshold, determining that the entry is successfully matched with the full-scale text library;
and in response to the target Hamming distance being greater than the preset threshold, determining that the entry fails to match the full-scale text library.
For example, assuming the full-scale text library includes n entries (n is a positive integer greater than or equal to 1), there are n Hamming distances between an entry in the text content and the n entries in the library. The smallest of these n distances is taken as the target Hamming distance of the entry and compared with the preset threshold: if it is less than or equal to the threshold, the entry is successfully matched with the full-scale text library; if it is greater than the threshold, the match fails.
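The minimum-distance rule and the threshold comparison can be sketched as follows. The function names, the sample Simhash values, and the threshold of 1 are hypothetical; the Hamming distance is computed here by XOR-ing the integer forms of the bit strings, which is one common way to do it and not necessarily the disclosed one.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Hamming distance of two equal-length bit strings via integer XOR."""
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def is_match(entry_hash: str, library_hashes: List[str], threshold: int) -> bool:
    """True when the target (minimum) Hamming distance is <= threshold.

    library_hashes must be non-empty, matching n >= 1 in the text.
    """
    target = min(hamming(entry_hash, h) for h in library_hashes)
    return target <= threshold


def count_matches(content_hashes: List[str],
                  library_hashes: List[str],
                  threshold: int) -> int:
    """Number of text-content entries that match the library."""
    return sum(is_match(h, library_hashes, threshold) for h in content_hashes)


# Hypothetical 6-bit Simhash values and threshold, for illustration only.
library = ["101011", "010100"]
content = ["100011", "111111"]
matched = count_matches(content, library, threshold=1)
print(matched)  # 1
```

Here "100011" is at distance 1 from "101011" and matches, while the closest library value to "111111" is at distance 2, which exceeds the threshold.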
The embodiments of the present disclosure do not limit the specific value of the preset threshold; it may, for example, be between 50% and 70% of t, where t is the number of bits of the Simhash value.
Matching entries with the Simhash algorithm and the Hamming distance has the following advantages. First, some entries in the full-scale text library contain content that is never displayed on an application page, such as placeholders, so an entry in the text content can rarely be identical to its counterpart in the library. The Simhash algorithm combined with a Hamming distance threshold handles this well: because Simhash takes the weight of each word into account, unimportant words have little influence on the Simhash value, and a suitable threshold on the Hamming distance gives entry matching a degree of fault tolerance. Second, application pages are rich in color, contain many kinds of controls, and display text irregularly, so the text content recognized by the text recognition model may contain wrong or missing characters. The Simhash algorithm combined with a Hamming distance threshold is reasonably robust to such recognition errors.
S303, obtaining the number of successfully matched entries between the text content and the full-scale text library according to the matching result of each entry in the text content with the full-scale text library.
In this step, the matching results obtained in step S302 for the entries in the text content are counted, giving the number of entries successfully matched between the text content and the full-scale text library.
S104, determining the text coverage of the one or more modules in the application program based on the number of successfully matched entries and the number of entries in the full-scale text library.
Specifically, the text coverage of one or more modules in the application program is the ratio of the number of successfully matched entries to the number of entries in the full-scale text library. Continuing the previous example, if the number of entries in the full-scale text library is n (n is a positive integer greater than or equal to 1) and the number of successfully matched entries is k (k is a positive integer greater than or equal to 1), the text coverage of the one or more modules is a = k/n. This text coverage provides a quantitative measure of the traversal capability of the traversal tool and reflects the traversal depth and traversal breadth of the tool in the one or more modules.
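The ratio a = k/n is simple enough to state directly in Python; the function name `text_coverage` and the sample counts are hypothetical.

```python
def text_coverage(matched: int, total: int) -> float:
    """Text coverage a = k / n for k matched entries and n library entries."""
    if total < 1:
        raise ValueError("the full-scale text library must contain at least one entry")
    return matched / total


# Hypothetical counts: 45 of 60 library entries were matched.
coverage = text_coverage(45, 60)
print(coverage)  # 0.75
```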
In general, the full-text library contains some content that cannot be displayed on the pages of the application. To improve the accuracy of the text coverage calculation, the full-text library can be preprocessed. The preprocessing includes at least one of the following operations:
First, entries in the full-text library whose version identifier differs from that of the application are removed. The full-text library may contain entries from old versions that cannot be displayed on the pages of the application; to ensure the accuracy of the text coverage calculation, these old-version entries are removed from the full-text library.
Second, hypertext markup in the full-text library is removed. For example, tags such as <color></color>, which indicates the font color, or <size></size>, which indicates the font size, are not displayed on the pages of the application and should be removed from the full-text library.
Third, string-format operators in the full-text library are removed. For example, the operator "%" in "Gather some Amaranth Herbs%/% d" is removed, because such operators are not displayed on the pages of the application.
Fourth, entries in the full-text library that contain only numbers are removed, for example 100, 500, and 1000.
Fifth, special characters in the full-text library that belong to the ASCII encoding table are removed, for example "-", "%", "#", "?", and "!".
Sixth, entries in the full-text library that represent store values, such as commodity prices, are removed. These values are dynamic in the application and carry no practical meaning, so a successful match on them provides no useful reference.
Seventh, entries in the full-text library that are displayed only when a preset condition is met are removed, for example error-code entries.
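Several of the preprocessing operations listed above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: each library entry is assumed to be a dictionary with hypothetical "text" and "version" fields, the format operators are assumed to be printf-style, and the store-value and condition-triggered entries are left out because recognizing them would need metadata that the text does not describe.

```python
import re
import string

# Markup tags such as <color> ... </color> or <size=14> ... </size>.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")
# printf-style operators such as %d or %s (an assumed format).
FORMAT_RE = re.compile(r"%[0-9.]*[sdif]")
# Translation table that deletes ASCII punctuation characters.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess_library(entries, app_version):
    """Return the cleaned entry texts for the given application version."""
    cleaned = []
    for entry in entries:
        if entry["version"] != app_version:
            continue  # entry belongs to another version
        text = TAG_RE.sub("", entry["text"])   # remove markup tags
        text = FORMAT_RE.sub("", text)         # remove format operators
        text = text.translate(PUNCT_TABLE)     # remove ASCII special characters
        text = " ".join(text.split())          # normalize whitespace
        if not text or text.replace(" ", "").isdigit():
            continue  # empty or number-only entry
        cleaned.append(text)
    return cleaned
```

Markup and format operators are stripped before punctuation so that the angle brackets and percent signs they rely on are still present when the patterns run.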
Additionally, in some embodiments, before the text content is matched with the full-text library of the application, the text content may also be preprocessed. The preprocessing includes at least one of the following operations: combining adjacent lines of text in the text content into entries based on a preset rule, or removing entries in the text content that contain only numbers.
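The text does not specify the preset rule for combining adjacent lines. The sketch below assumes one hypothetical rule purely for illustration: each recognized line carries its top and bottom pixel coordinates, and a line is appended to the previous entry when the vertical gap between them is at most `max_gap` pixels. Number-only entries are then dropped.

```python
def preprocess_text_content(lines, max_gap=8):
    """Merge vertically adjacent lines into entries, then drop number-only entries.

    lines: list of (text, top, bottom) tuples sorted from top to bottom.
    max_gap: assumed pixel threshold for treating two lines as one entry.
    """
    entries = []
    prev_bottom = None
    for text, top, bottom in lines:
        if entries and prev_bottom is not None and top - prev_bottom <= max_gap:
            entries[-1] = entries[-1] + " " + text  # continue the previous entry
        else:
            entries.append(text)                    # start a new entry
        prev_bottom = bottom
    return [e for e in entries if not e.replace(" ", "").isdigit()]
```

With lines at vertical ranges 10 to 30 and 34 to 54, the 4-pixel gap falls under the assumed threshold, so the two lines become a single entry before matching.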
The foregoing mainly describes the solutions provided by the embodiments of the present disclosure from the perspective of the server. It can be understood that, to implement the above functions, the server includes hardware structures and/or software modules corresponding to each function. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present disclosure.
The embodiments of the present disclosure may divide the server into functional units according to the above method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division of modules in the embodiments of the present disclosure is illustrative and is only one way of dividing logical functions; other divisions are possible in actual implementations.
In the case where each functional module corresponds to one function, exemplary embodiments of the present disclosure provide an apparatus for determining text coverage, which may be a server or a chip applied to a server. Fig. 5 shows a functional block diagram of the apparatus for determining text coverage according to an exemplary embodiment of the present disclosure. As shown in Fig. 5, the apparatus 500 for determining text coverage includes:
an acquiring module 501 configured to acquire, through a traversal tool, page screenshots of one or more modules in an application during running;
a text recognition module 502 configured to perform text recognition on the page screenshots to obtain text content in the page screenshots;
a matching module 503 configured to match the text content with a full-text library corresponding to the one or more modules in the application, to obtain the number of entries successfully matched between the text content and the full-text library;
a determining module 504 configured to determine the text coverage rate of the one or more modules in the application based on the number of successfully matched entries and the number of entries in the full-text library.
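How the four modules cooperate can be sketched with a small class whose fields stand in for modules 501 to 503 and whose method plays the role of module 504. The class name, field names, and callable signatures are hypothetical; the real modules may be hardware, software, or a mix of both.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TextCoverageApparatus:
    # Stand-in for acquiring module 501: returns page screenshots.
    acquire: Callable[[], List[bytes]]
    # Stand-in for text recognition module 502: screenshot -> entries.
    recognize: Callable[[bytes], List[str]]
    # Stand-in for matching module 503: (entries, library) -> matched count.
    count_matches: Callable[[List[str], List[str]], int]

    def determine(self, library: List[str]) -> float:
        """Stand-in for determining module 504: coverage = matched / library size."""
        if not library:
            raise ValueError("the full-text library must not be empty")
        entries: List[str] = []
        for screenshot in self.acquire():
            entries.extend(self.recognize(screenshot))
        return self.count_matches(entries, library) / len(library)
```

Passing the modules in as callables keeps the sketch independent of any particular traversal tool or text recognition model.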
In one possible implementation, the matching module 503 is configured to:
for each entry in the text content, acquire the text similarity between the entry and each entry in the full-text library;
determine a matching result of the entry against the full-text library based on the text similarity between the entry and each entry in the full-text library;
and obtain the number of entries successfully matched between the text content and the full-text library according to the matching result of each entry in the text content against the full-text library.
In one possible implementation, the text similarity is characterized by a Hamming distance; the matching module 503 is further configured to:
acquire the Simhash value corresponding to each entry in the full-text library;
for each entry in the text content, acquire the Simhash value corresponding to the entry;
and determine the Hamming distance between the entry and each entry in the full-text library based on the Simhash value corresponding to the entry and the Simhash values corresponding to the entries in the full-text library.
In one possible implementation, the matching module 503 is further configured to:
determine a target Hamming distance corresponding to each entry in the text content based on the Hamming distances between the entry and the entries in the full-text library, wherein the target Hamming distance is the smallest of the Hamming distances between the entry and the entries in the full-text library;
in response to the target Hamming distance corresponding to the entry being less than or equal to a preset threshold, determine that the entry is successfully matched with the full-text library;
and in response to the target Hamming distance corresponding to the entry being greater than the preset threshold, determine that the entry fails to match the full-text library.
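The decision rule above (take the minimum Hamming distance as the target distance, then compare it with the preset threshold) can be sketched as follows. The fingerprints are assumed to be integers computed beforehand, and the default threshold of 3 is an assumption, since the text only says that the threshold is preset.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def count_matched_entries(content_hashes, library_hashes, threshold=3):
    """Count text-content entries whose target Hamming distance is within the threshold."""
    matched = 0
    for content_hash in content_hashes:
        # Target Hamming distance: the minimum over all library entries.
        target = min(
            (hamming_distance(content_hash, lib_hash) for lib_hash in library_hashes),
            default=None,
        )
        if target is not None and target <= threshold:
            matched += 1  # successful match
    return matched
```

An empty library yields no target distance, so no entry is counted as matched in that case.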
In one possible implementation, the apparatus further includes a first preprocessing module configured to:
preprocess the full-text library, wherein the preprocessing includes at least one of the following operations: removing entries in the full-text library whose version identifier differs from that of the application, removing hypertext markup in the full-text library, removing string-format operators in the full-text library, removing entries in the full-text library that contain only numbers, removing special characters in the full-text library that belong to the ASCII (American Standard Code for Information Interchange) encoding table, removing entries in the full-text library that represent store values, or removing entries in the full-text library that are displayed only when a preset condition is met.
In one possible implementation, the apparatus further includes a second preprocessing module configured to:
preprocess the text content, wherein the preprocessing includes at least one of the following operations: combining adjacent lines of text in the text content based on a preset rule, or removing entries in the text content that contain only numbers.
In one possible implementation, the acquiring module 501 is configured to:
determine a traversal priority corresponding to each of the one or more modules, wherein the traversal priority represents the order in which the traversal tool traverses the one or more modules;
and, based on the traversal priorities of the one or more modules, sequentially acquire, through the traversal tool, page screenshots of the one or more modules during running.
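A minimal sketch of priority-ordered acquisition is shown below. It assumes that a smaller priority number means earlier traversal and uses a hypothetical `capture` callable as a stand-in for the traversal tool; neither detail is specified in the text.

```python
from typing import Callable, Dict, List


def capture_by_priority(
    priorities: Dict[str, int],
    capture: Callable[[str], List[str]],
) -> List[str]:
    """Traverse modules in ascending priority order and collect their screenshots.

    priorities: module name -> traversal priority (smaller runs first, by assumption).
    capture: hypothetical traversal-tool hook returning screenshot paths for a module.
    """
    screenshots: List[str] = []
    for module in sorted(priorities, key=lambda name: priorities[name]):
        screenshots.extend(capture(module))
    return screenshots
```

Because Python's sort is stable, modules that share a priority keep the order in which they were listed.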
Fig. 6 shows a schematic block diagram of a chip according to an exemplary embodiment of the present disclosure. As shown in Fig. 6, the chip 600 includes one or more processors 601 and a communication interface 602. The communication interface 602 may support the server in performing the data transceiving steps of the above method for determining text coverage, and the processor 601 may support the server in performing the data processing steps of that method.
Optionally, as shown in Fig. 6, the chip 600 further includes a memory 603. The memory 603 may include a read-only memory and a random access memory, and provides operating instructions and data to the processor 601. A portion of the memory 603 may also include a non-volatile random access memory (NVRAM).
In some embodiments, as shown in Fig. 6, the processor 601 performs the corresponding operations by calling operating instructions stored in the memory 603 (the operating instructions may be stored in an operating system). The processor 601 controls the processing operations of the device and may also be referred to as a central processing unit (CPU). In application, the processor 601, the communication interface 602, and the memory 603 are coupled together by a bus system, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 604 in Fig. 6.
The methods disclosed in the embodiments of the present disclosure may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above methods may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and it can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present disclosure. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present disclosure may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program, when executed by the at least one processor, is for causing the electronic device to perform a method according to an embodiment of the disclosure.
Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform a method according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.
Referring to Fig. 7, a block diagram of the structure of an electronic device 700 will now be described. The electronic device 700 may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure can be applied. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 7, the electronic device 700 includes a computing unit 701, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the electronic device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
A number of components in the electronic device 700 are connected to the I/O interface 705, including an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 708 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as Bluetooth devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 701 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the methods and processes described above. For example, in some embodiments, the method for determining text coverage described above may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured in any other suitable way (e.g., by means of firmware) to perform the method for determining text coverage.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present disclosure are performed in whole or in part. The computer may be a general purpose computer, special purpose computer, computer network, terminal, user equipment, or other programmable device. The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that integrates one or more available media. The usable medium may be a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape; or an optical medium, such as a Digital Video Disc (DVD); it may also be a semiconductor medium, such as a Solid State Drive (SSD).
While the disclosure has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the disclosure. Accordingly, the specification and figures are merely exemplary of the present disclosure as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the present disclosure. It will be apparent to those skilled in the art that various changes and modifications can be made in the present disclosure without departing from the spirit and scope of the disclosure. Thus, if such modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their equivalents, the present disclosure is intended to include such modifications and variations as well.

Claims (10)

1. A method for determining text coverage, comprising:
acquiring a page screenshot of one or more modules in an application program in the running process through a traversal tool;
performing text recognition on the page screenshot to obtain text content in the page screenshot;
matching the text content with a full-text library corresponding to the one or more modules in the application program to obtain the number of successfully matched entries between the text content and the full-text library;
and determining the text coverage rate of the one or more modules in the application program based on the number of successfully matched entries and the number of entries in the full-text library.
2. The method of claim 1, wherein matching the text content with the full-text library corresponding to the one or more modules in the application program comprises:
for each entry in the text content, acquiring the text similarity between the entry and each entry in the full-text library; determining a matching result of the entry against the full-text library based on the text similarity between the entry and each entry in the full-text library;
and obtaining the number of successfully matched entries between the text content and the full-text library according to the matching result of each entry in the text content against the full-text library.
3. The method of claim 2, wherein the text similarity is characterized by a Hamming distance; and, for each entry in the text content, acquiring the text similarity between the entry and each entry in the full-text library comprises:
acquiring the Simhash value corresponding to each entry in the full-text library;
for each entry in the text content, acquiring the Simhash value corresponding to the entry;
and determining the Hamming distance between the entry and each entry in the full-text library based on the Simhash value corresponding to the entry and the Simhash values corresponding to the entries in the full-text library.
4. The method of claim 3, wherein determining the matching result of the entry against the full-text library based on the text similarity between the entry and each entry in the full-text library comprises:
determining a target Hamming distance corresponding to the entry based on the Hamming distances between the entry and the entries in the full-text library, wherein the target Hamming distance is the smallest of the Hamming distances between the entry and the entries in the full-text library;
in response to the target Hamming distance corresponding to the entry being less than or equal to a preset threshold, determining that the entry is successfully matched with the full-text library;
and in response to the target Hamming distance corresponding to the entry being greater than the preset threshold, determining that the entry fails to match the full-text library.
5. The method of claim 1, wherein before matching the text content with the full-text library of the application program, the method further comprises:
preprocessing the full-text library, wherein the preprocessing comprises at least one of the following operations: removing entries in the full-text library whose version identifier differs from that of the application program, removing hypertext markup in the full-text library, removing string-format operators in the full-text library, removing entries in the full-text library that contain only numbers, removing special characters in the full-text library that belong to the ASCII (American Standard Code for Information Interchange) encoding table, removing entries in the full-text library that represent store values, or removing entries in the full-text library that are displayed only when a preset condition is met.
6. The method of claim 1, wherein before matching the text content with the full-text library of the application program, the method further comprises: preprocessing the text content, wherein the preprocessing comprises at least one of the following operations: combining adjacent lines of text in the text content based on a preset rule, or removing entries in the text content that contain only numbers.
7. The method of claim 1, wherein before matching the text content with the full-text library corresponding to the one or more modules in the application program, the method further comprises:
performing format conversion on the full-text library to obtain the full-text library in CSV format.
8. An apparatus for determining text coverage, comprising:
an acquiring module configured to acquire, through a traversal tool, page screenshots of one or more modules in an application program during running;
a text recognition module configured to perform text recognition on the page screenshots to obtain text content in the page screenshots;
a matching module configured to match the text content with a full-text library corresponding to the one or more modules in the application program to obtain the number of successfully matched entries between the text content and the full-text library;
and a determining module configured to determine the text coverage rate of the one or more modules in the application program based on the number of successfully matched entries and the number of entries in the full-text library.
9. An electronic device, comprising:
a processor; and
a memory storing a program,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the method according to any one of claims 1-7.
10. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202210927789.XA 2022-08-03 2022-08-03 Text coverage rate calculation method, device, equipment and storage medium Pending CN115238042A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210927789.XA CN115238042A (en) 2022-08-03 2022-08-03 Text coverage rate calculation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210927789.XA CN115238042A (en) 2022-08-03 2022-08-03 Text coverage rate calculation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115238042A true CN115238042A (en) 2022-10-25

Family

ID=83677251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210927789.XA Pending CN115238042A (en) 2022-08-03 2022-08-03 Text coverage rate calculation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115238042A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination