WO2017072865A1

WO2017072865A1 - Testing device, testing method, recording medium, and program

Info

Publication number: WO2017072865A1
Application number: PCT/JP2015/080288
Authority: WO
Inventors: 桂太平中; 彩美木下
Original assignee: 楽天株式会社
Priority date: 2015-10-27
Filing date: 2015-10-27
Publication date: 2017-05-04
Also published as: JPWO2017072865A1; JP6356924B2

Abstract

An acquisition unit (24) acquires an image depicted by a document written in HTML by providing the document to a testing browser and causing the document to be displayed on a screen. A recognition unit (26) acquires a recognized text by performing character recognition on the acquired image. An extraction unit (27) extracts the actual text that should be depicted by eliminating tags from the document. A determination unit (28) determines whether the document is correctly displayed in the testing browser by comparing the recognized text and the actual text.

Description

Inspection device, inspection method, recording medium, and program

The present invention relates to an inspection apparatus, an inspection method, a recording medium, and a program for appropriately inspecting whether a document including text is correctly drawn by a browser.

When a document described in a predetermined markup language such as HTML (Hypertext Markup Language) (hereinafter simply referred to as a document) is given to a web browser (hereinafter simply referred to as a browser), the browser interprets the tags in the document and displays text and images on the screen. However, because different browsers interpret tags differently, the layout may be broken in other browsers even though text and images are displayed neatly in some browsers. Similar problems may occur between different versions of the same browser or between different terminals. Therefore, various methods for inspecting whether the layout of text and images is broken have been proposed.

For example, Patent Document 1 discloses a method in which a web page described in HTML is given to two web browsers with different specifications, the rendered image pairs are compared visually, the position, size, style, etc. of the elements in each image are compared, and any difference is detected as an error.

Patent Document 1: JP2013-77301A

However, although the inspection method disclosed in Patent Document 1 and the like can detect a layout collapse caused by a part of the tabs being hidden or by an image protruding from the display area of the screen, it cannot detect defects in which characters are displayed overlapping each other or in which characters are missing. For this reason, it has not been possible to properly check whether or not a document including text is correctly rendered by the browser.

The present invention has been made to solve the above problems, and an object thereof is to provide an inspection apparatus, an inspection method, a recording medium, and a program for appropriately inspecting whether a document including text is correctly rendered by a browser.

In order to achieve the above object, an inspection apparatus according to the present invention:
acquires an image in which the document is drawn by giving the document to the browser and displaying it on the screen or drawing it in a virtual view of the browser,
acquires recognized text by performing character recognition on the acquired image,
extracts, from the document, the body text that should be drawn, and
determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.

According to the present invention, it is possible to appropriately check whether or not a document including text is correctly rendered by the browser.

FIG. 1 is a diagram showing the hardware configuration of the inspection apparatus according to the embodiment of the present invention.
FIG. 2 is a diagram showing the functional configuration of the inspection apparatus according to the embodiment of the present invention.
FIG. 3 is a diagram showing an example of a document stored in the document storage unit according to the embodiment of the present invention.
FIG. 4 is a diagram showing an image in which the document is drawn by the inspection browser.
FIG. 5 is a diagram showing (a) the recognized text and (b) the body text according to the embodiment of the present invention.
FIG. 6 is a diagram showing a screen indicating that the document is not drawn correctly by the inspection browser.
FIG. 7 is a diagram showing a screen displaying a warning that the document is not drawn correctly.
FIG. 8 is a flowchart showing the flow of the process performed by the inspection apparatus according to the embodiment of the present invention.
FIG. 9 is a diagram showing a second image in which the document is drawn by the second browser.
FIG. 10 is a flowchart showing the flow of the process performed by the inspection apparatus according to a modification of the present invention.

Embodiments of the present invention will be described below. In addition, this embodiment is for description and does not limit the scope of the present invention. Accordingly, those skilled in the art can employ embodiments in which each or all of these elements are replaced with equivalent ones, and these embodiments are also included in the scope of the present invention. Further, in describing an embodiment of the present invention with reference to the drawings, the same or corresponding parts in the drawings are denoted by the same reference numerals.

FIG. 1 shows a hardware configuration of an inspection apparatus 1 according to an embodiment of the present invention. Hereinafter, a description will be given with reference to FIG. 1. As shown in FIG. 1, the inspection apparatus 1 includes a storage device 11, a reception device 12, a display device 13, and a control unit 14.

The storage device 11 includes a recording medium such as a hard disk and stores various software such as various browsers and inspection software for inspecting whether or not a document is drawn correctly. The storage device 11 also stores a plurality of documents described in HTML. The documents are not limited to those described in HTML, and may be described in other markup languages such as XHTML (Extensible Hypertext Markup Language) and XML (Extensible Markup Language). Further, the storage device 11 may be configured by a non-volatile, non-transitory recording medium other than a hard disk, for example, a flash memory, an optical disk, or a magneto-optical disk.

The reception device 12 includes a keyboard, a mouse, a track pad, and the like. The reception device 12 accepts a user instruction through a user operation, generates an input signal indicating the accepted user instruction, and supplies the generated input signal to the control unit 14.

The display device 13 includes a screen such as a liquid crystal display and displays various data such as text data, images, and moving images supplied from the control unit 14.

The control unit 14 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. A program stored in a non-transitory recording medium such as the ROM is read into the RAM, which is a temporary recording medium, and the CPU executes the instructions included in the read program. The control unit 14 also receives the input signal supplied from the reception device 12, and the CPU carries out the user instruction indicated by the received input signal.

FIG. 2 shows a functional configuration of the inspection apparatus 1. The control unit 14 reads the program stored in the ROM or the like into the RAM and controls its execution, whereby the inspection apparatus 1 functions as a software storage unit 21, a document storage unit 22, a reception unit 23, an acquisition unit 24, a display unit 25, a recognition unit 26, an extraction unit 27, and a determination unit 28.

The software storage unit 21 and the document storage unit 22 are constructed in the storage device 11. The reception unit 23 is realized by the cooperation of the reception device 12 and the control unit 14. The acquisition unit 24, the recognition unit 26, the extraction unit 27, and the determination unit 28 are realized by the control unit 14. The display unit 25 is realized by the cooperation of the display device 13 and the control unit 14.

The software storage unit 21 stores various software such as various browsers and inspection software for inspecting whether or not a document is drawn correctly. The various browsers include well-known browsers such as Internet Explorer (registered trademark), Mozilla Firefox (registered trademark), Google Chrome (registered trademark), and Opera (registered trademark). However, they may also include browsers other than the well-known browsers listed above.

The document storage unit 22 stores a plurality of documents described in HTML. Note that, as described above, the plurality of documents may be described in a markup language other than HTML.

The reception unit 23 accepts various requests and instructions from the user through the user's operation of the reception device 12. For example, it accepts a request for inspecting whether or not a document is correctly drawn (hereinafter referred to as an inspection request), an instruction indicating the document selected by the user as the inspection target, and an instruction indicating the browser selected by the user as the inspection browser. The accepted requests and instructions are supplied to the acquisition unit 24. The document is selected from the plurality of documents stored in the document storage unit 22. The browser is selected from the plurality of browsers stored in the software storage unit 21.

When the acquisition unit 24 receives the inspection request from the reception unit 23, the acquisition unit 24 acquires the inspection software from the software storage unit 21. Then, the inspection software is activated and initialization processing is performed. The initialization process includes, for example, a process of canceling the designation when a specific browser is designated as an inspection browser, and a process of canceling the designation when a specific document is designated as an inspection target.

Further, when receiving the instruction indicating the document selected by the user as the inspection target from the reception unit 23, the acquisition unit 24 acquires the selected document from the document storage unit 22 as the inspection target based on the instruction.

Further, when receiving the instruction indicating the browser selected by the user as the inspection browser from the reception unit 23, the acquisition unit 24 acquires the selected browser from the software storage unit 21 as the inspection browser, and starts the inspection browser.

Also, the acquisition unit 24 acquires an image in which the document to be inspected is drawn by giving the document acquired as the inspection target to the inspection browser and displaying the document on the screen of the display device 13.

FIG. 3 shows an example of a document stored in the document storage unit 22. As shown in the figure, the document stored in the document storage unit 22 is described in HTML. The acquisition unit 24 gives the document shown in the figure to the inspection browser and displays it on the screen of the display device 13, thereby acquiring an image on which the document shown in the figure is drawn.

FIG. 4 shows an image 110 in which the document shown in FIG. 3 is drawn by the inspection browser. In the image 110, the text included in the document shown in FIG. 3 is drawn at the position specified by the tag. In this specification, it should be noted that an image on which a document is drawn is not an image on which a source code written in HTML is drawn.

There are also two things to note about the image 110 shown in FIG. 4. One is that, of the character string “health food / supplement” that should be displayed in the left frame 110a of the image 110, a part protrudes from the frame 110a and is hidden behind the banner advertisement. The other is that, in the frame 110b on the lower right side of the image 110, a part of the character string “best for bulk buying ... very active!” overlaps a part of the character string “it can also be used at work, school, shops, leisure, etc.”.

The display unit 25 displays the document given to the inspection browser on the screen of the display device 13 based on the control of the acquisition unit 24. Further, based on the control of the determination unit 28, a determination result indicating whether or not the document has been correctly rendered by the inspection browser is displayed on the screen of the display device 13.

The recognition unit 26 acquires the text drawn on the image as recognized text by performing character recognition on the image acquired by the acquisition unit 24. The recognition unit 26 divides the image acquired by the acquisition unit 24 into images each representing one character using, for example, morphological analysis, and extracts a feature amount (a feature vector representing the features of the character) from each divided image. Character recognition is then performed by comparing the extracted feature amount with the character feature amounts stored in advance in the storage device 11. One feature amount used for character recognition is the directional line element feature, which uses the outline of a character. Further, the recognition unit 26 acquires position information indicating from which position in the image each character obtained by character recognition was recognized. When a defect is found in character drawing, this position information is used to display a warning at or near the position of the defect on the screen of the display device 13.
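The matching step described above can be illustrated with a minimal sketch. It assumes that segmentation and feature extraction have already produced one feature vector per character image together with its position; the directional line element feature itself is not implemented here. The names `RecognizedChar`, `classify`, and `recognize`, the template vectors, and the use of squared Euclidean distance are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    """One recognized character and the position it was recognized from."""
    char: str
    x: int
    y: int


def classify(feature, templates):
    """Return the stored character whose feature vector is closest to `feature`.

    Squared Euclidean distance is an assumed similarity measure.
    """
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(templates, key=lambda ch: dist2(feature, templates[ch]))


def recognize(segments, templates):
    """`segments` is a list of (feature_vector, x, y) tuples, one per character image."""
    return [RecognizedChar(classify(f, templates), x, y) for f, x, y in segments]


# Hypothetical stored feature vectors for two characters.
templates = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
result = recognize([((0.9, 0.1), 10, 20), ((0.2, 0.8), 30, 20)], templates)
recognized_text = "".join(r.char for r in result)
```

Keeping the position alongside each character is what later allows a warning to be placed near the defective part of the screen.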

The extraction unit 27 extracts the text included in the document as the body text by removing the tag from the document acquired by the acquisition unit 24. The body text is text to be drawn from the document and matches the recognized text acquired by character recognition when the document is correctly drawn by the inspection browser.
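As an illustration of removing tags to obtain the body text, the following is a minimal sketch using Python's standard `html.parser` module. The class and function names are hypothetical. Skipping the contents of `script`, `style`, `head`, and `title` elements, and joining text fragments with single spaces, are assumptions made here because those contents are not drawn in the page body; the description itself only states that tags are removed.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects text nodes while ignoring tags and non-drawn elements."""

    SKIPPED = {"script", "style", "head", "title"}  # assumption: not drawn as body text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that lies outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(document: str) -> str:
    parser = BodyTextExtractor()
    parser.feed(document)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><head><title>T</title><style>p{}</style></head>"
    "<body><p>Hello <b>world</b></p></body></html>"
)
body_text = extract_body_text(sample)
```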

FIG. 5A shows the recognized text acquired by character recognition of the image 110 shown in FIG. 4, and FIG. 5B shows the body text extracted from the document 100 shown in FIG. 3. Looking at the dashed box 120a in the recognized text, it can be seen that a part of the character string “health food / supplement”, namely the characters “n” and “g”, is missing. The characters are missing because part of the character string “health food / supplement” is hidden behind the banner advertisement in the image 110 shown in FIG. 4, so the recognition unit 26 could not recognize the characters “n” and “g”. That is, if a character is missing in the rendering by the inspection browser, the defect is reflected in the recognized text.

In addition, the symbol string “◯ × Δ ☆ # ♭ ● □ ▲ ★ *” in the dashed box 120b shows that the character strings displayed overlapping each other in the image 110 were erroneously recognized as a different character string by character recognition. Because no stored character matches a character displayed in an overlapping manner, each such character is recognized as another character or symbol with a similar shape. Therefore, a character string displayed in an overlapping manner is converted into a different character string and appears in the recognized text. In this way, the defect that characters are displayed overlapping each other is also reflected in the recognized text.

On the other hand, the body text shown in FIG. 5B is a text extracted by removing the tag from the document, and thus does not include defects such as missing characters. Therefore, the comparison between the recognized text and the body text indicates that the document is drawn correctly if there is no difference between the two texts, and the document is not drawn correctly if there is a difference between the two texts.

The determination unit 28 compares the recognized text acquired by the recognition unit 26 with the body text extracted by the extraction unit 27, and determines whether or not the document is correctly rendered by the inspection browser. If there is a character that does not appear in the body text among the characters that appear in the recognized text, it is determined that the document is not drawn correctly by the inspection browser. Likewise, if there is a character that does not appear in the recognized text among the characters that appear in the body text, it is determined that the document is not drawn correctly by the inspection browser. On the other hand, if the recognized text matches the body text, it is determined that the document has been correctly rendered by the inspection browser.
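The determination rule above can be sketched as follows. This is an illustration under stated assumptions rather than the method of the embodiment: the function name `judge_rendering` is hypothetical, whitespace is ignored on both sides (an assumption, since character recognition rarely reproduces spacing exactly), and characters are counted with `collections.Counter` so that both directions of the check can be reported.

```python
from collections import Counter


def _normalize(text: str) -> str:
    # Assumption: whitespace differences are not treated as drawing defects.
    return "".join(ch for ch in text if not ch.isspace())


def judge_rendering(recognized_text: str, body_text: str) -> dict:
    recognized = _normalize(recognized_text)
    body = _normalize(body_text)
    recognized_counts, body_counts = Counter(recognized), Counter(body)
    return {
        # Correct only when the two texts match.
        "correct": recognized == body,
        # Characters in the recognized text that do not appear in the body text.
        "unexpected": dict(recognized_counts - body_counts),
        # Characters in the body text that do not appear in the recognized text.
        "missing": dict(body_counts - recognized_counts),
    }


result = judge_rendering("health food / suppleme", "health food / supplement")
```

In this usage example the last two characters of the string are cut off, so they are reported under `missing` and the document is judged as not drawn correctly.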

Note that the determination unit 28 may compare the recognized text and the body text in units of characters or in units of words (character strings delimited by blank characters or symbols). Further, the determination unit 28 may switch between character-based comparison and word-based comparison depending on the language in which the document is written. For example, the determination unit 28 may compare the recognized text and the body text in units of words when the document is written in English, and in units of characters when the document is written in Japanese. The determination unit 28 can identify the language in which the document is written, for example, by referring to the value of the lang attribute that designates the language, or by identifying the language of the characters that appear most often within the body tag.
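The language-dependent choice of comparison unit might look like the sketch below. The names `detect_language` and `comparison_units` are hypothetical. Reading `lang` from the `html` element, the Unicode ranges used to detect Japanese, the majority threshold, and defaulting to English are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def detect_language(document: str, body_text: str) -> str:
    finder = _LangFinder()
    finder.feed(document)
    if finder.lang:
        return finder.lang.split("-")[0].lower()
    # Fallback: decide from the script of the majority of letters (assumed heuristic).
    japanese = sum(
        1 for ch in body_text
        if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
    )
    letters = sum(1 for ch in body_text if ch.isalpha())
    return "ja" if letters and japanese * 2 > letters else "en"


def comparison_units(text: str, language: str) -> list:
    if language == "ja":
        # Character units, ignoring whitespace.
        return [ch for ch in text if not ch.isspace()]
    # Word units: runs of word characters delimited by blanks or symbols.
    return re.findall(r"\w+", text)


units_en = comparison_units("Hello, world!", "en")
units_ja = comparison_units("健康 食品", "ja")
```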

Further, the determination unit 28 controls the display unit 25 to display the determination result on the screen of the display device 13.

FIG. 6 shows a screen indicating that the document is not correctly rendered by the inspection browser. If the determination unit 28 determines that the document is not drawn correctly, the character string “The document was not correctly displayed by the inspection browser” is displayed on the screen as the determination result, as shown in the figure.

Also, the determination unit 28 controls the display unit 25 to display a warning that the document is not drawn correctly at the portion that is not drawn correctly on the screen on which the document is drawn by the inspection browser. Further, so that the user can see at a glance where the document is not drawn correctly, the determination unit 28 controls the display unit 25 to fill the portion that is not drawn correctly, or the portions before and after it, with black. Hereinafter, a specific description will be given with reference to FIG. 7.

FIG. 7 shows a screen of the display device 13 that displays a warning that the document shown in FIG. 3 is not drawn correctly. On this screen, an exclamation mark 130a is displayed near the character string “health food / supplement”, which is partially hidden behind the banner advertisement, and an exclamation mark 130b is displayed in the frame 110b. The exclamation marks 130a and 130b let the user know the approximate position of the portions that are not drawn correctly. In FIG. 7, the character strings “Lime” and “Hell” are displayed in inverted colors, and their backgrounds are filled with black. Filling the backgrounds of the two characters before and after the characters “n” and “g” with black indicates that the characters between the filled portions are missing. Therefore, the user can easily find where characters are missing by looking at the fill. Also, in the figure, the overlapping character strings in the frame 110b are displayed in inverted colors, and the portion where the characters overlap is filled with black. This lets the user easily identify the position of the portion where the characters overlap.

Note that a translucent exclamation mark may be displayed on the screen so that the user can still read characters hidden behind it. The fill color that makes a drawing defect easy to find may be a color other than black as long as it is conspicuous, for example, red, blue, or yellow. The determination unit 28 can estimate the position where a character is missing or where characters overlap by using the position information indicating from which position in the image 110 each character obtained by character recognition was recognized. Based on this estimate, the determination unit 28 can display a warning around a character that is not drawn correctly on the screen of the display device 13, or color the portion that is not displayed correctly. Also, as shown in FIG. 7, when a character protrudes from a frame, the character and the frame border may overlap. In this case, the character overlapping the frame border is not recognized as the correct character. This property can be used to warn that a layout problem has occurred. Therefore, in addition to missing characters and overlapping characters, the present invention can also issue a warning when a part of a character protrudes from and overlaps a frame line. In the above description, a missing character is indicated by filling the two characters before and after it with black. However, the number of characters filled is not limited to two; a predetermined number of characters before and after the missing character may be filled with black, for example, one character or three characters on each side. Alternatively, a missing character may be indicated by filling only the character immediately before it, or only the character immediately after it, with black.
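One way to decide which recognized characters to fill is to align the body text with the recognized text and mark the neighbours of each gap. The sketch below uses `difflib.SequenceMatcher` from the Python standard library for the alignment; this is a swapped-in technique chosen for illustration, not something the description specifies. The function name `highlight_ranges` and the `context` parameter (the predetermined number of characters on each side) are hypothetical. The returned index ranges refer to the recognized text and could then be mapped to screen positions through the position information of each recognized character.

```python
from difflib import SequenceMatcher


def highlight_ranges(body_text: str, recognized_text: str, context: int = 2) -> list:
    """Return (start, end) index ranges of recognized_text that should be filled."""
    matcher = SequenceMatcher(None, body_text, recognized_text, autojunk=False)
    ranges = []
    for op, b0, b1, r0, r1 in matcher.get_opcodes():
        if op == "delete":
            # Characters of the body text are missing: fill `context` characters
            # before and after the gap.
            candidates = [
                (max(0, r0 - context), r0),
                (r0, min(len(recognized_text), r0 + context)),
            ]
        elif op in ("replace", "insert"):
            # Misrecognized characters (for example, overlapping text): fill them.
            candidates = [(r0, r1)]
        else:
            candidates = []
        ranges.extend((s, e) for s, e in candidates if s < e)
    return ranges


# "de" is missing from the recognized text, so "bc" and "fg" are marked.
ranges = highlight_ranges("abcdefgh", "abcfgh", context=2)
```

Setting `context=1` or `context=3` corresponds to filling one or three characters on each side of the missing portion.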

FIG. 8 is a flowchart showing the flow of processing executed by the inspection apparatus 1 according to the embodiment of the present invention. Hereinafter, a description will be given with reference to FIG. 8. This process is started by executing a program on the hardware of the inspection apparatus 1, and the inspection apparatus 1 is realized by this process.

When this processing is started, the acquisition unit 24 determines whether or not there is a request (inspection request) for inspecting whether or not the document is correctly drawn (step S1). When the acquisition unit 24 receives the inspection request from the reception unit 23, it determines that there is an inspection request (step S1; YES), and acquires the inspection software from the software storage unit 21 (step S2). When the inspection request is not supplied from the reception unit 23, the acquisition unit 24 determines that there is no inspection request (step S1; NO), and repeats this determination until the inspection request is received from the reception unit 23.

When acquiring the inspection software, the acquisition unit 24 activates the inspection software and performs an initialization process (step S3). As described above, the initialization process includes a process of canceling the designation when a specific browser is designated as the inspection browser, and a process of canceling the designation when a specific document is designated as the inspection target.

Next, the acquisition unit 24 determines whether there is an instruction indicating the document selected by the user as the inspection target (step S4). The acquisition unit 24 determines that there is an instruction when the instruction indicating the document selected by the user as the inspection target is received from the reception unit 23 (step S4; YES). Then, based on the received instruction, the document selected by the user is acquired from the document storage unit 22 as an inspection target (step S5). When the instruction is not supplied from the reception unit 23, the acquisition unit 24 determines that there is no instruction (step S4; NO), and repeats the determination of the presence or absence of the instruction until the instruction is received from the reception unit 23.

Next, the acquisition unit 24 determines whether there is an instruction indicating the browser selected by the user as the inspection browser (step S6). When the instruction indicating the browser selected by the user as the inspection browser is received from the reception unit 23, the acquisition unit 24 determines that there is an instruction (step S6; YES), and acquires the selected browser from the software storage unit 21 as the inspection browser (step S7). When the instruction is not supplied from the reception unit 23, the acquisition unit 24 determines that there is no instruction (step S6; NO), and repeats the determination of the presence or absence of the instruction until the instruction is received from the reception unit 23.

The acquisition unit 24 starts the inspection browser when acquiring the inspection browser. Then, the document acquired as the inspection target is given to the inspection browser (step S8) and displayed on the screen of the display device 13 (step S9), thereby acquiring the image on which the inspection target document is drawn (step S10).

The recognition unit 26 recognizes the image acquired by the acquisition unit 24 (step S11), and acquires the recognized text (step S12). On the other hand, the extraction unit 27 extracts the body text by removing the tag from the document acquired by the acquisition unit 24 (step S13).

Then, the determination unit 28 compares the recognized text acquired by the recognition unit 26 with the body text extracted by the extraction unit 27 (step S14), and determines whether the document is correctly rendered by the inspection browser (step S15). When there is a character that does not appear in the body text among the characters that appear in the recognized text, the determination unit 28 determines that the document is not drawn correctly by the inspection browser (step S15; NO). Likewise, when there is a character that does not appear in the recognized text among the characters that appear in the body text, it determines that the document is not drawn correctly by the inspection browser (step S15; NO).

If the determination unit 28 determines that the document is not drawn correctly by the inspection browser, the determination unit 28 controls the display unit 25 to display the fact on the screen of the display device 13 (step S16). See FIG. 6 for a specific example. In addition, the determination unit 28 displays a warning that the document is not drawn correctly on a portion where the document is not correctly drawn on the screen by the inspection browser (step S17). See FIG. 7 for a specific example.

In step S15, when the recognized text matches the body text, the determination unit 28 determines that the document has been correctly drawn by the inspection browser (step S15; YES), and controls the display unit 25 to display this fact on the screen of the display device 13 (step S18).

Further, after executing the process of step S17 or S18, the determination unit 28 returns to step S3 and executes the initialization process again. As a result, the designations of the inspection browser and the document to be inspected are canceled, and preparations for starting a new inspection are made.
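The core of the flow in FIG. 8 (steps S8 to S18) can be summarized as a short sketch. Rendering, character recognition, and body text extraction are passed in as callables so that the sketch stays self-contained; the function name `run_inspection`, the callable parameters, and the stub implementations in the usage example are hypothetical. The failure message is the one shown in FIG. 6, while the success message is an assumed wording.

```python
def run_inspection(document, render, recognize, extract_body_text):
    """Run one inspection pass and return (is_correct, message)."""
    image = render(document)               # steps S8 to S10: draw and capture
    recognized_text = recognize(image)     # steps S11 and S12: character recognition
    body_text = extract_body_text(document)  # step S13: remove tags
    is_correct = recognized_text == body_text  # steps S14 and S15: compare
    if is_correct:
        # Step S18 (assumed wording for the success message).
        message = "The document was correctly displayed by the inspection browser"
    else:
        # Step S16 (message taken from the description of FIG. 6).
        message = "The document was not correctly displayed by the inspection browser"
    return is_correct, message


# Stub callables standing in for a real browser and a real OCR engine.
def fake_render(doc):
    return doc.replace("<p>", "").replace("</p>", "")[:-2]  # drops the last two characters


def fake_recognize(image):
    return image


def fake_extract(doc):
    return doc.replace("<p>", "").replace("</p>", "")


outcome = run_inspection("<p>supplement</p>", fake_render, fake_recognize, fake_extract)
```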

As described above, the inspection apparatus 1 according to the embodiment of the present invention acquires an image on which a document is drawn by giving the document to an inspection browser and displaying the document on a screen. Then, by comparing the recognized text acquired by character recognition from the acquired image with the body text extracted by removing the tags from the document, it determines whether or not the document is correctly rendered by the inspection browser. Therefore, the inspection apparatus 1 according to the embodiment of the present invention makes it possible to appropriately inspect whether a document including text is correctly rendered regardless of the browser or terminal.

Also, if the user checks by eye whether or not the document is drawn correctly, defects are likely to be overlooked. Missing characters in particular tend to be overlooked by the user. In this regard, the inspection apparatus 1 performs the check in place of the user, so that such oversights are suppressed.

(Modification)
Although the embodiment of the present invention has been described above, the above embodiment is an example, and the scope of application of the present invention is not limited to this. That is, the embodiments of the present invention can be applied in various ways, and all the embodiments are included in the scope of the present invention.

In the above embodiment, the method of extracting the body text by removing the tags from the document to be inspected has been described as an example. However, the present invention is not limited to this, and the body text may be extracted by other methods.

For example, the extraction unit 27 may cause the acquisition unit 24 to give the document to be inspected to a browser other than the inspection browser (hereinafter referred to as a second browser) among the plurality of browsers stored in the software storage unit 21 and display it on the screen, thereby acquiring an image on which the document is drawn (hereinafter referred to as a second image). Then, the extraction unit 27 may extract the body text by causing the recognition unit 26 to perform character recognition on the acquired second image. Further, the determination unit 28 may determine whether or not the document is correctly drawn by the inspection browser by comparing the body text extracted by character recognition with the recognized text. However, the second browser is preferably a browser for which correct drawing of the document is guaranteed. This is because, in that case, a mismatch between the recognized text and the body text can be attributed to a drawing defect in the inspection browser. In the following, the flow of processing for determining whether or not the document is drawn correctly when using a second browser for which correct drawing is guaranteed will be described in detail with reference to FIGS. 9 and 10.

FIG. 9 is a diagram showing a second image in which the document shown in FIG. 3 is drawn by the second browser. The acquisition unit 24 can acquire the second image on which the document shown in FIG. 3 is drawn by giving the document to the second browser and displaying it on the screen. Since the second browser is guaranteed to render the document correctly, no characters are hidden behind the banner advertisement and no characters overlap each other in the second image, as shown in FIG. 9. Because the second image has no drawing defect, the extraction unit 27 can extract defect-free body text, similar to the body text shown in FIG. 5B, by performing character recognition on the second image. The determination unit 28 can then determine whether the document was drawn correctly using this body text.

FIG. 10 is a flowchart showing the flow of processing executed by the inspection apparatus 1 when the body text extraction method using the second browser is adopted. Hereinafter, the flow of this processing will be described with reference to FIG. 10. However, the description of the processing already described with reference to FIG. 8 is omitted.

When the recognition unit 26 acquires the recognized text in step S12, the extraction unit 27 causes the acquisition unit 24 to acquire the second browser from the software storage unit 21 (step S19) and to activate it. Next, the extraction unit 27 causes the acquisition unit 24 to give the document to be inspected to the second browser (step S20) and to display it on the screen of the display device 13 (step S21), thereby acquiring the second image in which the document to be inspected is drawn (step S22).

Next, the extraction unit 27 causes the recognition unit 26 to perform character recognition on the second image acquired by the acquisition unit 24 (step S23), thereby extracting the body text (step S13). The processing from step S14 onward has already been described with reference to FIG. 8.
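The second-browser flow described above can be sketched as follows. This is only a minimal illustration, not the implementation of the embodiment: the function name `inspect_with_second_browser`, the result type, and the injected callables `render` (browser name and document in, image bytes out) and `ocr` (image bytes in, text out) are all hypothetical stand-ins for the acquisition unit 24 and the recognition unit 26.

```python
from typing import Callable, NamedTuple


class SecondBrowserResult(NamedTuple):
    correct: bool          # True when both texts match exactly
    recognized_text: str   # text recognized from the inspection browser's image
    body_text: str         # text recognized from the second browser's image


def inspect_with_second_browser(
    document: str,
    inspection_browser: str,
    second_browser: str,
    render: Callable[[str, str], bytes],  # hypothetical: (browser, document) -> image
    ocr: Callable[[bytes], str],          # hypothetical: image -> recognized text
) -> SecondBrowserResult:
    # Recognized text: draw the document in the inspection browser, then recognize it.
    recognized_text = ocr(render(inspection_browser, document))

    # Steps S19 to S22: draw the same document in the second browser.
    second_image = render(second_browser, document)

    # Steps S23 and S13: character recognition on the second image yields the body text.
    body_text = ocr(second_image)

    # Strict comparison: any difference counts as a drawing defect.
    return SecondBrowserResult(recognized_text == body_text, recognized_text, body_text)
```

Passing the rendering and recognition steps in as callables keeps the sketch independent of any particular browser automation or OCR library, which the description does not specify.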

As described above, it is desirable that the second browser be a browser for which correct drawing of the document is guaranteed. However, even without such a guarantee, it is still useful to use a second browser to check whether the document is drawn correctly. If the characters recognized for the inspection browser and for the second browser are found to differ, it cannot be concluded that the drawing by the inspection browser in particular is defective, but it can be concluded that the drawing by one or both of the browsers is defective. The document creator can therefore recognize that the document needs some modification.

In the above-described embodiment, the determination unit 28 determines that the document is not correctly drawn by the inspection browser if the recognized text and the body text differ by even one character. However, the determination unit 28 may determine that the document has been correctly drawn even when there is some difference. For example, the determination unit 28 may set a tolerance indicating the degree of allowable difference, and may determine that the document has been correctly drawn if the difference between the recognized text and the body text is within the tolerance. For example, the tolerance may be set to allow a difference of up to 5 characters, or a difference of up to 10 characters. The user may set the tolerance in advance, or the determination unit 28 may ask the user to set the tolerance before comparing the recognized text and the body text. Further, the determination unit 28 may display the determination result on the screen of the display device 13 together with the degree of fitness of the recognized text with respect to the body text. For example, when the body text has 100 characters and the difference between the body text and the recognized text is 5 characters, "fitness = 95%" is displayed. Further, the determination unit 28 may display a detailed report on the screen of the display device 13 so that the user can grasp the content of the difference. The detailed report may include messages such as "Of all the characters that appear in the document, 3 characters are not displayed correctly on the screen." and "The 50th character 'A' and the 60th character 'I' that appear in the document are not displayed on the screen."
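The tolerance and fitness calculation can be illustrated with a short sketch. The description does not say how the "difference" in characters is counted, so the sketch assumes the Levenshtein edit distance; it also assumes that a difference equal to the tolerance is allowed ("up to 5 characters"). The names `edit_distance` and `judge` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def judge(recognized_text: str, body_text: str, tolerance: int = 5) -> dict:
    """Compare the two texts and report the verdict, the difference, and the fitness."""
    difference = edit_distance(recognized_text, body_text)
    if body_text:
        # Fitness relative to the body text length, clamped so it never goes below 0%.
        fitness = max(0.0, 100.0 * (len(body_text) - difference) / len(body_text))
    else:
        fitness = 100.0 if not recognized_text else 0.0
    return {
        "correct": difference <= tolerance,  # assumption: the tolerance is inclusive
        "difference": difference,
        "fitness_percent": fitness,
    }


if __name__ == "__main__":
    body = "a" * 100
    recognized = "a" * 95  # five characters were not recognized
    print(judge(recognized, body, tolerance=5))
```

With a 100-character body text and a 5-character difference, the sketch reports a fitness of 95%, matching the example in the description.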

In the above description, an example in which one of the plurality of browsers stored in the software storage unit 21 is used as the inspection browser has been described. However, all browsers stored in the software storage unit 21 may be used as inspection browsers to inspect whether the drawing is correct. In this case, the acquisition unit 24 gives the document to be inspected to all browsers stored in the software storage unit 21, and acquires, for each browser, an image in which the document is drawn. Subsequently, the recognition unit 26 performs character recognition on each acquired image and acquires text. Then, the determination unit 28 compares all the acquired texts with one another. If the difference between any two texts exceeds the tolerance, the determination unit 28 determines that the document is not drawn correctly; if the difference does not exceed the tolerance for every pair of texts, it determines that the document has been correctly drawn. As a result, the document creator can check whether the document is drawn with substantially the same content regardless of the type of browser or terminal, without checking each browser or terminal individually. Therefore, the work burden related to document adjustment is reduced.
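The pairwise comparison across all browsers might look like the following sketch. The way differing characters are counted is an assumption (here, the non-matching spans reported by the standard library's `difflib.SequenceMatcher`), and the names `char_difference` and `compare_all_browsers` as well as the browser labels are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def char_difference(a: str, b: str) -> int:
    """Count differing characters from SequenceMatcher opcodes (assumed metric)."""
    opcodes = SequenceMatcher(None, a, b, autojunk=False).get_opcodes()
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in opcodes
               if tag != "equal")


def compare_all_browsers(texts_by_browser: dict, tolerance: int):
    """Return (correct, failing_pairs) for the texts recognized in each browser."""
    failing_pairs = [
        (name_a, name_b)
        for (name_a, text_a), (name_b, text_b) in combinations(texts_by_browser.items(), 2)
        if char_difference(text_a, text_b) > tolerance
    ]
    # The document counts as correctly drawn only if no pair exceeds the tolerance.
    return not failing_pairs, failing_pairs


if __name__ == "__main__":
    texts = {
        "browser_a": "hello world",
        "browser_b": "hello world",
        "browser_c": "hello w",  # part of the text is hidden in this browser
    }
    print(compare_all_browsers(texts, tolerance=2))
```

Returning the failing pairs, rather than only a verdict, lets the document creator see which browsers disagree with each other.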

In the above embodiment, the acquisition unit 24 acquires the image in which the document is drawn by displaying the document on the screen. In this case, only the text displayed on the screen is subject to the determination, so whether the entire document is drawn correctly may not be determined. Therefore, a function that, when the document is given to the inspection browser, starts a virtual view of the inspection browser and draws the entire document in the virtual view may be added to the inspection software. With this function, the acquisition unit 24 can acquire, from the virtual view, an image in which the entire document is drawn. Therefore, the determination unit 28 can determine whether the entire document is drawn correctly even when the entire document does not fit on the screen.

Also, when the entire document does not fit on the screen, a function for displaying the entire document on the screen by automatically scrolling the scroll bar displayed on the screen may be added to the inspection software. With this function, the acquisition unit 24 can acquire an image in which the entire document is drawn without a virtual view.
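As a rough sketch of the automatic scrolling approach, the scroll positions needed to cover the whole document can be computed from the document height and the screen height. The function name `scroll_offsets` and the pixel-based parameters are assumptions for illustration; the description does not specify how the scrolling is driven.

```python
def scroll_offsets(total_height: int, viewport_height: int) -> list:
    """Vertical scroll positions (in pixels) whose screenshots cover the whole document."""
    if total_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    # The last capture is aligned with the bottom of the document.
    last = max(0, total_height - viewport_height)
    offsets = list(range(0, last, viewport_height))
    offsets.append(last)
    return offsets


if __name__ == "__main__":
    # A 2500-pixel document on a 1000-pixel screen needs three captures.
    print(scroll_offsets(2500, 1000))
```

Because the last capture is aligned with the bottom of the document, it can overlap the previous one. The overlapping region would need to be cropped before character recognition so that the same text is not recognized twice.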

Note that a part of the functional configuration shown in FIG. 2 may be implemented on an external apparatus capable of communicating with the inspection apparatus 1 via a network. For example, the inspection apparatus 1 may acquire various software and documents from an external apparatus that functions as the software storage unit 21 and the document storage unit 22. Further, an external apparatus functioning as the receiving unit 23 may receive various requests and instructions from the user, and the inspection apparatus 1 may have the external apparatus supply those requests and instructions. Further, the inspection apparatus 1 may acquire an image in which a document is drawn by giving the document to an external apparatus functioning as the display unit 25 and displaying the document on a screen of the external apparatus. Further, the inspection apparatus 1 may display the inspection result on the screen of the external apparatus.

In addition, the inspection apparatus 1 can be provided as an apparatus equipped in advance with the configuration for realizing the functions according to the present invention, and an existing personal computer, information terminal device, or the like can also be made to function as the inspection apparatus 1 according to the present invention by applying a program. That is, by applying a program for realizing each functional configuration of the inspection apparatus 1 exemplified in the above embodiment so that it can be executed by a CPU or the like that controls an existing personal computer or information terminal device, that computer or device can be made to function as the inspection apparatus 1 according to the present invention. In addition, the inspection method according to the present invention can be implemented using the inspection apparatus 1.

Moreover, the method of applying such a program is arbitrary. For example, the program can be applied by storing it on a computer-readable recording medium [CD-ROM (Compact Disc Read-Only Memory), DVD (Digital Versatile Disc), MO (Magneto Optical Disc), etc.]. The program can also be applied by storing it in storage on a network such as the Internet and downloading it.

The present invention is capable of various embodiments and modifications without departing from the broad spirit and scope of the present invention. The above-described embodiments are for explaining the present invention and do not limit the scope of the present invention. That is, the scope of the present invention is shown not by the embodiments but by the claims. Various modifications within the scope of the claims and within the scope of the equivalent invention are considered to be within the scope of the present invention.

(Summary)
The summary of the present invention is described below.

An inspection apparatus according to an aspect of the present invention includes:
an acquisition unit that acquires an image in which a document is rendered by giving the document to a browser and causing the browser to display the document on a screen or render it in a virtual view of the browser;
a recognition unit that acquires recognized text by performing character recognition on the acquired image;
an extraction unit that extracts, from the document, body text to be rendered; and
a determination unit that determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.

The extraction unit may extract the body text by removing a tag from the document.
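Extracting the body text by removing tags can be sketched with Python's standard `html.parser` module. This is a minimal illustration under stated assumptions: the names `BodyTextExtractor` and `extract_body_text` are hypothetical, and skipping the contents of `script` and `style` elements and collapsing whitespace are choices made here rather than details given in the description.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collects character data while dropping tags (hypothetical helper)."""

    SKIPPED = {"script", "style"}  # assumption: their contents are never drawn as text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_body_text(html: str) -> str:
    """Return the text of an HTML document with tags removed and whitespace collapsed."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    # Adjacent text nodes are concatenated as-is, then runs of whitespace are collapsed.
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    sample = "<body><h1>Sale</h1>\n<p>50% <b>off</b> today</p></body>"
    print(extract_body_text(sample))
```

Text that is present in the markup but hidden by styling would still be extracted by this sketch, so a real comparison against recognized text may need additional filtering.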

The determination unit may determine that the document is not correctly rendered in the browser when there is a character that does not appear in the body text among characters that appear in the recognized text.

The determination unit may determine that the document is not correctly rendered in the browser when there is a character that does not appear in the recognized text among characters that appear in the body text.
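The two checks above (characters that appear only in the recognized text, and characters that appear only in the body text) can be sketched as a multiset comparison. Treating the texts as multisets of characters and ignoring whitespace are assumptions of this sketch, and the function names are hypothetical; the description does not fix how occurrences or positions are matched.

```python
from collections import Counter


def unexpected_and_missing(recognized_text: str, body_text: str):
    """Return (unexpected, missing) character counts, ignoring whitespace."""
    recognized = Counter(c for c in recognized_text if not c.isspace())
    body = Counter(c for c in body_text if not c.isspace())
    unexpected = recognized - body  # recognized on screen but absent from the body text
    missing = body - recognized     # in the body text but not recognized on screen
    return unexpected, missing


def is_correctly_rendered(recognized_text: str, body_text: str) -> bool:
    """The document is judged correct only when neither kind of character exists."""
    unexpected, missing = unexpected_and_missing(recognized_text, body_text)
    return not unexpected and not missing


if __name__ == "__main__":
    # The letter "o" was recognized as the digit "0".
    print(unexpected_and_missing("Sale 50% 0ff", "Sale 50% off"))
```

A multiset comparison does not detect characters that are all present but drawn in the wrong order, so it is a weaker check than a full text comparison.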

The extraction unit may cause the acquisition unit to acquire a second image in which the document is rendered by giving the document to a second browser and causing the second browser to display the document on a screen or render it in a virtual view of the second browser, and may extract the body text by causing the recognition unit to perform character recognition on the acquired second image.

An inspection method according to an aspect of the present invention includes:
an acquisition step in which an inspection apparatus acquires an image in which a document is rendered by giving the document to a browser and causing the browser to display the document on a screen or render it in a virtual view of the browser;
a recognition step in which the inspection apparatus acquires recognized text by performing character recognition on the acquired image;
an extraction step in which the inspection apparatus extracts, from the document, body text to be rendered; and
a determination step in which the inspection apparatus determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.

A computer-readable recording medium according to an aspect of the present invention records a program that causes a computer to function as:
an acquisition unit that acquires an image in which a document is rendered by giving the document to a browser and causing the browser to display the document on a screen or render it in a virtual view of the browser;
a recognition unit that acquires recognized text by performing character recognition on the acquired image;
an extraction unit that extracts, from the document, body text to be rendered; and
a determination unit that determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.

A program according to an aspect of the present invention causes a computer to function as:
an acquisition unit that acquires an image in which a document is rendered by giving the document to a browser and causing the browser to display the document on a screen or render it in a virtual view of the browser;
a recognition unit that acquires recognized text by performing character recognition on the acquired image;
an extraction unit that extracts, from the document, body text to be rendered; and
a determination unit that determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.

According to the present invention, it is possible to provide an inspection apparatus, an inspection method, a recording medium, and a program for appropriately inspecting whether a document including text is correctly rendered by a browser.

DESCRIPTION OF SYMBOLS
1 Inspection apparatus
11 Storage device
12 Reception device
13 Display device
14 Control unit
21 Software storage unit
22 Document storage unit
23 Receiving unit
24 Acquisition unit
25 Display unit
26 Recognition unit
27 Extraction unit
28 Determination unit
100 Document
110 Image

Claims

1. An inspection apparatus comprising:
an acquisition unit that acquires an image in which a document is rendered by giving the document to a browser and causing the browser to display the document on a screen or render it in a virtual view of the browser;
a recognition unit that acquires recognized text by performing character recognition on the acquired image;
an extraction unit that extracts, from the document, body text to be rendered; and
a determination unit that determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.

2. The inspection apparatus according to claim 1, wherein the extraction unit extracts the body text by removing a tag from the document.

3. The inspection apparatus according to claim 1, wherein the determination unit determines that the document is not correctly rendered in the browser when there is a character that does not appear in the body text among characters that appear in the recognized text.

4. The inspection apparatus according to claim 1, wherein the determination unit determines that the document is not correctly rendered in the browser when there is a character that does not appear in the recognized text among characters that appear in the body text.

5. The inspection apparatus according to claim 1, wherein the extraction unit causes the acquisition unit to acquire a second image in which the document is rendered by giving the document to a second browser and causing the second browser to display the document on a screen or render it in a virtual view of the second browser, and extracts the body text by causing the recognition unit to perform character recognition on the acquired second image.

6. An inspection method comprising:
an acquisition step in which an inspection apparatus acquires an image in which a document is rendered by giving the document to a browser and causing the browser to display the document on a screen or render it in a virtual view of the browser;
a recognition step in which the inspection apparatus acquires recognized text by performing character recognition on the acquired image;
an extraction step in which the inspection apparatus extracts, from the document, body text to be rendered; and
a determination step in which the inspection apparatus determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.

7. A computer-readable recording medium storing a program that causes a computer to function as:
an acquisition unit that acquires an image in which a document is rendered by giving the document to a browser and causing the browser to display the document on a screen or render it in a virtual view of the browser;
a recognition unit that acquires recognized text by performing character recognition on the acquired image;
an extraction unit that extracts, from the document, body text to be rendered; and
a determination unit that determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.

8. A program that causes a computer to function as:
an acquisition unit that acquires an image in which a document is rendered by giving the document to a browser and causing the browser to display the document on a screen or render it in a virtual view of the browser;
a recognition unit that acquires recognized text by performing character recognition on the acquired image;
an extraction unit that extracts, from the document, body text to be rendered; and
a determination unit that determines whether or not the document is correctly rendered by the browser by comparing the recognized text with the body text.