CN114495127A - Commodity information processing method, apparatus, device and medium based on RPA and AI - Google Patents

Commodity information processing method, apparatus, device and medium based on RPA and AI

Info

Publication number
CN114495127A
Authority
CN
China
Prior art keywords
attribute
commodity
target
text content
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210332711.3A
Other languages
Chinese (zh)
Inventor
陈愫恺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Laiye Technology Beijing Co Ltd
Original Assignee
Laiye Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Laiye Technology Beijing Co Ltd filed Critical Laiye Technology Beijing Co Ltd
Priority to CN202210332711.3A priority Critical patent/CN114495127A/en
Priority to PCT/CN2022/091293 priority patent/WO2023184644A1/en
Publication of CN114495127A publication Critical patent/CN114495127A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)

Abstract

The disclosure provides a commodity information processing method, apparatus, device, and medium based on RPA and AI, relating to the fields of AI and RPA. The method comprises: an RPA robot acquires a commodity packaging diagram corresponding to a target commodity; recognizes the text content in the commodity packaging diagram based on OCR technology; acquires the document content in a reference document; compares the text content with the document content to determine a first difference portion of the text content that differs from the document content; and marks the first difference portion as abnormal in the text content, and/or marks the area where the first difference portion is located as abnormal in the commodity packaging diagram. The commodity information on the commodity packaging diagram is thus checked automatically by the RPA robot, which reduces manual participation, frees human resources, and lowers labor cost; it also improves the efficiency of checking commodity information, avoids the errors to which manual checking is prone, and improves the accuracy of the commodity information check result.

Description

Commodity information processing method, apparatus, device and medium based on RPA and AI
Technical Field
The present disclosure relates to the field of Artificial Intelligence (AI) and Robotic Process Automation (RPA), and in particular, to a commodity information processing method, apparatus, device, and medium based on RPA and AI.
Background
The RPA simulates the operation of a human on a computer through specific 'robot software', and automatically executes flow tasks according to rules.
AI is a technical science that studies and develops theories, methods, techniques and application systems for simulating, extending and expanding human intelligence.
Intelligent Document Processing (IDP) is a new generation of automation technology that identifies, classifies, extracts, and checks various documents based on artificial intelligence technologies such as Optical Character Recognition (OCR), Computer Vision (CV), Natural Language Processing (NLP), and Knowledge Graph (KG), and helps enterprises to realize intelligence and automation of Document Processing.
A commodity may be given differently styled packaging in different periods: for example, packaging matched to the atmosphere of each holiday, or new packaging when the commodity is tied in with different celebrities or games. Commodity packaging generally carries information such as the nutritional composition table, ingredient information, manufacturer, address, and place of production, and errors in this information may cause legal problems. It is therefore very important to check the commodity information on the commodity packaging.
In the related art, when a new commodity package is designed, it is checked multiple times by employees of multiple departments.
However, repeated manual checking is not only inefficient, but also cannot guarantee the accuracy of the check result. In addition, when there are multiple manufacturers, addresses, and places of production, manual checking is laborious and items are easily missed.
Disclosure of Invention
The present disclosure is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, the present disclosure provides a commodity information processing method, apparatus, device, and medium based on RPA and AI, so that commodity information on a commodity packaging diagram is checked automatically by an RPA robot. On the one hand, this reduces the amount of manual participation, frees human resources, and lowers labor cost; on the other hand, it improves the efficiency of checking commodity information, avoids the errors to which manual checking is prone, and improves the accuracy of the commodity information check result.
An embodiment of a first aspect of the present disclosure provides a commodity information processing method based on RPA and AI, where the method is performed by an RPA robot, and includes:
acquiring a commodity packaging image corresponding to a target commodity, and identifying text contents in the commodity packaging image based on an Optical Character Recognition (OCR) technology;
acquiring a reference document and acquiring document contents in the reference document, wherein the document contents comprise commodity information corresponding to the target commodity;
comparing the text content with the document content to determine a first difference part in the text content, which is different from the document content;
and carrying out exception marking on the first difference part in the text content, and/or carrying out exception marking on an area where the first difference part is located in the commodity packaging diagram.
An embodiment of a second aspect of the present disclosure provides a commodity information processing apparatus based on RPA and AI, applied to an RPA robot, including:
the first acquisition module is used for acquiring a commodity packaging diagram corresponding to a target commodity;
the recognition module is used for recognizing the text content in the commodity packaging drawing based on an Optical Character Recognition (OCR) technology;
the second acquisition module is used for acquiring a reference document and acquiring document contents in the reference document, wherein the document contents comprise commodity information corresponding to the target commodity;
the comparison module is used for comparing the text content with the document content to determine a first difference part which is different from the document content in the text content;
and the labeling module is used for performing abnormal labeling on the first difference part in the text content and/or performing abnormal labeling on an area where the first difference part is located in the commodity packaging diagram.
An embodiment of a third aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method according to the embodiment of the first aspect of the present disclosure.
A fourth aspect of the present disclosure is directed to a non-transitory computer-readable storage medium, having a computer program stored thereon, where the computer program, when executed by a processor, implements the method according to the first aspect of the present disclosure.
A fifth aspect of the present disclosure provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method according to the first aspect of the present disclosure.
The technical scheme provided by the embodiment of the disclosure has the following beneficial effects:
acquiring, through an RPA robot, a commodity packaging image corresponding to a target commodity, and identifying the text content in the commodity packaging image based on OCR technology; acquiring a reference document and the document content in the reference document, wherein the document content comprises commodity information corresponding to the target commodity; comparing the text content with the document content to determine a first difference portion of the text content that differs from the document content; and marking the first difference portion as abnormal in the text content, and/or marking the area where the first difference portion is located as abnormal in the commodity packaging diagram. The commodity information on the commodity packaging diagram can thus be checked automatically by the RPA robot, which on the one hand reduces manual participation, frees human resources, and lowers labor cost, and on the other hand improves the efficiency of checking commodity information and the accuracy of the check result.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a commodity information processing method based on RPA and AI according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of another method for processing commodity information based on RPA and AI according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of sub-images obtained after the commodity packaging diagram is cut according to the embodiment of the present disclosure;
FIG. 4 is a first schematic diagram illustrating a portion of a packaging diagram for a commodity in accordance with an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of another commodity information processing method based on RPA and AI according to an embodiment of the present disclosure;
FIG. 6 is a schematic illustration of a verification report in an embodiment of the present disclosure;
fig. 7 is a schematic flowchart of another method for processing commodity information based on RPA and AI according to an embodiment of the present disclosure;
fig. 8 is a schematic flowchart of another method for processing commodity information based on RPA and AI according to an embodiment of the present disclosure;
fig. 9 is a schematic view of first nutritional component information in an embodiment of the present disclosure;
fig. 10 is a schematic flowchart of another method for processing commodity information based on RPA and AI according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of an implementation of an embodiment of the present disclosure;
FIG. 12 is a second partial schematic view of a merchandise packaging diagram in accordance with an embodiment of the present disclosure;
FIG. 13 is a first schematic diagram illustrating OCR recognition results in an embodiment of the present disclosure;
FIG. 14 is a schematic diagram illustrating the extraction result of ingredient information in an embodiment of the present disclosure;
FIG. 15 is a diagram illustrating a second OCR recognition result in an embodiment of the present disclosure;
fig. 16 is a schematic diagram of an extraction result of a factory name, a factory address, and a production license number in the embodiment of the present disclosure;
FIG. 17 is a third attribute field schematic in an embodiment of the present disclosure;
FIG. 18 is a schematic illustration of a configuration template in an embodiment of the disclosure;
FIG. 19 is a schematic diagram of an extraction rule for ingredients in an embodiment of the disclosure;
FIG. 20 is a third exemplary illustration of OCR recognition results in an embodiment of the disclosure;
FIG. 21 is a fourth schematic diagram illustrating OCR recognition results in an embodiment of the present disclosure;
fig. 22 is a schematic structural diagram of a commodity information processing apparatus based on RPA and AI according to an embodiment of the present disclosure;
FIG. 23 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and intended to be illustrative of the present disclosure, and should not be construed as limiting the present disclosure.
The disclosure provides a commodity information processing method, a commodity information processing device, commodity information processing equipment and a commodity information processing medium based on RPA and AI.
A commodity information processing method, apparatus, device, and medium based on RPA and AI according to an embodiment of the present disclosure are described below with reference to the accompanying drawings. Before the embodiments of the present disclosure are described in detail, for ease of understanding, common technical terms are first introduced:
the "RPA" is a short term for robot Process Automation (robotics Automation), and is a professional and comprehensive Process Automation solution for enterprises and individuals. The RPA simulates the operation of a human on a computer through specific 'robot software', and automatically executes flow tasks according to rules. Namely, the RPA robot can quickly and accurately collect data of a user operation interface by simulating mouse and keyboard operations of a user, process the data based on clear logic rules, and quickly and accurately input the data into another system or interface. Therefore, the investment of labor cost can be greatly reduced, the existing office efficiency is effectively improved, and the work is accurately, stably and quickly finished.
"AI" is a short for Artificial Intelligence (Artificial Intelligence) and is a technical science for studying and developing theories, methods, techniques and application systems for simulating, extending and expanding human Intelligence. AI is a subject of research that has led computers to simulate certain mental processes and intelligent behaviors of humans (e.g., learning, reasoning, thinking, planning, etc.), both at the hardware level and at the software level. AI hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; the AI software technology mainly includes computer vision technology, speech recognition technology, Natural Language Processing (NLP) technology, machine learning/deep learning, big data Processing technology, knowledge map technology, and so on.
"merchandise" is a labor product produced for sale and is a labor product for exchange. For example, the merchandise may include food, commodities, health products, and the like.
The "target commodity" may be any commodity, for example, the target commodity may be a food, a daily necessity, or the like.
The "commodity packaging drawing", also called a commodity packaging design drawing, refers to an image including a packaging design of a target commodity.
The term "commodity information" refers to information related to a target commodity, and for example, the commodity information may include information on nutritional components, ingredients (or component information), manufacturer, address, and place of production of the target commodity.
The "reference document", also referred to as a document to be compared, refers to a document including commodity information corresponding to a target commodity. For example, the reference document may be a structured document, such as an Excel document, or an unstructured document, such as a Word document. It should be understood that when the reference document is an unstructured document, it may be converted into a structured document in order to facilitate information comparison by the RPA robot.
"Optical Character Recognition (OCR)" refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of darkness and brightness, and then translates the shapes into computer text using a character recognition method; that is, for printed characters, the characters in a paper document are optically converted into an image file of a black-and-white dot matrix, and the characters in the image are then converted into a text format by recognition software for further editing and processing by word-processing software.
The "first attribute field" refers to an attribute field included in the text content corresponding to the product packaging diagram, for example, the first attribute field may include: production license (or called as production license number and production number), address, manufacturer, ingredients, storage condition, shelf life, production date, net content, product type and the like.
The "first attribute value" refers to an attribute value corresponding to the first attribute field in the text content, for example, taking the target product as a food, the attribute value corresponding to the ingredient may be: drinking water, cheese powder, citric acid, etc.
The "second attribute field" refers to an attribute field included in the document content in the reference document, and correspondingly, the second attribute value refers to the corresponding attribute value of the second attribute field in the document content. The second attribute field is a standard attribute field corresponding to the target product, and the second attribute value is a standard attribute value corresponding to the target product.
It should be understood that the first attribute field and/or the first attribute value may be written incorrectly at the design stage, whereas the second attribute field and the second attribute value are the correct attribute field and attribute value for the target commodity.
The term "set vocabulary" refers to a preset vocabulary, which may also be referred to as a custom vocabulary. The set word list includes attribute fields related to the product information of the target product, which are denoted as third attribute fields in the present disclosure. For example, the third attribute field may include: production license, address, manufacturer, ingredients, storage conditions, shelf life, production date, net content, product type, etc.
It should be noted that, given the limited accuracy of OCR recognition, some attribute fields may be recognized imperfectly: for an attribute field such as "ingredients", for example, a character of the field may fail to be recognized, or extra spaces may appear within the recognized field. In view of this, for each attribute field related to the commodity information, the various ways in which the field may be written or mis-recognized can be collected, and the field together with these variants are all used as third attribute fields and placed in the set vocabulary.
For example, for the attribute field "ingredients", the set vocabulary may include the field itself together with its common variant spellings and mis-recognized forms.
The "third attribute value" refers to an attribute value corresponding to the third attribute field in the text content corresponding to the product packaging diagram of the target product, for example, taking the target product as a food, the attribute value corresponding to the ingredient may be: drinking water, cheese powder, citric acid, etc.
The "target document" is a document including the commodity packaging diagram of the target commodity. For example, the target document may be a PDF (Portable Document Format) document, or a design document in a format such as PSD (the proprietary format of Adobe's graphic design software Photoshop) or the Adobe Illustrator file format (a vector graphics format), and the like.
The "first nutritional component information" is nutritional component information related to the target product included in the text content, for example, the target product is taken as a food, and the first nutritional component information may include: energy, protein, fat, carbohydrate, etc.
The "second nutritional component information" is the nutritional component information related to the target commodity included in the document content. It should be understood that the first nutritional component information may contain errors introduced during design, whereas the second nutritional component information is the correct nutritional component information for the target commodity.
"Regular expressions", also known as rule expressions, are used to retrieve or replace text that conforms to a certain pattern (rule).
The term "any text segment" refers to any text segment in the first nutritional component information, where a single text segment consists of characters at adjacent positions and/or characters separated by no more than a first set number (e.g., 1 or 2) of spaces.
The "adjacent text segment" refers to a text segment adjacent to the "any text segment" in the "first nutrient composition information", for example, the "adjacent text segment" may be: text segments located on the left, right, upper, and lower sides of "any text segment".
As an example, taking the target product as a food, the first nutritional component information may be as shown in table 1:
TABLE 1
[Table 1 is provided as an image in the original publication: a nutritional composition table listing items (such as carbohydrate, fat, and sodium) with their amounts and NRV percentages.]
Assuming that "any text segment" is "carbohydrate" in table 1, the "adjacent text segment" may be "5.8 g", "fat", "sodium".
The "target detection algorithm" (object detection algorithm) belongs to computer vision within the field of AI. Based on an object detection algorithm from deep learning, it can be detected whether an image includes the desired content.
Fig. 1 is a schematic flow chart of a commodity information processing method based on RPA and AI according to an embodiment of the present disclosure.
The commodity information processing method based on RPA and AI provided by the embodiment of the disclosure can be applied to an RPA robot, and the RPA robot can operate in any electronic equipment with computing capability. The electronic device may be a personal computer, a mobile terminal, and the like, and the mobile terminal is, for example, a mobile phone, a tablet computer, a personal digital assistant, and other hardware devices having various operating systems.
As shown in fig. 1, the RPA and AI-based merchandise information processing method may include the steps of:
step 101, acquiring a commodity packaging image corresponding to a target commodity, and identifying text contents in the commodity packaging image based on an OCR technology.
In the embodiment of the present disclosure, the commodity packaging map may be an image in an image format such as JPG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and the like.
In a possible implementation manner of the embodiment of the present disclosure, the RPA robot may directly obtain a commodity packing diagram corresponding to a target commodity.
As an example, the commodity package diagram may be uploaded or sent to a device where the RPA robot is located manually, for example, a service person may take a picture of a target commodity through an image acquisition device (such as a camera, a mobile terminal, or the like) to obtain the commodity package diagram in an image file format, or the service person may scan a paper file containing the commodity package diagram to obtain a document in a PDF format, and capture a screenshot of the commodity package diagram in the document to obtain the commodity package diagram in the image file format. After obtaining the commodity packaging diagram, the service personnel can upload or send the commodity packaging diagram to the equipment where the RPA robot is located.
In another possible implementation manner of the embodiment of the present disclosure, the RPA robot may also indirectly obtain a product package diagram corresponding to the target product.
As an example, the RPA robot may obtain a target document containing a commodity package diagram, for example, the target document may be manually uploaded or sent to a device where the RPA robot is located, so that the RPA robot may extract the commodity package diagram from the target document after obtaining the target document. For example, the RPA robot may identify and intercept the commodity packaging diagram from the target document based on a target detection algorithm.
In the embodiment of the present disclosure, after obtaining the product package diagram, the RPA robot may perform character recognition on the product package diagram based on an OCR technology to obtain text content of the product package diagram.
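For illustration only, a minimal sketch of this recognition step is shown below; the disclosure does not name a specific OCR engine, so the open-source pytesseract wrapper, the file name, and the language code are stand-in assumptions.

```python
# Minimal OCR sketch; pytesseract, the file name and the language code are stand-ins,
# not the engine actually used by the RPA robot of this disclosure.
from PIL import Image
import pytesseract

def recognize_package_text(image_path: str, lang: str = "chi_sim") -> str:
    """Run OCR over a commodity packaging image and return its raw text content."""
    image = Image.open(image_path)
    # image_to_string returns all recognized characters as one block of text
    return pytesseract.image_to_string(image, lang=lang)

text_content = recognize_package_text("package_design.png")
```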
Step 102, obtaining a reference document, and obtaining document contents in the reference document, wherein the document contents include commodity information corresponding to a target commodity.
In the embodiment of the present disclosure, the RPA robot may obtain the reference document, for example, the reference document may be manually uploaded or sent to a device where the RPA robot is located. After acquiring the reference document, the RPA robot may read the document content in the reference document.
Step 103, comparing the text content with the document content to determine a first difference part different from the document content in the text content.
In an embodiment of the disclosure, the RPA robot may compare the text content with the document content to determine a first difference portion in the text content that is different from the document content.
And 104, carrying out exception marking on the first difference part in the text content, and/or carrying out exception marking on an area where the first difference part is located in the commodity packaging diagram.
In a possible implementation manner of the embodiment of the present disclosure, the RPA robot may perform exception marking on the first difference portion in the text content. For example, the RPA robot may adjust the font and/or font size of the first difference portion (e.g., increase the font size, tilt the font, and/or thicken the font, etc.) in the text content, and color label the adjusted first difference portion; alternatively, the RPA robot may also color-label the first difference portion directly in the text content, for example, the first difference portion may be color-labeled with a conspicuous color (e.g., red, blue, etc.), which is not limited by the present disclosure.
In another possible implementation manner of the embodiment of the present disclosure, the RPA robot may determine an area where the first difference portion is located in the commodity packaging diagram, and perform exception marking on the area in the commodity packaging diagram. For example, a label box may be added to the edge of the above-mentioned area; alternatively, underlining, wavy lines, etc. may be added under the characters in the above-described area, which the present disclosure does not limit.
In yet another possible implementation manner of the embodiment of the present disclosure, the RPA robot may further perform exception marking on the first difference portion in the text content, and perform exception marking on an area where the first difference portion is located in the commodity packaging diagram.
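As a sketch of the image-side marking (the text-side marking would be analogous), the snippet below draws a label box around a given region; the bounding box is assumed to be already known from the OCR result, and PIL is only an illustrative choice of drawing library.

```python
# Sketch of step 104's exception marking on the packaging diagram; the bounding box of the
# first difference portion is assumed to be supplied by the OCR result.
from PIL import Image, ImageDraw

def mark_difference_region(image_path: str, box: tuple, out_path: str) -> None:
    """Add a label box at the edge of the area where the first difference portion is located."""
    image = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    draw.rectangle(box, outline="red", width=3)  # conspicuous color, per the marking described above
    image.save(out_path)

# box = (left, top, right, bottom) in pixel coordinates of the commodity packaging diagram
mark_difference_region("package_design.png", (120, 340, 480, 395), "package_marked.png")
```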
Optionally, after the RPA robot performs the abnormal labeling on the text content, the RPA robot may further display the labeled text content, and/or after the RPA robot performs the abnormal labeling on the commodity packaging diagram, the RPA robot may further display the labeled commodity packaging diagram, so that the relevant personnel can obtain the comparison result in time.
According to the commodity information processing method based on RPA and AI of the embodiment of the present disclosure, the RPA robot acquires a commodity packaging image corresponding to a target commodity and identifies the text content in the commodity packaging image based on OCR technology; acquires a reference document and the document content in the reference document, wherein the document content comprises commodity information corresponding to the target commodity; compares the text content with the document content to determine a first difference portion of the text content that differs from the document content; and marks the first difference portion as abnormal in the text content, and/or marks the area where the first difference portion is located as abnormal in the commodity packaging diagram. The commodity information on the commodity packaging diagram can thus be checked automatically by the RPA robot, which on the one hand reduces manual participation, frees human resources, and lowers labor cost, and on the other hand improves checking efficiency and the accuracy of the check result.
In order to clearly illustrate how the RPA robot compares text content with document content in any embodiment of the disclosure, the disclosure also provides a commodity information processing method based on RPA and AI.
Fig. 2 is a schematic flow chart of another commodity information processing method based on RPA and AI according to an embodiment of the present disclosure.
As shown in fig. 2, the RPA and AI based merchandise information processing method may include the steps of:
step 201, obtaining a product package diagram corresponding to a target product, and recognizing text content in the product package diagram based on an OCR technology.
In any embodiment of the disclosure, the commodity packaging diagram can be segmented into at least one sub-image by means of manual selection, so that the at least one sub-image can be subjected to character recognition based on an OCR technology to obtain text content.
That is, in the present disclosure, the RPA robot may segment the product packaging diagram into at least one sub-image in response to an intercepting operation triggered by a relevant person, and perform character recognition on the at least one sub-image based on an OCR technology to obtain text content.
As an example, taking the target product as a food, the relevant person may segment the product package diagram into 6 sub-regions as shown in fig. 3 by way of circle selection.
In any embodiment of the disclosure, the RPA robot may identify and extract at least one target region from the commodity packaging map based on a target detection algorithm in a deep learning technique, wherein the target region includes character information. The RPA robot may perform character recognition on the at least one target region based on OCR technology to obtain text content.
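A hedged sketch of this region-extraction step follows; the disclosure does not name a detector, so the ultralytics YOLO wrapper and the weight file are stand-ins, and a model trained to locate text regions on packaging is assumed to exist.

```python
# Sketch only: locate character-bearing target regions with an object detector and crop them.
# "package_text_regions.pt" is a hypothetical weight file, not part of this disclosure.
from PIL import Image
from ultralytics import YOLO

def extract_target_regions(image_path: str, weights: str = "package_text_regions.pt"):
    """Return cropped sub-images of the regions predicted to contain character information."""
    model = YOLO(weights)
    result = model(image_path)[0]
    image = Image.open(image_path)
    crops = []
    for x1, y1, x2, y2 in result.boxes.xyxy.tolist():  # one bounding box per detected region
        crops.append(image.crop((int(x1), int(y1), int(x2), int(y2))))
    return crops
```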
Step 202, obtaining a reference document, and obtaining document contents in the reference document, wherein the document contents include commodity information corresponding to the target commodity.
The execution process of steps 201 to 202 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
Step 203, extracting each first attribute field from the text content, and extracting a first attribute value matched with each first attribute field from the text content.
In the embodiment of the present disclosure, the first attribute field may be extracted from the text content, and the first attribute value matching each first attribute field may be extracted from the text content.
As an example, taking the target commodity as a food product, a part of the commodity packaging diagram may be as shown in fig. 4; the text segment before each colon ":" is taken as a first attribute field, and the text segment after the colon is taken as the first attribute value corresponding to that first attribute field.
As another example, an attribute table including the attribute fields related to the target commodity may be preset, so that first attribute fields matching the attribute fields in the attribute table may be extracted from the text content; after the first attribute fields are extracted, the first attribute values corresponding to the first attribute fields may be extracted from the text content based on a set extraction rule.
For example, an attribute value between two adjacent first attribute fields may be extracted from the text content and used as the first attribute value corresponding to the previous attribute field in the two adjacent first attribute fields. The character content after the last first attribute field may be the first attribute value corresponding to the last first attribute field.
In practical applications, the inventor analyzed a large number of package design drawings and found that the characters following the last attribute field may include not only the attribute value but also other characters, such as "please keep the environment clean, do not litter the bottle" and so on.
In view of this, a large number of package design drawings may be analyzed and counted, the sentences located after the last attribute field in each package design drawing determined, and an ending identifier set according to these sentences; for example, the ending identifier may be "keep the environment" and the like. Thus, when the RPA robot recognizes that the text content includes the ending identifier, it may intercept the character content between the last first attribute field and the ending identifier and use it as the first attribute value corresponding to the last first attribute field.
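A minimal sketch of this extraction logic is given below; the attribute-field list and the ending identifier are illustrative assumptions, not the configuration actually used in the disclosure.

```python
# Sketch of steps 203-204: split the OCR text into first attribute fields and their values.
import re

ATTRIBUTE_FIELDS = ["Ingredients", "Net content", "Shelf life", "Storage condition",
                    "Manufacturer", "Address", "Production license"]   # illustrative attribute table
END_MARK = "keep the environment"                                      # assumed ending identifier

def extract_field_values(text: str) -> dict:
    # locate every occurrence of a known attribute field in the text
    hits = sorted((m.start(), m.end(), field) for field in ATTRIBUTE_FIELDS
                  for m in re.finditer(re.escape(field), text))
    values = {}
    for i, (start, end, field) in enumerate(hits):
        # the value runs to the next attribute field, or to the ending identifier for the last one
        stop = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        value = text[end:stop]
        tail = value.find(END_MARK)
        if tail != -1:
            value = value[:tail]
        values[field] = value.strip(" :：\n")
    return values
```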
Step 204, comparing the first attribute values corresponding to the first attribute fields and the first attribute fields with the second attribute values corresponding to the second attribute fields and the second attribute fields in the document content.
In the embodiment of the present disclosure, the first attribute fields in the text content and the first attribute values corresponding to the first attribute fields may be compared with the second attribute fields in the document content and the second attribute values corresponding to the second attribute fields.
Step 205, when, among the first attribute fields, there is a first target attribute field that does not match the second attribute fields, taking the first target attribute field and/or the first attribute value corresponding to the first target attribute field as a first difference portion.
In the embodiment of the present disclosure, when it is determined that at least one attribute field (denoted as a first target attribute field in the present disclosure) in each first attribute field does not match the second attribute field, the first attribute value corresponding to the first target attribute field and/or the first target attribute field may be used as the first difference portion.
Step 206, when, among the first attribute fields, there is a second target attribute field that matches a second attribute field, but the first attribute value corresponding to the second target attribute field does not match the second attribute value corresponding to that second attribute field, taking the first attribute value corresponding to the second target attribute field as a first difference portion.
In the embodiment of the present disclosure, when it is determined that one attribute field (referred to as a second target attribute field in the present disclosure) in each first attribute field matches the second attribute field, but a first attribute value corresponding to the second target attribute field does not match a second attribute value corresponding to the second attribute field, the first attribute value corresponding to the second target attribute field may be used as the first difference portion.
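The comparison of steps 204 to 206 can be sketched as below, assuming the second attribute fields and values have already been read from the reference document into a dictionary.

```python
# Sketch: first vs. second attribute fields/values; unmatched fields or mismatched values
# become first difference portions.
def find_difference_portions(first: dict, second: dict) -> list:
    differences = []
    for field, value in first.items():
        if field not in second:
            # a first target attribute field with no matching second attribute field
            differences.append((field, value))
        elif value != second[field]:
            # a second target attribute field whose first attribute value does not match
            differences.append((field, value))
    return differences

find_difference_portions(
    {"Ingredients": "drinking water, cheese powder, citric acid", "Net content": "500 mL"},
    {"Ingredients": "drinking water, cheese powder, citric acid", "Net content": "450 mL"},
)   # -> [("Net content", "500 mL")]
```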
And step 207, carrying out exception marking on the first difference part in the text content, and/or carrying out exception marking on an area where the first difference part is located in the commodity packaging diagram.
The execution process of step 207 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
According to the commodity information processing method based on RPA and AI, the attribute fields and the attribute values in the text content are respectively compared with the attribute fields and the attribute values in the document content, so that missing detection of important contents in the commodity information can be avoided, and the accuracy of the commodity information check result is improved.
It should be noted that, given the limited accuracy of OCR recognition, some attribute fields may be recognized imperfectly: for an attribute field such as "ingredients", a character of the field may fail to be recognized, or extra spaces may appear within the recognized field. In such cases the RPA robot cannot identify the "ingredients" attribute field, cannot extract the attribute value corresponding to "ingredients", and therefore cannot compare the ingredient information in the commodity packaging diagram.
In view of this problem, in the present disclosure, for each attribute field related to the target commodity, the various ways in which the field may be written or mis-recognized can be collected, and the field together with these variants are all used as third attribute fields and placed in the set vocabulary. Thus, the third attribute value corresponding to each third attribute field in the set vocabulary may be extracted from the text content based on the set vocabulary, so that the third attribute values can be compared with the second attribute values in the document content. The above process is described in detail below with reference to fig. 5.
Fig. 5 is a schematic flowchart of another commodity information processing method based on RPA and AI according to an embodiment of the present disclosure.
As shown in fig. 5, on the basis of the embodiment shown in fig. 2, the RPA and AI-based merchandise information processing method may further include the steps of:
step 301, a set word list is obtained, wherein the set word list includes at least one third attribute field.
In the embodiment of the present disclosure, the set vocabulary is preset, and in the present disclosure, the RPA robot may obtain the preset set vocabulary.
Step 302, extracting third attribute values matched with each third attribute field in the set word list from the text content.
In the disclosed embodiment, the RPA robot may extract, from the text content, a third attribute value that matches each third attribute field in the set vocabulary. The specific implementation process is similar to step 203, and is not described herein again.
Step 303, comparing the third attribute values corresponding to the third attribute fields with the second attribute values corresponding to the second attribute fields in the document content.
And step 304, taking the target attribute value as a first difference part under the condition that the target attribute value does not match with the second attribute value in the third attribute values.
In this disclosure, the third attribute values corresponding to the third attribute fields may be respectively compared with the second attribute values corresponding to the second attribute fields in the document content, and if at least one attribute value (denoted as a target attribute value in this disclosure) in each third attribute value is not matched with the second attribute value, the target attribute value may be used as the first difference portion. If the third attribute values match the second attribute values, no processing is required.
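A sketch of the set-vocabulary lookup of steps 301 to 304 follows; the variant spellings are purely illustrative stand-ins for the mis-recognized forms collected for each attribute field.

```python
# Sketch: each attribute field maps to itself plus OCR variants (the third attribute fields);
# values extracted under any variant are compared against the reference document.
SET_VOCABULARY = {
    "Ingredients": ["Ingredients", "Ingredient", "Ingre dients"],   # field plus assumed variants
    "Manufacturer": ["Manufacturer", "Manufac turer"],
}

def compare_via_vocabulary(extracted: dict, reference: dict) -> list:
    differences = []
    for field, variants in SET_VOCABULARY.items():
        for variant in variants:
            if variant in extracted:                        # third attribute value found in the text
                if extracted[variant] != reference.get(field):
                    differences.append(extracted[variant])  # target attribute value, a difference portion
                break
    return differences
```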
It should be noted that, the present disclosure does not limit the execution timing of steps 301 to 304, for example, steps 301 to 304 may be executed after step 206, or steps 301 to 304 may also be executed in parallel with steps 203 to 206, or steps 301 to 304 may also be executed before step 203, and so on. That is, steps 301 to 304 need only be performed before step 207.
In any embodiment of the present disclosure, in order to enable the relevant person to check and/or modify the product package diagram in time when the first difference portion exists in the text content, the RPA robot may further send a prompt message, where the prompt message is used to prompt the checking and/or modifying of the first difference portion in the product package diagram.
For example, the RPA bot may send a reminder message to a specified account (such as a mailbox account number); for another example, the device where the RPA robot is located may log in instant messaging software, and the RPA robot may send prompt information to an instant messaging account where a related person is located.
In any embodiment of the disclosure, the RPA robot may generate and display a verification report according to at least one of the correspondence between the first attribute fields and the first attribute values, the correspondence between the third attribute fields and the third attribute values, and the first nutritional component information of the target commodity in the text content, so that the relevant personnel can verify the commodity packaging diagram based on the verification report. For example, the verification report may be as shown in FIG. 6.
In any of the embodiments of the present disclosure, the RPA robot may not only send the prompt information, but also generate the verification report.
In a possible implementation manner of the embodiment of the present disclosure, the RPA robot may further compare the document content with the text content to determine a second difference portion different from the text content in the document content, where the comparison manner is similar to the manner of comparing the text content with the document content in the foregoing embodiment, and is not repeated here. In this disclosure, when the RPA robot determines that the second difference portion exists in the document content, the RPA robot may perform an abnormal annotation on the second difference portion in the document content, and display the annotated document content. The labeling manner of the second difference portion is similar to that of the first difference portion, and is not described herein again.
According to the commodity information processing method based on RPA and AI, the attribute values in the text content are further extracted according to the set word list, and the extracted attribute values are compared with the attribute values in the document content, so that the condition that the extraction of the attribute values is omitted due to low accuracy of an OCR recognition result can be avoided, and the accuracy of a commodity information checking result is improved.
In order to clearly illustrate how the RPA robot compares text content with document content in any embodiment of the disclosure, the disclosure also provides a commodity information processing method based on RPA and AI.
Fig. 7 is a schematic flowchart of another commodity information processing method based on RPA and AI according to an embodiment of the present disclosure.
As shown in fig. 7, the RPA and AI-based merchandise information processing method may include the steps of:
step 401, obtaining a product package diagram corresponding to a target product, and recognizing text content in the product package diagram based on an OCR technology.
Step 402, obtaining a reference document, and obtaining document contents in the reference document, wherein the document contents include commodity information corresponding to a target commodity.
The execution process of steps 401 to 402 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
Step 403, extracting first nutritional component information of the target product from the text content, and extracting second nutritional component information from the document content.
In the embodiment of the disclosure, the first nutritional component information of the target commodity may be extracted from the text content. For example, taking the case where the commodity packaging diagram is divided into a plurality of sub-images, the first nutritional component information may be contained in a certain sub-image, denoted in the present disclosure as the target sub-image (for example, sub-image 1 in fig. 3); character recognition may be performed on the target sub-image based on OCR technology to obtain the first nutritional component information.
That is, the text content is composed of OCR recognition results corresponding to a plurality of sub-images, and a target sub-image including the first nutritional component information may be determined from the plurality of sub-images, and the OCR recognition result corresponding to the target sub-image may be determined from the text content.
In the disclosed embodiment, the RPA robot may also extract second nutritional component information from the document content.
Step 404, comparing each component information in the first nutrient component information with the corresponding component information in the second nutrient component information.
In step 405, in the case where there is a mismatch between the target ingredient information in the first nutritional ingredient information and the corresponding ingredient information in the second nutritional ingredient information, the target ingredient information is regarded as a first difference portion.
In the embodiment of the present disclosure, each component information (for example, energy, protein, fat, and other component information) in the first nutritional component information may be matched with the corresponding component information in the second nutritional component information, and when at least one component information (referred to as target component information in the present disclosure) in the first nutritional component information is not matched with the corresponding component information in the second nutritional component information, the target component information may be regarded as the first difference portion.
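One way to sketch this comparison is to parse each recognized row into an item name, an amount, and an NRV percentage, and then compare against the reference; the row pattern below is an assumption modeled on table 1, not the parsing actually used in the disclosure.

```python
# Sketch of steps 403-405; assumes rows like "Carbohydrate 5.8g 2%".
import re

ROW = re.compile(r"^(?P<item>[A-Za-z ]+?)\s+(?P<amount>[\d.]+\s*(?:kJ|g|mg|mcg))\s+(?P<nrv>\d+%)$")

def parse_nutrition(lines: list) -> dict:
    table = {}
    for line in lines:
        m = ROW.match(line.strip())
        if m:
            table[m.group("item").strip()] = (m.group("amount"), m.group("nrv"))
    return table

def compare_nutrition(first: dict, second: dict) -> list:
    # any item whose amount or NRV% differs from the reference becomes a first difference portion
    return [item for item, value in first.items() if second.get(item) != value]
```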
And 406, carrying out exception marking on the first difference part in the text content, and/or carrying out exception marking on an area where the first difference part is located in the commodity packaging diagram.
The execution process of step 406 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
According to the commodity information processing method based on RPA and AI, the nutrient component information in the text content is compared with the nutrient component information in the document content, so that the table content in the commodity packaging diagram can be checked, missing checking of the commodity information is avoided, and reliability of the checking result is improved.
It should be noted that most nutrition tables are frameless tables, which are difficult for conventional table recognition algorithms to recognize; for example, a conventional table recognition algorithm cannot clearly identify the left, middle, and right columns, line breaks, and the like in a frameless table. For example, suppose the "carbohydrate" row in sub-image 1 of fig. 3 is line-wrapped; it may then appear as shown in table 2 or table 3:
TABLE 2
Carbohyd    5.8g    2%
rate
TABLE 3
Carbo
hydrate     5.8g    2%
It is understood that tables 2 and 3 are easy for a human to understand, but it is difficult for a machine to judge that "carbohydrate" is one complete word, so current general-purpose table recognition algorithms have difficulty recognizing such tables. In view of this problem, in the present disclosure, after the first nutritional component information is extracted from the text content, regular-expression replacement may be performed on mis-recognized component information in the first nutritional component information. The process is described in detail below with reference to fig. 8.
Fig. 8 is a schematic flowchart of another method for processing commodity information based on RPA and AI according to an embodiment of the present disclosure.
As shown in fig. 8, the RPA and AI based merchandise information processing method may include the steps of:
step 501, obtaining a commodity packaging image corresponding to a target commodity, and identifying text contents in the commodity packaging image based on an OCR technology.
Step 502, obtaining a reference document, and obtaining document contents in the reference document, wherein the document contents include commodity information corresponding to a target commodity.
Step 503, extracting the first nutritional component information of the target product from the text content, and extracting the second nutritional component information from the document content.
The execution process of steps 501 to 503 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
Step 504, aiming at any component information in the first nutritional component information, a regular expression matched with any component information is obtained.
In the embodiment of the present disclosure, a regular expression corresponding to each component information may be preset, so that in the present disclosure, the RPA robot may obtain a regular expression corresponding to each component information in the first nutritional component information.
And 505, matching the regular expression with any component information.
And step 506, if the component information is not matched with the regular expression, replacing any component information based on the regular expression.
In the embodiment of the disclosure, for any one of the first nutritional ingredient information, if the any one of the first nutritional ingredient information does not match with the corresponding regular expression, the any one of the first nutritional ingredient information may be subjected to replacement processing based on the regular expression corresponding to the any one of the first nutritional ingredient information. If any component information is matched with the corresponding regular expression, replacement processing on any component information can be omitted.
For example, if the unit corresponding to "carbohydrate" is "g", and if the unit corresponding to "carbohydrate" in the first nutritional ingredient information is "9", the "9" may be automatically replaced with "g" using the regular expression corresponding to "carbohydrate".
For another example, as shown in table 1, the last item, "NRV%", corresponding to each nutritional component is the percentage of the daily reference intake of that nutrient; if the unit corresponding to "NRV" in any item of component information in the first nutritional component information is not "%" but some other symbol, that symbol can be automatically replaced with "%" using the regular expression corresponding to that item of component information.
For another example, assuming that the first nutritional component information in the OCR recognition result is as shown in fig. 9, it may be determined, according to the regular expression corresponding to "carbohydrate", that in the component information corresponding to "carbohydrate" the first line is missing the final characters of the item name while the second line contains those extra characters; the incomplete item name in the first line may then be replaced with the complete name "carbohydrate", and the extra characters preceding "5.8 g" in the second line may be automatically replaced with "5.8 g".
It should be noted that the present disclosure only describes, by way of example, replacing component information according to a regular expression; in practical applications, the replacement of component information may also be implemented by writing judgment logic at the code level. For example, the code logic may be: judge whether the last character of each item of component information contains a unit, and if it does not, automatically replace it with the unit matching that component information, for example replacing "9" with "g".
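A sketch of that code-level judgment is shown below; the expected-unit map is an illustrative assumption, and only the simple "replace the last character" rule described above is implemented.

```python
# Sketch of the unit correction: if the recognized amount does not end in the expected unit,
# the last character (e.g. a '9' mis-read for 'g') is replaced with that unit.
EXPECTED_UNIT = {"Carbohydrate": "g", "Protein": "g", "Fat": "g", "Sodium": "mg", "Energy": "kJ"}

def fix_amount(item: str, amount: str) -> str:
    unit = EXPECTED_UNIT.get(item)
    if unit is None or amount.endswith(unit):
        return amount                      # unit already correct, nothing to replace
    return amount[:-1] + unit              # replace the mis-read trailing character with the unit

fix_amount("Carbohydrate", "5.89")   # -> "5.8g"
fix_amount("Sodium", "120mg")        # -> "120mg" (unchanged)
```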
And step 507, comparing each component information in the replaced first nutrient component information with corresponding component information in the second nutrient component information.
And step 508, in the case that the target component information in the first nutritional component information after the replacement processing does not match with the corresponding component information in the second nutritional component information, taking the target component information as a first difference part.
In step 509, the first difference portion is abnormally labeled in the text content, and/or an area where the first difference portion is located is abnormally labeled in the commodity packaging diagram.
The execution process of step 509 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
According to the commodity information processing method based on RPA and AI of the embodiment of the present disclosure, for any component information in the first nutritional component information, a regular expression matching that component information is obtained; the regular expression is matched against the component information; and if they do not match, the component information is replaced based on the regular expression. In this way, the OCR recognition result can be corrected and optimized, further improving the accuracy and reliability of the commodity information comparison result.
In order to implement the above embodiments, the present disclosure also provides another commodity information processing method based on RPA and AI.
Fig. 10 is a schematic flowchart of another commodity information processing method based on RPA and AI according to an embodiment of the present disclosure.
As shown in fig. 10, the RPA and AI-based merchandise information processing method may include the steps of:
step 601, obtaining a commodity packaging image corresponding to a target commodity, and recognizing text contents in the commodity packaging image based on an OCR technology.
Step 602, obtaining a reference document, and obtaining document contents in the reference document, wherein the document contents include commodity information corresponding to a target commodity.
Step 603, extracting first nutritional component information of the target product from the text content, and extracting second nutritional component information from the document content.
The execution process of steps 601 to 603 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
Step 604, for any text segment in the first nutritional component information, determining whether the semantics of any text segment are complete.
In the embodiment of the disclosure, for any text segment in the first nutritional component information, whether the semantics of any text segment are complete or not may be determined.
As an example, whether the semantics of any text segment are complete may be determined based on a semantic analysis algorithm.
It can be understood that recognition errors in the first nutritional component information generally fall into two types: unit recognition errors on the one hand, and item-name (such as protein, carbohydrate, trans fatty acid, vitamin D, etc.) recognition errors on the other hand. Item-name errors typically occur when the item name is long, which causes OCR to attribute some of its characters to the content column (such as the "per 100mL" content in fig. 9).
Therefore, in view of the above problem, as another example, statistical analysis may be performed on the package design drawings of a large number of commodities to determine the items contained in the nutritional component tables of different commodities, and these items may be written into an item table. The text segment in which each item of the first nutritional component information is located may then be matched against the item names in the item table; if the text segment of a certain item does not match any item name in the item table, the semantics of that text segment are determined to be incomplete.
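The following Python fragment is a rough illustration of this item-table matching; the item names in ITEM_TABLE and the prefix-matching criterion are assumptions made for the example, not the statistics actually collected by the platform.

```python
# Assumed item table built from statistics over many package designs.
ITEM_TABLE = [
    "energy", "protein", "fat", "trans fatty acid",
    "carbohydrate", "sodium", "vitamin d",
]

def segment_is_complete(segment: str) -> bool:
    """Treat a segment as semantically complete if it starts with a known item name."""
    seg = segment.strip().lower()
    return any(seg.startswith(item) for item in ITEM_TABLE)

print(segment_is_complete("Carbohydrate 5.8g"))  # True
print(segment_is_complete("Carbohyd"))           # False -> incomplete, needs repair
```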
Step 605, if the semantics of any text segment are incomplete, acquiring the adjacent text segment of that text segment from the first nutritional component information.
In the embodiment of the present disclosure, in the case that the semantics of any text segment are incomplete, the adjacent text segment of that text segment may be acquired from the first nutritional component information.
In step 606, if the semantics of the adjacent text segments are incomplete, the sub-segments with complete semantics are determined from the adjacent text segments.
Step 607, extracting the other characters except the sub-segment in the adjacent text segment, and classifying the other characters into any text segment.
In the embodiment of the present disclosure, it may be determined whether the semantics of the adjacent text segments are complete, and if the semantics of the adjacent text segments are incomplete, a sub-segment with complete semantics may be determined from the adjacent text segments, and other characters except the sub-segment in the adjacent text segments may be extracted, so that the other characters may be classified into any text segment.
In the case that the semantics of the adjacent text segment are complete, the next adjacent text segment of the any text segment is acquired, and it is determined whether its semantics are complete; if the semantics of the next adjacent text segment are incomplete, a sub-segment with complete semantics is determined from it, the characters other than that sub-segment are extracted from the next adjacent text segment, and those characters are classified into the any text segment.
In step 608, the other characters are removed from the adjacent text segment.
In the embodiment of the disclosure, the RPA robot may further remove the other characters from the adjacent text segment to ensure the accuracy of the first nutritional component information recognition result.
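The following is a simplified Python sketch of the repair described in steps 604 to 608; the row structure (a list of cell strings), the value pattern, and the completeness check are assumptions made for illustration only.

```python
import re

# Assumed pattern for the semantically complete sub-segment (a numeric value plus unit).
VALUE_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*[a-zA-Z%]*")

def repair_row(cells, is_complete):
    """Move stray leading characters of the next cell back into an incomplete item-name cell."""
    repaired = list(cells)
    for i in range(len(repaired) - 1):
        if is_complete(repaired[i]):
            continue
        nxt = repaired[i + 1]
        match = VALUE_PATTERN.search(nxt)
        if not match:
            continue
        stray, value = nxt[:match.start()], nxt[match.start():]
        repaired[i] = repaired[i] + stray   # re-attach the stray characters
        repaired[i + 1] = value             # keep only the complete sub-segment
    return repaired

row = ["carbohyd", "rate 5.8g"]
print(repair_row(row, lambda s: s.strip().lower().startswith("carbohydrate")))
# -> ['carbohydrate ', '5.8g']
```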
Step 609, comparing each item of component information in the updated first nutritional component information with the corresponding item of component information in the second nutritional component information.
Step 610, in the case that target component information in the updated first nutritional component information does not match the corresponding component information in the second nutritional component information, taking the target component information as the first difference portion.
Step 611, performing exception marking on the first difference portion in the text content, and/or performing exception marking on an area where the first difference portion is located in the commodity packaging diagram.
The execution process of steps 609 to 611 may refer to the execution process of any embodiment of the present disclosure, and is not described herein again.
As an example, the RPA robot may be disposed on the verification platform side, so that the automatic verification of the commodity information can be completed on the verification platform side. An implementation principle of an embodiment of the present disclosure may be as shown in fig. 11 and may specifically include the following parts:
Part one: upload the commodity packaging picture to the verification platform. The commodity packaging diagram may be in an image format such as JPG or PNG; alternatively, a design document in a format such as PDF or PSD may be uploaded, and the commodity packaging diagram may be extracted from that document.
For example, the relevant personnel can upload the images or documents to the verification platform through a web page.
Part two: crop the commodity packaging picture. The commodity packaging diagram can be cut into a plurality of sub-images; for example, the regions in the commodity packaging diagram that need OCR recognition can be manually selected by the relevant personnel and cropped out to obtain the sub-regions.
As an example, since the uploaded commodity packaging diagram is large, the parts to be recognized may be manually selected from the commodity packaging diagram in order to accurately recognize the text information in it. For example, the area where the nutritional component table is located may be circled as shown in fig. 12. As another example, the various regions may be circled as shown in fig. 3.
The nutritional component table is usually a frameless table, which makes it difficult for conventional general-purpose table recognition algorithms to recognize; for example, a general table recognition algorithm cannot clearly determine the left, middle and right columns, line wraps and so on in a frameless table. For example, if the "carbohydrate" entry in fig. 3 wraps onto another line, the table may become as shown in Table 2 or Table 3. It can be understood that Tables 2 and 3 remain easy for a human to read, but it is difficult for a machine to determine that "carbohydrate" is one complete word, so current general table recognition algorithms struggle with them. To address this problem, the verification platform corrects and optimizes the OCR recognition result through code-level logic that recognizes the specific words in the nutritional component table.
For example, the OCR recognition results of the nutritional component table in fig. 3 may be as shown in fig. 9, the OCR recognition results may be subjected to secondary optimization, and the optimized OCR recognition results may be as shown in table 1.
Ingredient extraction: as shown in fig. 13, the line-feed symbols in the OCR recognition result may be removed to obtain a long text, and the ingredient information may then be extracted from the OCR recognition result through a configuration template on the verification platform (the configuration template contains the extraction rule for extracting the attribute value corresponding to each attribute field). For example, when the ingredient information in the OCR recognition result in fig. 13 is extracted on the verification platform, the extraction result may be as shown in fig. 14.
The extraction of the producer (hereinafter referred to as the factory name), the place of production and address (hereinafter referred to as the factory address), and the production license (or production license number) is similar to that of the ingredients. For example, OCR recognition is performed on the image area where the factory name and factory address are located, and the recognition result may be as shown in fig. 15; the line-feed symbols in the OCR recognition result may be removed to obtain a long text, and the factory name and factory address may then be extracted from the OCR recognition result through the configuration template. For example, when the factory name, factory address and food production license number in the OCR recognition result in fig. 15 are extracted on the verification platform, the extraction result may be as shown in fig. 16.
That is, in the present disclosure, the attribute fields to be extracted, such as the producer (hereinafter referred to as the factory name), the place of production and address (hereinafter referred to as the factory address), and the like, may be defined on the verification platform side. As an example, the defined attribute fields may be as shown in fig. 17, so that the attribute values matching these attribute fields can be extracted from the OCR recognition result, and the extracted attribute fields and attribute values can then be compared with the attribute fields and attribute values in the document content.
Further, a custom vocabulary (referred to as a set vocabulary in the present disclosure) used for matching and extraction may be configured on the verification platform side. For example, the ingredient information always appears after the word "ingredients" or "ingredients:"; however, considering the accuracy of the OCR recognition result, the word may be only partially recognized, not recognized at all, or recognized with a stray space in the middle, and these variants can be configured in the vocabulary as an enumeration.
When configuring the configuration template, the custom vocabulary corresponding to each attribute field may be used; for example, the configuration template may be as shown in fig. 18.
Fig. 19 shows the extraction rule for ingredients: it identifies whether the text content contains a word from the custom vocabulary corresponding to ingredients, and if so, outputs up to 500 characters following that word to the ingredient field, i.e. as the attribute value corresponding to the "ingredients" attribute field. If the character content after that word contains a word from the custom end-mark vocabulary (referred to as an end mark in the present disclosure), the characters after the end mark do not need to be extracted; that is, only the content between the trigger word and the end mark is used as the attribute value for ingredients.
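The following Python fragment sketches such an extraction rule; the trigger words and end marks listed here are illustrative assumptions, not the vocabulary actually configured on the platform.

```python
TRIGGER_WORDS = ["ingredients:", "ingredients", "ingredient list"]  # assumed enumeration
END_MARKS = ["net content", "storage", "shelf life"]                # assumed end marks

def extract_ingredients(text: str):
    """Return up to 500 characters after a trigger word, cut at the first end mark."""
    lowered = text.lower()
    for word in TRIGGER_WORDS:
        idx = lowered.find(word)
        if idx < 0:
            continue
        start = idx + len(word)
        candidate = text[start:start + 500]
        cand_lower = candidate.lower()
        # Stop at the first end mark, if any, so trailing fields are not absorbed.
        cut = min((cand_lower.find(mark) for mark in END_MARKS
                   if cand_lower.find(mark) >= 0), default=len(candidate))
        return candidate[:cut].strip(" :")
    return None

text = "Ingredients: water, raw milk, white sugar, lactic acid bacteria Net content: 100g"
print(extract_ingredients(text))
# -> water, raw milk, white sugar, lactic acid bacteria
```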
Part three: upload the reference document to the verification platform. In order to make the comparison or verification result more accurate and reduce the error rate, the reference document is preferably a standard structured document, such as an Excel document. If a structured document is not available, a document with a fixed template, such as a Word document, can be used.
For example, the relevant personnel can upload the reference document to the checking platform through a webpage.
Part four: perform OCR recognition on the commodity packaging image to obtain the text content. In order to improve the accuracy of the recognition result, the commodity packaging diagram needs to be sufficiently clear; tests on different images show that keeping the size of the cropped image above 8MB ensures a relatively high recognition accuracy.
Part five: document extraction and understanding. Unstructured document content may be converted into structured data. Alternatively, business personnel may compose the reference document in a set format, in which case no structured conversion of the document content of the reference document is needed. For example, the key information in the document content can be intelligently extracted through the intelligent document understanding capability of the IDP system, converting the unstructured document content into structured data.
Part six: information comparison, to determine the first difference portion in the text content that differs from the document content and/or the second difference portion in the document content that differs from the text content. The text content extracted in part four and the document content extracted in part five may be compared by combining the OCR technology with the document information extraction function. The comparison logic is as follows: classify the text content, for example into attribute fields, attribute values, first nutritional component information and the like; compare the classified text content in turn with the corresponding content in the document content, and mark the inconsistent or extra text portions. In addition, the document content can be checked in turn against the text content (a reverse check) to ensure that all of the content takes part in the verification, avoiding a drop in the accuracy of the verification result caused by some content not being compared.
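The following Python sketch illustrates the forward and reverse checks on the extracted attributes; representing the extracted content as plain field-to-value dictionaries is an assumption made only for this example.

```python
def compare_attributes(text_attrs: dict, doc_attrs: dict):
    """Return (first_diff, second_diff) from a forward and a reverse check."""
    first_diff = []   # parts of the OCR text differing from the document
    second_diff = []  # parts of the document missing from or differing in the text
    # Forward check: every extracted field/value must match the document.
    for field, value in text_attrs.items():
        if field not in doc_attrs or doc_attrs[field] != value:
            first_diff.append((field, value))
    # Reverse check: every document field must have taken part in the comparison.
    for field, value in doc_attrs.items():
        if field not in text_attrs:
            second_diff.append((field, value))
    return first_diff, second_diff

text_attrs = {"factory name": "XX Dairy Co.", "net content": "100g"}
doc_attrs = {"factory name": "XX Dairy Co., Ltd.", "net content": "100g", "shelf life": "21 days"}
print(compare_attributes(text_attrs, doc_attrs))
# -> ([('factory name', 'XX Dairy Co.')], [('shelf life', '21 days')])
```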
In addition, the text content can be logically corrected to improve the accuracy of the OCR recognition result and thereby the accuracy of the verification result. For example, regular-expression substitutions can be applied in the code logic to the different component information in the nutritional component table: the unit of protein is g, and if the unit of protein in the OCR recognition result is 9, g can be substituted for 9, thereby improving the accuracy of the OCR recognition result.
Part seven: result presentation. The comparison result may be presented on a web page; for example, the first difference portion may be marked in the text content and the second difference portion may be marked in the document content. In addition, the position of the first difference portion may be marked in the commodity packaging diagram.
In fig. 13, the added amount of lactic acid bacteria given in the commodity packaging diagram is 1.0×10^7 CFU/100g, but the OCR recognition result is 1.0×107 CFU/100g; that is, the OCR result does not preserve the superscript of the power. In this case, the RPA robot can recognize that the two attribute values differ, i.e. that 1.0×10^7 CFU/100g is different from 1.0×107 CFU/100g, mark the attribute value 1.0×107 CFU/100g in the text content, and the text content can then be checked manually for this error.
It should be noted that special cases in the package design of some commodities should be considered. In general, the characters in a commodity packaging diagram are arranged from left to right or from top to bottom, but on some packages the characters may be laid out in a circular or wavy manner, which causes the OCR recognition result to differ from the document content.
As an example, OCR recognition is performed on sub-image 1 in fig. 3, and the recognition result may be as shown in fig. 20; for the image in fig. 21, however, the OCR recognition result may be erroneous. In such situations, the RPA robot may mark the difference in the text content and/or the position of the difference in the commodity packaging diagram, and whether there is an actual error at the marked position can then be checked manually.
Optionally, the RPA robot may also generate a verification result, which may be downloaded by the user.
Finally, the labeled text content, the labeled document content, the labeled commodity packaging drawing and the verification report can be rechecked manually. Because the commodity information is verified by the verification platform or the RPA robot, the verification can be completed in a short time, generally within 1-3 minutes, which not only improves the verification efficiency but also improves the accuracy of the verification result. Manual work is only needed to recheck the differences, which reduces the workload of the relevant personnel and improves working efficiency.
Corresponding to the RPA- and AI-based commodity information processing methods provided in the embodiments of fig. 1 to 10 above, the present disclosure also provides a commodity information processing apparatus based on RPA and AI. Since the apparatus provided in the embodiments of the present disclosure corresponds to the methods provided in the embodiments of fig. 1 to 10, the implementations of the methods are also applicable to the apparatus and will not be described in detail here.
Fig. 22 is a schematic structural diagram of a commodity information processing apparatus based on RPA and AI according to an embodiment of the present disclosure.
As shown in fig. 22, the product information processing apparatus 2200 based on RPA and AI, applied to the RPA robot, may include: a first obtaining module 2210, a recognition module 2220, a second obtaining module 2230, a comparison module 2240, and a labeling module 2250.
The first obtaining module 2210 is configured to obtain a commodity packaging diagram corresponding to a target commodity.
The recognition module 2220 is configured to recognize text content in the product package diagram based on an Optical Character Recognition (OCR) technique.
The second obtaining module 2230 is configured to obtain the reference document, and obtain document content in the reference document, where the document content includes product information corresponding to the target product.
The comparing module 2240 is configured to compare the text content with the document content to determine a first difference portion different from the document content in the text content.
The labeling module 2250 is configured to perform an exception labeling on the first difference portion in the text content, and/or perform an exception labeling on an area where the first difference portion is located in the product packaging diagram.
In a possible implementation manner of the embodiment of the present disclosure, the comparing module 2240 is configured to: extracting each first attribute field from the text content, and extracting a first attribute value matched with each first attribute field from the text content; comparing the first attribute fields and the first attribute values corresponding to the first attribute fields with the second attribute fields and the second attribute values corresponding to the second attribute fields in the document content; under the condition that a first target attribute field is not matched with a second attribute field in each first attribute field, taking the first attribute value corresponding to the first target attribute field and/or the first target attribute field as a first difference part; and under the condition that a second target attribute field is matched with the second attribute field in each first attribute field, but a first attribute value corresponding to the second target attribute field is not matched with a second attribute value corresponding to the second attribute field, taking the first attribute value corresponding to the second target attribute field as a first difference part.
In a possible implementation manner of the embodiment of the present disclosure, the comparing module 2240 is further configured to: acquiring a set word list, wherein the set word list comprises at least one third attribute field; extracting third attribute values matched with all third attribute fields in the set word list from the text content; comparing the third attribute values corresponding to the third attribute fields with the second attribute values corresponding to the second attribute fields in the document content; and in the case that the target attribute value does not match the second attribute value in the third attribute values, taking the target attribute value as a first difference part.
In a possible implementation manner of the embodiment of the present disclosure, the comparing module 2240 is configured to: extracting first nutritional component information of the target commodity from the text content, and extracting second nutritional component information from the document content; comparing each component information in the first nutrient component information with corresponding component information in the second nutrient component information; in the case where there is a mismatch between the target component information in the first nutritional component information and the corresponding component information in the second nutritional component information, the target component information is regarded as a first difference portion.
In a possible implementation manner of the embodiment of the present disclosure, the text content includes first nutritional component information of the target product, and the RPA and AI-based product information processing apparatus 2200 may further include:
the first processing module is used for extracting first nutrient component information from the text content; aiming at any component information in the first nutritional component information, a regular expression matched with any component information is obtained; matching the regular expression with any component information; and if not, performing replacement processing on any component information based on the regular expression.
In a possible implementation manner of the embodiment of the present disclosure, the text content includes first nutritional component information of the target product, and the RPA and AI-based product information processing apparatus 2200 may further include:
The second processing module is used for extracting the first nutritional component information from the text content; for any text segment in the first nutritional component information, judging whether the semantics of that text segment are complete; if the semantics of that text segment are incomplete, acquiring the adjacent text segment of that text segment from the first nutritional component information; if the semantics of the adjacent text segment are incomplete, determining a sub-segment with complete semantics from the adjacent text segment; and extracting the characters other than the sub-segment from the adjacent text segment, classifying those characters into the text segment, and removing them from the adjacent text segment.
In a possible implementation manner of the embodiment of the present disclosure, the first obtaining module 2210 is configured to: acquiring a target document containing a commodity packaging diagram; and extracting the commodity packaging diagram from the target document.
In a possible implementation manner of the embodiment of the present disclosure, the identifying module 2220 is configured to: in response to the intercepting operation, the commodity packing diagram is segmented into at least one sub-image; character recognition is performed on at least one sub-image based on OCR technology to obtain textual content.
In a possible implementation manner of the embodiment of the present disclosure, the identifying module 2220 is configured to: identifying and extracting at least one target area from the commodity packaging diagram based on a target detection algorithm, wherein the target area comprises character information; and performing character recognition on at least one target area based on an OCR technology to obtain text content.
In a possible implementation manner of the embodiment of the present disclosure, the tagging module 2250 is configured to adjust a font and/or a font size of the first difference portion in the text content; and carrying out color labeling on the adjusted first difference part.
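As an illustration of this kind of labeling, the sketch below wraps the first difference portion in the text content in an HTML span with an enlarged, bold font and a highlight color; the HTML rendering and the concrete style values are assumptions, since the disclosure does not prescribe a particular presentation format.

```python
import html

def mark_difference(text: str, diff: str) -> str:
    """Wrap every occurrence of the difference portion in a styled <span>."""
    marked = ('<span style="color:red; font-weight:bold; font-size:120%;">'
              f'{html.escape(diff)}</span>')
    return html.escape(text).replace(html.escape(diff), marked)

print(mark_difference("Ingredients: water, raw milk, sugar", "raw milk"))
```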
In a possible implementation manner of the embodiment of the present disclosure, the comparing module 2240 is further configured to: and comparing the document content with the text content to determine a second different part different from the text content in the document content.
A tagging module 2250 further operable to: and carrying out exception marking on the second difference part in the document content.
The RPA and AI-based product information processing apparatus 2200 may further include:
and the display module is used for displaying the marked document content.
In a possible implementation manner of the embodiment of the present disclosure, the article information processing apparatus 2200 based on RPA and AI may further include:
and the sending module is used for sending prompt information, wherein the prompt information is used for prompting the checking and/or the modification of the first difference part in the commodity packaging diagram.
And/or,
and the generating module is used for generating and displaying a checking report, wherein the checking report comprises at least one item of the corresponding relation between the first attribute field and the first attribute value, the corresponding relation between the third attribute field and the third attribute value and the first nutrient component information of the target commodity in the text content.
The commodity information processing apparatus based on RPA and AI of the embodiment of the disclosure acquires, through the RPA robot, the commodity packaging image corresponding to the target commodity and recognizes the text content in the commodity packaging image based on the OCR technology; acquires the reference document and the document content in the reference document, where the document content includes the commodity information corresponding to the target commodity; compares the text content with the document content to determine the first difference portion in the text content that differs from the document content; and performs exception marking on the first difference portion in the text content, and/or performs exception marking on the area where the first difference portion is located in the commodity packaging diagram. In this way, the commodity information on the commodity packaging drawing can be automatically verified by the RPA robot, which on the one hand reduces manual participation, frees human resources and lowers labor cost, and on the other hand improves the efficiency and accuracy of the commodity information verification.
In order to implement the foregoing embodiments, an embodiment of the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the article information processing method based on RPA and AI according to any one of the foregoing method embodiments is implemented.
In order to implement the above embodiments, the present disclosure also proposes a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the RPA and AI-based merchandise information processing method according to any one of the foregoing method embodiments.
In order to implement the foregoing embodiments, the present disclosure further provides a computer program product; when the instructions in the computer program product are executed by a processor, the RPA- and AI-based commodity information processing method according to any one of the foregoing method embodiments is implemented.
FIG. 23 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure. The electronic device 12 shown in fig. 23 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in FIG. 23, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. These architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile Memory, such as Random Access Memory (RAM) 30 and/or cache Memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 23, commonly referred to as a "hard drive"). Although not shown in FIG. 23, a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable nonvolatile optical disk (e.g., a Compact disk Read Only Memory (CD-ROM), a Digital versatile disk Read Only Memory (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally perform the functions and/or methodologies of the embodiments described in this disclosure.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, electronic device 12 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with the other modules of electronic device 12 via the bus 18. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the memory 28, for example, implementing the methods mentioned in the foregoing embodiments.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means at least two, e.g., two, three, etc., unless explicitly and specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present disclosure have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present disclosure, and that changes, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present disclosure.

Claims (19)

1. A commodity information processing method based on Robot Process Automation (RPA) and Artificial Intelligence (AI), wherein the method is executed by an RPA robot and comprises the following steps:
acquiring a commodity packaging image corresponding to a target commodity, and identifying text contents in the commodity packaging image based on an Optical Character Recognition (OCR) technology;
acquiring a reference document and acquiring document contents in the reference document, wherein the document contents comprise commodity information corresponding to the target commodity;
comparing the text content with the document content to determine a first difference part in the text content, which is different from the document content;
and carrying out exception marking on the first difference part in the text content, and/or carrying out exception marking on an area where the first difference part is located in the commodity packaging diagram.
2. The method of claim 1, wherein comparing the text content with the document content to determine a first difference portion of the text content different from the document content comprises:
extracting each first attribute field from the text content, and extracting a first attribute value matched with each first attribute field from the text content;
comparing the first attribute fields and the first attribute values corresponding to the first attribute fields with the second attribute fields and the second attribute values corresponding to the second attribute fields in the document content;
taking a first attribute value corresponding to a first target attribute field and/or a first target attribute field as the first difference part under the condition that the first target attribute field and the second attribute field are not matched in each first attribute field;
and when a second target attribute field is matched with the second attribute field in each first attribute field, but a first attribute value corresponding to the second target attribute field is not matched with a second attribute value corresponding to the second attribute field, taking the first attribute value corresponding to the second target attribute field as the first difference part.
3. The method of claim 2, further comprising:
acquiring a set word list, wherein the set word list comprises at least one third attribute field;
extracting third attribute values matched with the third attribute fields in the set word list from the text content;
comparing the third attribute value corresponding to each third attribute field with the second attribute value corresponding to each second attribute field in the document content;
and in the case that the target attribute value does not match the second attribute value in each of the third attribute values, taking the target attribute value as the first difference portion.
4. The method of claim 1, wherein comparing the text content with the document content to determine a first difference portion of the text content different from the document content comprises:
extracting first nutritional component information of the target commodity from the text content and extracting second nutritional component information from the document content;
comparing each component information in the first nutrient component information with corresponding component information in the second nutrient component information;
and in the case where there is a mismatch between target ingredient information in the first nutritional ingredient information and corresponding ingredient information in the second nutritional ingredient information, treating the target ingredient information as the first difference portion.
5. The method of claim 1, wherein the text content comprises first nutritional composition information of the target product, and after the identifying the text content in the product packaging map based on the Optical Character Recognition (OCR) technology, the method further comprises:
extracting the first nutritional component information from the text content;
aiming at any component information in the first nutritional component information, a regular expression matched with the any component information is obtained;
matching the regular expression with the any component information;
and if not, performing replacement processing on any component information based on the regular expression.
6. The method of claim 1, wherein the text content comprises first nutritional composition information of the target product, and after the identifying the text content in the product packaging map based on the Optical Character Recognition (OCR) technology, the method further comprises:
extracting the first nutrient content information from the text content;
aiming at any text fragment in the first nutrient component information, judging whether the semantics of the text fragment is complete;
if the semantics of any text segment is incomplete, acquiring an adjacent text segment adjacent to the any text segment from the nutrient composition information;
if the semantics of the adjacent text segment are not complete, determining a sub-segment with complete semantics from the adjacent text segment;
extracting other characters except the sub-segments in the adjacent text segments, classifying the other characters into any text segment, and removing the other characters from the adjacent text segments.
7. The method according to any one of claims 1-6, wherein the obtaining of the commodity packing diagram corresponding to the target commodity comprises:
acquiring a target document containing the commodity packaging diagram;
and extracting the commodity packaging drawing from the target document.
8. The method according to any one of claims 1-6, wherein the identifying text content in the package map of the commodity based on an Optical Character Recognition (OCR) technique comprises:
in response to the intercepting operation, the commodity packing diagram is segmented into at least one sub-image;
and performing character recognition on the at least one sub-image based on the OCR technology to obtain the text content.
9. The method according to any one of claims 1-6, wherein the identifying text content in the package map of the commodity based on an Optical Character Recognition (OCR) technique comprises:
identifying and extracting at least one target area from the commodity packaging diagram based on a target detection algorithm, wherein the target area comprises character information;
and performing character recognition on the at least one target area based on the OCR technology to obtain the text content.
10. The method according to any one of claims 1-6, wherein said abnormally labeling the first difference portion in the text content comprises:
in the text content, adjusting the font and/or the font size of the first difference part;
and carrying out color labeling on the adjusted first difference part.
11. The method according to any one of claims 1-6, further comprising:
comparing the document content with the text content to determine a second difference portion different from the text content in the document content;
performing exception marking on the second difference part in the document content;
and displaying the marked document content.
12. The method according to any one of claims 1-6, further comprising:
sending prompt information, wherein the prompt information is used for prompting the checking and/or the modification of the first difference part in the commodity packaging diagram;
and/or,
and generating and displaying a checking report, wherein the checking report comprises at least one of the corresponding relation between the first attribute field and the first attribute value, the corresponding relation between the third attribute field and the third attribute value and the first nutrient component information of the target commodity in the text content.
13. A commodity information processing device based on Robot Process Automation (RPA) and Artificial Intelligence (AI), which is applied to an RPA robot and comprises:
the first acquisition module is used for acquiring a commodity packaging diagram corresponding to a target commodity;
the recognition module is used for recognizing the text content in the commodity packaging drawing based on an Optical Character Recognition (OCR) technology;
the second acquisition module is used for acquiring a reference document and acquiring document contents in the reference document, wherein the document contents comprise commodity information corresponding to the target commodity;
the comparison module is used for comparing the text content with the document content to determine a first difference part which is different from the document content in the text content;
and the labeling module is used for performing abnormal labeling on the first difference part in the text content and/or performing abnormal labeling on an area where the first difference part is located in the commodity packaging diagram.
14. The apparatus of claim 13, wherein the alignment module is configured to:
extracting each first attribute field from the text content, and extracting a first attribute value matched with each first attribute field from the text content;
comparing the first attribute fields and the first attribute values corresponding to the first attribute fields with the second attribute fields and the second attribute values corresponding to the second attribute fields in the document content;
taking a first attribute value corresponding to a first target attribute field and/or a first target attribute field as the first difference part under the condition that the first target attribute field and the second attribute field are not matched in each first attribute field;
and when a second target attribute field is matched with the second attribute field in each first attribute field, but a first attribute value corresponding to the second target attribute field is not matched with a second attribute value corresponding to the second attribute field, taking the first attribute value corresponding to the second target attribute field as the first difference part.
15. The apparatus of claim 14, wherein the alignment module is further configured to:
acquiring a set word list, wherein the set word list comprises at least one third attribute field;
extracting third attribute values matched with the third attribute fields in the set word list from the text content;
comparing the third attribute value corresponding to each third attribute field with the second attribute value corresponding to each second attribute field in the document content;
and in the case that the target attribute value does not match the second attribute value in each of the third attribute values, taking the target attribute value as the first difference portion.
16. The apparatus of claim 13, wherein the alignment module is configured to:
extracting first nutritional component information of the target commodity from the text content and extracting second nutritional component information from the document content;
comparing each component information in the first nutrient component information with corresponding component information in the second nutrient component information;
and in the case where there is a mismatch between target ingredient information in the first nutritional ingredient information and corresponding ingredient information in the second nutritional ingredient information, treating the target ingredient information as the first difference portion.
17. The apparatus of claim 13, wherein the text content includes first nutritional component information of the target product, the apparatus further comprising:
the first processing module is used for extracting the first nutrient component information from the text content; aiming at any component information in the first nutritional component information, a regular expression matched with the any component information is obtained; matching the regular expression with the any component information; and if not, performing replacement processing on any component information based on the regular expression.
18. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of claims 1-12 when executing the computer program.
19. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-12.
CN202210332711.3A 2022-03-31 2022-03-31 Commodity information processing method, apparatus, device and medium based on RPA and AI Pending CN114495127A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210332711.3A CN114495127A (en) 2022-03-31 2022-03-31 Commodity information processing method, apparatus, device and medium based on RPA and AI
PCT/CN2022/091293 WO2023184644A1 (en) 2022-03-31 2022-05-06 Product information processing method and apparatus based on rpa and ai, and device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210332711.3A CN114495127A (en) 2022-03-31 2022-03-31 Commodity information processing method, apparatus, device and medium based on RPA and AI

Publications (1)

Publication Number Publication Date
CN114495127A true CN114495127A (en) 2022-05-13

Family

ID=81488753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210332711.3A Pending CN114495127A (en) 2022-03-31 2022-03-31 Commodity information processing method, apparatus, device and medium based on RPA and AI

Country Status (2)

Country Link
CN (1) CN114495127A (en)
WO (1) WO2023184644A1 (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966681B (en) * 2021-04-12 2022-05-10 深圳市秦丝科技有限公司 Method, equipment and storage medium for intelligent recognition, filing and retrieval of commodity photographing

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424201A (en) * 2013-08-21 2015-03-18 富士通株式会社 Method and device for providing food safety information
CN104866849A (en) * 2015-04-30 2015-08-26 天津大学 Food nutrition label identification method based on mobile terminal
US9792524B1 (en) * 2015-07-22 2017-10-17 Amazon Technologies, Inc. Gap shifting for automatic recognition of tabular text
CN106021119A (en) * 2016-08-03 2016-10-12 网易(杭州)网络有限公司 Configuration file calibration method and device
CN111881664A (en) * 2020-06-30 2020-11-03 北京来也网络科技有限公司 Information extraction method, device, equipment and medium combining RPA and AI
US20220058385A1 (en) * 2020-08-20 2022-02-24 Pepsico, Inc. Product labeling review
CN112381087A (en) * 2020-08-26 2021-02-19 北京来也网络科技有限公司 Image recognition method, apparatus, computer device and medium combining RPA and AI
CN113723270A (en) * 2021-08-25 2021-11-30 北京来也网络科技有限公司 File processing method and device based on RPA and AI
CN113836091A (en) * 2021-09-01 2021-12-24 北京来也网络科技有限公司 Data identification method and device combining RPA and AI, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jin Di et al.: "Investigation on the current status of nutrition labels of prepackaged foods sold at railway stations and on trains", Henan Journal of Preventive Medicine *

Also Published As

Publication number Publication date
WO2023184644A1 (en) 2023-10-05

Similar Documents

Publication Publication Date Title
CN112185520B (en) Text structuring processing system and method for medical pathology report picture
CN108171297A (en) A kind of answer card identification method and device
CN112434691A (en) HS code matching and displaying method and system based on intelligent analysis and identification and storage medium
CN109190630A (en) Character identifying method
CN111144079B (en) Method and device for intelligently acquiring learning resources, printer and storage medium
CN110210470A (en) Merchandise news image identification system
CN110543475A (en) financial statement data automatic identification and analysis method based on machine learning
CN112860905A (en) Text information extraction method, device and equipment and readable storage medium
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN115809854A (en) Intelligent auditing method and system for construction scheme calculation book
CN112989043A (en) Reference resolution method and device, electronic equipment and readable storage medium
US12020356B2 (en) Methods and apparatus for simulating images of produce with markings from images of produce and images of markings
Bajić et al. Data visualization classification using simple convolutional neural network model
CN113111869B (en) Method and system for extracting text picture and description thereof
US20200167557A1 (en) Digitization of industrial inspection sheets by inferring visual relations
CN113592512A (en) Online commodity identity uniqueness identification and confirmation system
CN112818693A (en) Automatic extraction method and system for electronic component model words
Tomovic et al. Aligning document layouts extracted with different OCR engines with clustering approach
CN112613367A (en) Bill information text box acquisition method, system, equipment and storage medium
WO2020111074A1 (en) E-mail classification device, e-mail classification method, and computer program
CN114495127A (en) Commodity information processing method, apparatus, device and medium based on RPA and AI
US11687700B1 (en) Generating a structure of a PDF-document
CN112084103A (en) Interface test method, device, equipment and medium
US12086551B2 (en) Semantic difference characterization for documents
CN112257400B (en) Table data extraction method, apparatus, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40065236

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20220513