CN107832765A

CN107832765A - Recognition of pictures including text content and image content

Info

Publication number: CN107832765A
Application number: CN201710823997.4A
Authority: CN
Inventors: 邓玥琳; 高光明; 刘辉; 丁飞
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2017-09-13
Filing date: 2017-09-13
Publication date: 2018-03-23

Abstract

An object of the invention is to provide a picture recognition method, apparatus and computer device, a computer-readable storage medium and a computer program product. A picture recognition apparatus locates the text region and the image region in a picture; extracts corresponding word vectors from the text content in the text region; extracts an image feature vector from the image region; and integrates the word vectors with the image feature vector to determine the semantics of the picture. Compared with the prior art, the invention provides a scheme in which a computer device recognizes pictures automatically, so that a picture including both text content and image content can undergo content recognition and review before it is published.

Description

Recognition of pictures including text content and image content

Technical field

The present invention relates to the technical field of picture recognition, and in particular to a technique for semantic recognition of pictures that include both text content and image content.

Background art

Current picture recognition technology achieves high accuracy on pictures with relatively simple content, so automatic review works well for them. For example, for a picture whose content is text or a trademark, or a picture with a single subject, a picture review system can directly recognize the text or trademark in the picture by OCR, or recognize the elements in the picture by deep-learning-based image classification, in order to judge whether the picture meets the publishing standard.

However, existing picture recognition technology is difficult to apply to pictures that include both text content and image content. Such pictures are mostly reviewed manually, which puts reviewers under heavy pressure and lengthens the review cycle.

Summary of the invention

An object of the invention is to provide a picture recognition method, apparatus and computer device, a computer-readable storage medium and a computer program product.

According to one aspect of the invention, a picture recognition method is provided, wherein the method comprises the following steps:

- locating the text region and the image region in a picture;

- extracting corresponding word vectors from the text content in the text region;

- extracting an image feature vector from the image region;

- integrating the word vectors with the image feature vector to determine the semantics of the picture.

According to one aspect of the invention, a picture recognition apparatus is also provided, wherein the apparatus comprises:

means for locating the text region and the image region in a picture;

means for extracting corresponding word vectors from the text content in the text region;

means for extracting an image feature vector from the image region;

means for integrating the word vectors with the image feature vector to determine the semantics of the picture.

According to one aspect of the invention, a computer device is also provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a picture recognition method according to one aspect of the invention.

According to one aspect of the invention, a computer-readable storage medium is also provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements a picture recognition method according to one aspect of the invention.

According to one aspect of the invention, a computer program product is also provided, which, when executed by a computer device, implements a picture recognition method according to one aspect of the invention.

Compared with the prior art, the invention provides a scheme in which a computer device recognizes pictures automatically, so that a picture including both text content and image content can undergo content recognition and review before it is published. Specifically, the invention segments such a picture at a finer granularity, accurately locates the text region and the image region in it, and then integrates the word vectors and the image feature vector extracted from the respective regions to identify the semantics of the picture. Once the semantics of the picture have been identified, the invention can also determine whether the picture meets the publishing standard in order to carry out a pre-publication review. For example, it can judge whether a picture to be published contains vulgar content; a picture with vulgar content does not meet the publishing standard and is a high-risk picture.

The invention can be applied effectively to the pre-publication review of advertisements. For an advertising picture that includes text content and image content, the picture recognition system of the invention can recognize the picture and identify its publishing risk. This speeds up the publication of advertising pictures while ensuring that high-risk pictures are identified and filtered out rather than improperly published, which also improves the experience of both advertisers and network users.

Brief description of the drawings

Other features, objects and advantages of the invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:

Fig. 1 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the invention;

Fig. 2 shows a flowchart of a method for recognizing a picture including text content and image content according to an embodiment of the invention;

Fig. 3 shows a schematic diagram of a picture including text content and image content according to an example of the invention;

Fig. 4 shows a schematic diagram of an apparatus for recognizing a picture including text content and image content according to an embodiment of the invention.

The same or similar reference numerals in the drawings denote the same or similar parts.

Detailed description

Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations can be rearranged. A process may be terminated when its operations are completed, but it may also have additional steps not included in the figure. A process may correspond to a method, function, procedure, subroutine, subprogram, and so on.

The "computer device" referred to in this context, also called a "computer", is an intelligent electronic device that can carry out predetermined processing such as numerical and/or logical computation by running predetermined programs or instructions. It may include a processor and a memory, with the processor executing program instructions pre-stored in the memory to carry out the predetermined processing; or the predetermined processing may be carried out by hardware such as an ASIC, FPGA or DSP; or by a combination of the two. Computer devices include, but are not limited to, servers, personal computers (PCs), notebook computers, tablet computers, smartphones, and so on.

The computer device includes, for example, user equipment and network equipment. The user equipment includes, but is not limited to, personal computers (PCs), notebook computers, mobile terminals, and so on; mobile terminals include, but are not limited to, smartphones, PDAs, and so on. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The computer device may implement the invention by running on its own, or may access a network and implement the invention by interacting with other computer devices in the network. The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, and so on.

It should be noted that the user equipment, network equipment and networks above are only examples. Other computer devices or networks, existing now or possibly appearing in the future, that are applicable to the invention should also fall within the scope of protection of the invention and are incorporated herein by reference.

The methods discussed below (some of which are illustrated by flowcharts) may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination of these. When implemented in software, firmware, middleware or microcode, the program code or code segments that perform the necessary tasks may be stored in a machine-readable or computer-readable medium (such as a storage medium). One or more processors may perform the necessary tasks.

The specific structural and functional details disclosed herein are merely representative and serve the purpose of describing exemplary embodiments of the invention. The invention may, however, be embodied in many alternative forms and should not be construed as limited to the embodiments set forth herein.

It should be understood that although the terms "first", "second" and so on may be used herein to describe units, these units should not be limited by these terms. The terms are used only to distinguish one unit from another. For example, without departing from the scope of the exemplary embodiments, a first unit could be called a second unit, and similarly a second unit could be called a first unit. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for describing particular embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" as used herein are intended to include the plural as well. It should also be understood that the terms "comprise" and/or "include" as used herein specify the presence of the stated features, integers, steps, operations, units and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.

It should further be mentioned that in some alternative implementations the functions/actions noted may occur in an order different from that indicated in the figures. For example, depending on the functions/actions involved, two figures shown in succession may in fact be executed substantially simultaneously, or may sometimes be executed in the reverse order.

The invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the invention. The computer system/server 12 shown in Fig. 1 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the invention.

As shown in Fig. 1, computer system/server 12 takes the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including system memory 28 and processing unit 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.

Computer system/server 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.

Memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 1, commonly called a "hard disk drive"). Although not shown in Fig. 1, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical medium) may be provided. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. Memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each or some combination of these examples may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described in the invention.

Computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, and so on), with one or more devices that enable a user to interact with computer system/server 12, and/or with any device (such as a network card, a modem, and so on) that enables computer system/server 12 to communicate with one or more other computing devices. Such communication may take place through input/output (I/O) interfaces 22. Computer system/server 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with the other modules of computer system/server 12 through bus 18. It should be understood that, although not shown in Fig. 1, other hardware and/or software modules may be used in conjunction with computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and so on.

Processing unit 16 performs various functional applications and data processing by running programs stored in memory 28.

For example, memory 28 stores a computer program for performing the functions and processing of the invention. When processing unit 16 executes the corresponding computer program, the recognition of pictures including text content and image content according to the invention is carried out.

The specific functions/steps by which the invention recognizes a picture including text content and image content are described in detail below.

Fig. 2 shows a flowchart of a method for recognizing a picture including text content and image content according to an embodiment of the invention.

The recognition method is performed by a picture recognition system. The picture recognition system is typically located on the network side, for example deployed on one or more servers.

As shown in Fig. 2, in step S1 the picture recognition system locates the text region and the image region in a picture; in step S2 the picture recognition system extracts corresponding word vectors from the text content in the text region; in step S3 the picture recognition system extracts an image feature vector from the image region; in step S4 the picture recognition system integrates the word vectors with the image feature vector to determine the semantics of the picture.
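The order of steps S1 to S4 can be sketched as a small pipeline. This is only an illustrative skeleton: the function name `recognize_picture`, the callable parameters, the box format and the toy stand-ins below are all assumptions and do not come from the patent, and the integration in S4 is shown as simple concatenation, which is only one of the options described later.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical box format (x1, y1, x2, y2)


def recognize_picture(
    picture: object,
    locate: Callable[[object], Tuple[List[Box], List[Box]]],
    text_to_vector: Callable[[object, List[Box]], Sequence[float]],
    image_to_vector: Callable[[object, List[Box]], Sequence[float]],
    classify: Callable[[Sequence[float]], str],
) -> str:
    """Run steps S1 to S4 in order and return a semantic label."""
    text_boxes, image_boxes = locate(picture)           # S1: locate regions
    word_vec = text_to_vector(picture, text_boxes)      # S2: word vector
    image_vec = image_to_vector(picture, image_boxes)   # S3: image features
    fused = list(word_vec) + list(image_vec)            # S4: integrate
    return classify(fused)                              # S4: semantics


# Toy stand-ins for the four models, used only to exercise the skeleton.
label = recognize_picture(
    picture="demo.png",
    locate=lambda p: ([(0, 0, 10, 5)], [(0, 5, 10, 20)]),
    text_to_vector=lambda p, boxes: [0.1, 0.2],
    image_to_vector=lambda p, boxes: [0.3],
    classify=lambda v: "normal" if sum(v) < 1.0 else "high_risk",
)
print(label)
```

Passing the four stages in as callables mirrors the remark in the description that the choice of model for one stage does not constrain the others.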

Specifically, in step S1, the picture recognition system locates the text region and the image region in the picture.

As shown in Fig. 3, a picture to be published includes text content and image content. The picture recognition system of the invention aims to recognize such pictures automatically.

For a picture including text content and image content, the picture recognition system can distinguish the text region and the image region in it and mark each with a candidate box, as shown in Fig. 3, where the text region is shown by a solid box and the image region by a dashed box.
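Once a detector has produced labelled candidate boxes, separating the two kinds of region and cropping them is straightforward. The sketch below assumes a hypothetical detection record (`Detection`, with the class names `"text"` and `"image"`, a score threshold of 0.5, and end-exclusive `(top, left, bottom, right)` boxes); none of these details are specified in the patent.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]  # assumed (top, left, bottom, right), end-exclusive


class Detection(NamedTuple):
    label: str    # assumed class names: "text" or "image"
    box: Box
    score: float  # detector confidence


def split_regions(detections: List[Detection],
                  min_score: float = 0.5) -> Tuple[List[Box], List[Box]]:
    """Separate confident detections into text boxes and image boxes."""
    kept = [d for d in detections if d.score >= min_score]
    text_boxes = [d.box for d in kept if d.label == "text"]
    image_boxes = [d.box for d in kept if d.label == "image"]
    return text_boxes, image_boxes


def crop(pixels: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the boxed area out of a picture stored as rows of pixel values."""
    top, left, bottom, right = box
    return [row[left:right] for row in pixels[top:bottom]]


detections = [
    Detection("text", (0, 0, 1, 4), 0.9),
    Detection("image", (1, 0, 3, 4), 0.8),
    Detection("text", (2, 2, 3, 4), 0.2),  # low confidence, dropped
]
text_boxes, image_boxes = split_regions(detections)
picture = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
text_crop = crop(picture, text_boxes[0])
image_crop = crop(picture, image_boxes[0])
```

The cropped text region would then go to character recognition and the cropped image region to the feature extractor.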

Here, the text region and the image region in the picture can be identified by a pre-trained object detection model. Typical object detection models are the various deep-learning-based models, such as the Faster R-CNN model and the YOLO model.

For example, pictures annotated with text regions and image regions are collected in advance, and the annotated pictures are fed to the object detection model to be trained in order to train the model, thereby obtaining the trained object detection model.

Specifically, taking Faster R-CNN, a classical object detection model, as an example: pictures containing both text and other targets (such as buildings) are annotated with candidate boxes and class labels and then used for training. Candidate boxes are extracted by a Region Proposal Network (RPN); after the last convolutional layer, an RoI pooling layer normalizes the features inside each candidate box to a uniform size; fully connected layers then train the network with a classification loss and a regression loss respectively, so that the position and the class of each candidate box (for example, whether it contains text or another target such as a building) can be predicted.
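The normalization performed by the RoI pooling layer can be illustrated on a single-channel feature map. The following is a minimal pure-Python sketch of max pooling a candidate box into a fixed grid; the function name `roi_max_pool`, the end-exclusive `(r0, c0, r1, c1)` box format and the bin-boundary rule are illustrative assumptions, not the exact procedure of any particular Faster R-CNN implementation.

```python
from typing import List, Tuple


def roi_max_pool(feature_map: List[List[float]],
                 box: Tuple[int, int, int, int],
                 out_h: int, out_w: int) -> List[List[float]]:
    """Max-pool the area box=(r0, c0, r1, c1), end-exclusive, to out_h x out_w."""
    r0, c0, r1, c1 = box
    h, w = r1 - r0, c1 - c0  # the box must cover at least one cell
    pooled = []
    for i in range(out_h):
        # Row range of bin i; every bin covers at least one row.
        rs = r0 + (i * h) // out_h
        re = max(rs + 1, r0 + ((i + 1) * h) // out_h)
        row = []
        for j in range(out_w):
            # Column range of bin j; every bin covers at least one column.
            cs = c0 + (j * w) // out_w
            ce = max(cs + 1, c0 + ((j + 1) * w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


fm = [[0.0, 1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0, 7.0],
      [8.0, 9.0, 10.0, 11.0],
      [12.0, 13.0, 14.0, 15.0]]
whole = roi_max_pool(fm, (0, 0, 4, 4), 2, 2)   # a 4x4 box
part = roi_max_pool(fm, (1, 1, 4, 3), 2, 2)    # a 3x2 box
```

Both boxes come out as 2x2 grids even though they differ in size, which is what allows the following fully connected layers to accept candidate boxes of any shape.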

In step S2, the picture recognition system extracts corresponding word vectors from the text content in the text region.

Here, the picture recognition system performs character recognition on the text region in the picture to obtain the recognized characters, and then segments the recognized characters into words in order to extract word vectors from them.

The picture recognition system may use any of various existing character recognition techniques to recognize the characters in the text region. A typical character recognition technique is OCR (Optical Character Recognition).

After the characters are extracted, the picture recognition system further needs to extract word vectors from them, for example with the word2vec method commonly used in natural language processing: the text is segmented into words, and the recognized characters/words serve as the input for the word vectors.
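The path from recognized text to a vector can be sketched as a table lookup. In the sketch below the embedding table `EMBEDDINGS` is a hypothetical stand-in for a trained word2vec model, whitespace splitting stands in for a real word segmenter (Chinese text would need a dedicated one), and averaging the word vectors into one fixed-length vector is an added assumption that the patent does not prescribe.

```python
from typing import Dict, List

# Hypothetical pre-trained embedding table (toy 3-dimensional vectors).
EMBEDDINGS: Dict[str, List[float]] = {
    "summer": [1.0, 0.0, 0.0],
    "sale": [0.0, 1.0, 0.0],
}
DIM = 3


def segment(text: str) -> List[str]:
    """Toy word segmentation: lower-case and split on whitespace."""
    return text.lower().split()


def text_to_vector(text: str) -> List[float]:
    """Look up each known word and average the vectors (an assumption)."""
    vectors = [EMBEDDINGS[w] for w in segment(text) if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known word: fall back to a zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(text_to_vector("Summer SALE today"))  # [0.5, 0.5, 0.0]
```

Words missing from the table ("today" here) are skipped, so OCR noise does not break the lookup.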

In step S3, the picture recognition system extracts an image feature vector from the image region.

Here, the picture recognition system can extract the image feature vector from the image region by means of various image classification models.

For example, the picture recognition system may use any CNN (Convolutional Neural Network) model such as AlexNet, VGG or ResNet to extract the image feature vector, and may take the data of the last fully connected layer or of an intermediate fully connected layer as the image feature vector.
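Taking the output of a fully connected layer as the feature vector, rather than the final class scores, can be shown with a toy classifier head. The weights, layer sizes and the names `forward`, `W_FC1` and `W_FC2` below are invented for illustration and stand in for the head of a trained CNN; the input is assumed to be the already flattened convolutional features.

```python
from typing import List, Tuple


def dense(x: List[float], weights: List[List[float]],
          bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


# Hypothetical weights of a two-layer classifier head.
W_FC1 = [[1.0, -1.0, 0.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, -1.0, 1.0]]
B_FC1 = [0.0, 0.0, 0.0]
W_FC2 = [[1.0, 1.0, 1.0],
         [-1.0, 0.0, 1.0]]
B_FC2 = [0.0, 0.0]


def forward(flat_conv_features: List[float]) -> Tuple[List[float], List[float]]:
    """Return (intermediate FC activations, final class scores)."""
    fc1 = relu(dense(flat_conv_features, W_FC1, B_FC1))
    logits = dense(fc1, W_FC2, B_FC2)
    return fc1, logits


# The intermediate activations serve as the image feature vector.
feature_vector, logits = forward([2.0, 1.0, 3.0, 0.5])
print(feature_vector)  # [1.0, 4.0, 0.0]
```

The intermediate layer is usually wider than the class-score layer, so it tends to keep more information for the later integration step.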

In step S4, the picture recognition system integrates the extracted word vectors with the image feature vector to determine the semantics of the picture.

Here, the picture recognition system uses a semantic recognition model to integrate the word vectors extracted in step S2 with the image feature vector extracted in step S3, in order to identify the semantics of the picture.

For example, the picture recognition system may use a DNN (Deep Neural Network), RNN (Recurrent Neural Network) or LSTM (Long Short-Term Memory) model, for instance by directly concatenating the word vectors and the image feature vector into a single feature vector and then passing that feature vector through a Softmax classifier, thereby performing semantic recognition on the integrated feature vector.
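The concatenate-then-Softmax variant can be sketched with a single linear layer in place of the DNN/RNN/LSTM. The label set `LABELS`, the weights `W` and `B`, and the vector sizes are hypothetical values chosen so the example is easy to check by hand.

```python
import math
from typing import List, Sequence, Tuple

LABELS = ["normal", "vulgar"]  # hypothetical semantic classes
# Hypothetical weights: one row per label over [word_vec (2) + image_vec (2)].
W = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
B = [0.0, 0.0]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(word_vec: Sequence[float],
             image_vec: Sequence[float]) -> Tuple[str, List[float]]:
    """Concatenate both vectors, apply a linear layer, then softmax."""
    fused = list(word_vec) + list(image_vec)  # direct concatenation
    scores = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(W, B)]
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs


label, probs = classify([1.0, 0.0], [1.0, 0.0])
print(label)
```

Because the two vectors are simply concatenated, either extractor can be swapped out as long as the classifier is retrained on the new vector layout.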

It should be noted here that, as those skilled in the art will understand, whichever of the foregoing models is used for vector integration and semantic recognition, the choice does not affect the selection of the models used earlier to extract the two kinds of vectors.

Preferably, a review function can also be added to the picture recognition system, upgrading it to a picture review system.

Specifically, the picture review system judges, according to the semantics of the picture, whether the picture meets the publishing standard.

Here, the picture review system judges whether the picture meets the publishing standard according to the picture semantics identified by the semantic recognition model. For example, a picture involving illegal content such as vulgar, violent or reactionary material does not meet the publishing standard, is a high-risk picture, and therefore cannot pass the review.
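If the semantic recognition model outputs a class label, the review decision reduces to a membership check. The label names in `PROHIBITED_LABELS` below are hypothetical; a real deployment would define them according to its own publishing standard.

```python
from typing import FrozenSet

# Hypothetical labels treated as violating the publishing standard.
PROHIBITED_LABELS: FrozenSet[str] = frozenset({"vulgar", "violent", "reactionary"})


def passes_review(semantic_label: str) -> bool:
    """A picture passes only if its semantic label is not prohibited."""
    return semantic_label not in PROHIBITED_LABELS


print(passes_review("normal"))  # True
print(passes_review("vulgar"))  # False
```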

Fig. 4 shows a schematic diagram of an apparatus for recognizing a picture including text content and image content according to an embodiment of the invention.

The recognition apparatus can typically be regarded as a picture recognition system, which is typically located on the network side, for example deployed on one or more servers.

As shown in Fig. 4, the picture recognition system includes region locating means 41, word vector extraction means 42, image feature extraction means 43 and semantic recognition means 44.

Region locating means 41 locates the text region and the image region in a picture; word vector extraction means 42 extracts corresponding word vectors from the text content in the text region; image feature extraction means 43 extracts an image feature vector from the image region; semantic recognition means 44 integrates the word vectors with the image feature vector to determine the semantics of the picture.

Specifically, region locating means 41 locates the text region and the image region in the picture.

For a picture including text content and image content, region locating means 41 can distinguish the text region and the image region in it and mark each with a candidate box, as shown in Fig. 3, where the text region is shown by a solid box and the image region by a dashed box.

Here, region locating means 41 may call any of various object detection models to identify the text region and the image region in the picture. The identification can be performed by a pre-trained object detection model. Typical object detection models are the various deep-learning-based models, such as the Faster R-CNN model and the YOLO model.

Word vector extraction means 42 extracts corresponding word vectors from the text content in the text region.

Here, word vector extraction means 42 performs character recognition on the text region in the picture to obtain the recognized characters, and then segments the recognized characters into words in order to extract word vectors from them.

Word vector extraction means 42 may use any of various existing character recognition techniques to recognize the characters in the text region. A typical character recognition technique is OCR (Optical Character Recognition).

After the characters are extracted, word vector extraction means 42 further needs to extract word vectors from them, for example with the word2vec method commonly used in natural language processing: the text is segmented into words, and the recognized characters/words serve as the input for the word vectors.

Image feature extraction means 43 extracts an image feature vector from the image region.

Here, image feature extraction means 43 can extract the image feature vector from the image region by means of various image classification models.

For example, image feature extraction means 43 may use any CNN (Convolutional Neural Network) model such as AlexNet, VGG or ResNet to extract the image feature vector, and may take the data of the last fully connected layer or of an intermediate fully connected layer as the image feature vector.

Semantic recognition means 44 integrates the extracted word vectors with the image feature vector to determine the semantics of the picture.

Here, semantic recognition means 44 uses a semantic recognition model to integrate the word vectors extracted by word vector extraction means 42 with the image feature vector extracted by image feature extraction means 43, in order to identify the semantics of the picture.

For example, semantic recognition means 44 may use a DNN (Deep Neural Network), RNN (Recurrent Neural Network) or LSTM (Long Short-Term Memory) model, for instance by directly concatenating the word vectors and the image feature vector into a single feature vector and then passing that feature vector through a Softmax classifier, thereby performing semantic recognition on the integrated feature vector.

Preferably, a review function (performed by review means, not shown in Fig. 4) can also be added to the picture recognition system, upgrading it to a picture review system.

Specifically, the picture review system (review means) judges, according to the semantics of the picture, whether the picture meets the publishing standard.

Here, the review means judges whether the picture meets the publishing standard according to the picture semantics identified by the semantic recognition model. For example, a picture involving illegal content such as vulgar, violent or reactionary material does not meet the publishing standard, is a high-risk picture, and therefore cannot pass the review.

The invention may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.

Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, and so on, or any suitable combination of the above.

Computer program code for carrying out the operations of the invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

It should be noted that the invention may be implemented in software and/or in a combination of software and hardware. For example, each means of the invention may be implemented using an application-specific integrated circuit (ASIC) or any other similar hardware device. In addition, some steps or functions of the invention may be implemented in hardware, for example as circuitry that cooperates with a processor to perform the steps or functions.

It is obvious to those skilled in the art that the invention is not limited to the details of the exemplary embodiments above, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced in the invention. Multiple units or means recited in a system claim may also be implemented by a single unit or means through software or hardware.

Claims

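1. A picture recognition method, wherein the method comprises the following steps:

- locating the text region and the image region in a picture;

- extracting corresponding word vectors from the text content in the text region;

- extracting an image feature vector from the image region;

- integrating the word vectors with the image feature vector to determine the semantics of the picture.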

2. The method according to claim 1, wherein the locating step specifically comprises:

- identifying the text region and the image region in the picture by means of an object detection model.

3. The method according to claim 2, wherein the object detection model is obtained by training through the following steps:

- collecting pictures annotated with text regions and image regions;

- feeding the annotated pictures to the object detection model to be trained in order to train the model, thereby obtaining the trained object detection model.

4. The method according to any one of claims 1 to 3, wherein the step of extracting the word vectors specifically comprises:

- performing character recognition on the text region to obtain the recognized characters;

- extracting word vectors from the recognized characters.

5. The method according to any one of claims 1 to 4, wherein the step of extracting the image feature vector specifically comprises:

- extracting the image feature vector from the image region by means of an image classification model.

6. The method according to any one of claims 1 to 5, wherein the integrating step specifically comprises:

- integrating the word vectors with the image feature vector by means of a semantic recognition model to identify the semantics of the picture.

7. The method according to any one of claims 1 to 6, wherein the method further comprises the following step:

- judging, according to the semantics, whether the picture meets the publishing standard.

8. A picture recognition apparatus, wherein the apparatus comprises:

means for locating the text region and the image region in a picture;

means for extracting corresponding word vectors from the text content in the text region;

means for extracting an image feature vector from the image region;

means for integrating the word vectors with the image feature vector to determine the semantics of the picture.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.

11. A computer program product which, when executed by a computer device, implements the method according to any one of claims 1 to 7.