CN117671985B

CN117671985B - Road identification voice prompt method, device, equipment and medium based on image recognition

Info

Publication number: CN117671985B
Application number: CN202311722177.8A
Authority: CN
Inventors: 林黎明; 林张瑞; 曹靖雯
Original assignee: Hubei Cheanda Information Technology Co ltd
Current assignee: Hubei Cheanda Information Technology Co ltd
Priority date: 2023-12-14
Filing date: 2023-12-14
Publication date: 2024-06-14
Anticipated expiration: 2043-12-14
Also published as: CN117671985A

Abstract

The invention discloses a road identification voice prompt method, device, equipment and medium based on image recognition, and relates to the technical field of safe driving assistance. The method first uses a target detection algorithm to perform real-time road traffic sign image recognition on front video data and obtain a sign image recognition result. It then selects the most suitable object to be prompted from the at least one recognized road traffic sign image and fills the relative azimuth information of the object to be prompted into the preset text prompt template corresponding to that object to obtain a prompt text. Finally, it synthesizes the prompt text into a road identification prompt voice signal and transmits the signal in real time to the in-vehicle speaker for playback. This helps the driver attend to the meaning and requirements of road traffic signs while driving, strengthens the driver's road awareness and safety awareness, and improves road traffic safety; the method is particularly suitable for novice drivers or drivers with poor memory.

Description

Road identification voice prompt method, device, equipment and medium based on image recognition

Technical Field

The invention belongs to the technical field of safe driving assistance, and particularly relates to a road identification voice prompt method, device, equipment and medium based on image recognition.

Background

Road traffic signs are facilities that convey specific information to traffic participants through graphic symbols, colors and text, and are used to manage traffic and ensure safety. There are many types of road traffic signs; according to the regulations on road traffic signs and markings, they fall into the following seven categories: (1) warning signs: signs that warn vehicles and pedestrians of hazardous locations; (2) prohibitory signs: signs that prohibit or restrict the traffic behavior of vehicles or pedestrians; (3) indication signs: signs that direct the movement of vehicles or pedestrians; (4) guide signs: signs that convey road direction, location or distance; (5) tourist area signs: signs that give the direction of, or distance to, tourist attractions; (6) road construction safety signs: signs that inform traffic about road construction zones; (7) auxiliary signs: signs attached below a main sign to provide supplementary explanation.

Road traffic signs play a vital role in road traffic: they alert, warn and prohibit, so drivers must know the various road traffic signs and strictly obey them while driving. However, the ability to recognize road traffic signs is acquired only through long-term study, examination and driving practice, so novice drivers or drivers with poor memory find it hard to correctly recognize the many different road traffic signs and strictly obey them while driving. Therefore, how to provide an automatic prompting scheme that helps drivers attend to the meaning and requirements of road traffic signs while driving, so as to strengthen their road awareness and safety awareness and improve road traffic safety, is a problem that those skilled in the art urgently need to study.

Disclosure of Invention

The invention aims to provide a road identification voice prompt method, device, computer equipment and computer readable storage medium based on image recognition, to solve the current problem that novice drivers or drivers with poor memory find it difficult to correctly recognize the various road traffic signs while driving.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

In a first aspect, a road identification voice prompt method based on image recognition is provided, including:

acquiring front video data captured in real time by a vehicle-mounted camera, wherein the vehicle-mounted camera is mounted at the front of the vehicle body with its lens facing forward;

performing real-time road traffic sign image recognition on the front video data using a target detection algorithm to obtain a sign image recognition result, wherein the sign image recognition result includes at least one recognized road traffic sign image;

arranging the at least one road traffic sign image in order of sign-to-vehicle distance from near to far to obtain a road traffic sign image sequence, wherein the sign-to-vehicle distance is the distance from the corresponding road traffic sign to the front of the vehicle body;

traversing each road traffic sign image in the road traffic sign image sequence in order from front to back: determining whether a road sign voice prompt action has been performed on the currently traversed road traffic sign image within the latest first unit period; if so, traversing the next road traffic sign image; otherwise, acquiring the current vehicle speed value, estimating, from the current vehicle speed value and the sign-to-vehicle distance of the currently traversed road traffic sign image, the predicted duration and predicted timestamp for the vehicle to reach the corresponding road traffic sign, then estimating, from the preset text prompt template corresponding to the currently traversed road traffic sign image, the road sign prompt voice playback end timestamp for that image, and finally determining whether the predicted timestamp is earlier than the playback end timestamp; if so, traversing the next road traffic sign image; otherwise, taking the currently traversed road traffic sign image as the object to be prompted and stopping the traversal; wherein the predicted timestamp is the sum of the acquisition timestamp of the currently traversed road traffic sign image and the predicted duration, and the playback end timestamp is determined by the time required to play the road sign voice prompt content corresponding to the preset text prompt template;

filling the relative azimuth information of the object to be prompted into the preset text prompt template corresponding to the object to be prompted to obtain the prompt text for the object to be prompted, wherein the relative azimuth information is the azimuth of the corresponding road traffic sign relative to the front of the vehicle body, and the preset text prompt template contains the meaning and requirements of the corresponding road traffic sign;

synthesizing the prompt text into a road identification prompt voice signal;

and transmitting the road identification prompt voice signal in real time to the in-vehicle speaker for playback, so as to complete the road sign voice prompt action for the object to be prompted.
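The timing check in the traversal step can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: the names `SignDetection`, `estimate_playback_s` and `select_object_to_prompt` are hypothetical, playback time is assumed to grow linearly with template length at an assumed 4 characters per second, and the playback end timestamp is assumed to be measured from the current time. The predicted timestamp follows the claim: the acquisition timestamp plus the predicted duration (distance divided by speed).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class SignDetection:
    sign_type: str      # hypothetical sign label, e.g. "no_left_turn"
    distance_m: float   # sign-to-vehicle distance in metres
    captured_at: float  # acquisition timestamp of the frame, in seconds


def estimate_playback_s(template: str, chars_per_second: float = 4.0) -> float:
    # Assumption: playback time is proportional to the template length.
    return len(template) / chars_per_second


def select_object_to_prompt(
    sequence: Sequence[SignDetection],
    speed_mps: float,
    now: float,
    templates: Dict[str, str],
    prompted_recently: Callable[[str], bool],
) -> Optional[SignDetection]:
    """Return the first sign in the sequence that can still be prompted in time."""
    for det in sequence:  # front to back
        if prompted_recently(det.sign_type):
            continue  # already prompted within the latest first unit period
        if speed_mps <= 0:
            arrival_ts = float("inf")  # stationary vehicle never reaches the sign
        else:
            # Predicted timestamp = acquisition timestamp + predicted duration.
            arrival_ts = det.captured_at + det.distance_m / speed_mps
        play_end_ts = now + estimate_playback_s(templates[det.sign_type])
        if arrival_ts < play_end_ts:
            continue  # the prompt would end after the sign has been passed
        return det  # object to be prompted; stop traversing
    return None
```

Comparing timestamps rather than bare durations means that any processing delay since the frame was captured is already counted against the remaining travel time.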

Based on the above invention, an automatic prompting scheme that helps drivers attend to the meaning and requirements of road traffic signs is provided on the basis of a target detection algorithm and speech synthesis technology. The target detection algorithm is first used to perform real-time road traffic sign image recognition on the front video data to obtain a sign image recognition result; the most suitable object to be prompted is then selected from the at least one recognized road traffic sign image, and its relative azimuth information is filled into its preset text prompt template to obtain the prompt text; finally, the prompt text is synthesized into a road identification prompt voice signal and transmitted in real time to the in-vehicle speaker for playback. This helps drivers attend to the meaning and requirements of road traffic signs while driving, strengthens their road awareness and safety awareness, and improves road traffic safety. The scheme is particularly suitable for novice drivers or drivers with poor memory and is convenient for practical application and popularization.

In one possible design, the vehicle-mounted camera is a depth camera, and arranging the at least one road traffic sign image in order of sign-to-vehicle distance from near to far to obtain a road traffic sign image sequence includes:

for each road traffic sign image in the at least one road traffic sign image, extracting depth information of the corresponding road traffic sign from the front video data and taking the depth information as its sign-to-vehicle distance, wherein the sign-to-vehicle distance is the distance from the corresponding road traffic sign to the front of the vehicle body;

and arranging the at least one road traffic sign image in order of sign-to-vehicle distance from near to far to obtain the road traffic sign image sequence.
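The depth extraction itself is left to conventional methods. One common choice, assumed here purely for illustration, is to take the median of the valid depth values inside the sign's bounding box on a depth map aligned with the video frame; the names `SignBox`, `sign_distance` and `order_by_distance` are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SignBox:
    label: str
    x1: int  # left column (inclusive)
    y1: int  # top row (inclusive)
    x2: int  # right column (exclusive)
    y2: int  # bottom row (exclusive)


def sign_distance(depth_map: Sequence[Sequence[float]], box: SignBox) -> float:
    """Median of the valid depth values (metres) inside the box; inf if none."""
    values = [
        d
        for row in depth_map[box.y1:box.y2]
        for d in row[box.x1:box.x2]
        if d > 0 and math.isfinite(d)  # drop holes and invalid matches
    ]
    return statistics.median(values) if values else math.inf


def order_by_distance(depth_map: Sequence[Sequence[float]], boxes: List[SignBox]) -> List[SignBox]:
    """Near-to-far ordering used to build the road traffic sign image sequence."""
    return sorted(boxes, key=lambda b: sign_distance(depth_map, b))
```

The median is used instead of the mean so that a few background pixels inside the box, or stray invalid depths, do not drag the estimate.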

In one possible design, arranging the at least one road traffic sign image in order of sign-to-vehicle distance from near to far to obtain a road traffic sign image sequence includes:

for each road traffic sign image in the at least one road traffic sign image, acquiring the number of times the road sign voice prompt action was executed for it within the latest second unit period, and taking the product of its sign-to-vehicle distance and that execution count as its sorting reference value, wherein the sign-to-vehicle distance is the distance from the corresponding road traffic sign to the front of the vehicle body;

and arranging the at least one road traffic sign image in ascending order of sorting reference value to obtain the road traffic sign image sequence.

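In one possible design, the relative azimuth information includes the sign-to-vehicle distance and whether the corresponding road traffic sign is in front of the left side of the vehicle body, in front of the right side of the vehicle body, or directly in front of the vehicle body.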
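Filling the template can be plain string substitution. In this hypothetical sketch, the template wording, the `TEMPLATES` and `POSITION_PHRASES` tables, the position keys and the rounding to whole metres are all illustrative assumptions; the text only states that the template carries the sign's meaning and requirements and receives the relative azimuth information.

```python
from dataclasses import dataclass

# Hypothetical phrasing for the three relative positions.
POSITION_PHRASES = {
    "front_left": "ahead on the left",
    "front_right": "ahead on the right",
    "front": "directly ahead",
}

# Hypothetical preset text prompt templates, keyed by sign type.
TEMPLATES = {
    "no_left_turn": "No-left-turn sign {distance} metres {position}: left turns are prohibited.",
}


@dataclass
class RelativeAzimuth:
    distance_m: float  # sign-to-vehicle distance
    position: str      # one of the keys of POSITION_PHRASES


def fill_template(sign_type: str, azimuth: RelativeAzimuth) -> str:
    """Insert the relative azimuth information into the sign's template."""
    return TEMPLATES[sign_type].format(
        distance=round(azimuth.distance_m),
        position=POSITION_PHRASES[azimuth.position],
    )
```

Keeping the meaning and requirement text fixed in the template, and varying only the azimuth slots, keeps the synthesized prompt short and predictable in length, which also makes its playback time easier to estimate.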

In one possible design, filling the relative azimuth information of the object to be prompted into the preset text prompt template corresponding to the object to be prompted to obtain the prompt text for the object to be prompted includes:

extracting, from the front video data, the video frame image in which the object to be prompted was recognized;

inputting the video frame image into a pre-trained Bezier curve parameter prediction model, which outputs the Bezier curve parameters of the lane edge lines, wherein the lane edge lines include a left lane edge line and a right lane edge line, the Bezier curve parameters include the coordinate positions of a plurality of Bezier curve key points in the two-dimensional coordinate system of the video frame image, and the key points include a curve start point, a curve end point and at least one curve control point;

Determining a left lane edge curve on the two-dimensional coordinate system according to the Bezier curve parameters of the left lane edge line, and determining a right lane edge curve on the two-dimensional coordinate system according to the Bezier curve parameters of the right lane edge line;

determining, according to the positional relationship between the position of the object to be prompted in the video frame image and the left and right lane edge curves, whether the corresponding road traffic sign is in front of the left side of the vehicle body, in front of the right side of the vehicle body, or directly in front of the vehicle body, and taking the determination result together with the sign-to-vehicle distance of the object to be prompted as its relative azimuth information;

and filling the relative azimuth information into the preset text prompt template corresponding to the object to be prompted to obtain the prompt text for the object to be prompted.
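Once the key points are predicted, each lane edge curve can be evaluated with de Casteljau's algorithm and compared with the sign's position. This sketch assumes image coordinates with x increasing to the right, takes the bounding-box centre as the sign position, and approximates the curve's x at the sign's row by dense sampling (clamping to the nearest curve point when the sign lies beyond the curve's vertical extent). The function names are hypothetical, and the prediction model itself is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


def bezier_point(ctrl: List[Point], t: float) -> Point:
    """de Casteljau evaluation over start point, control point(s) and end point."""
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return pts[0]


def x_at_row(ctrl: List[Point], y: float, samples: int = 200) -> float:
    """x of the sampled curve point whose y is closest to the given image row."""
    curve = (bezier_point(ctrl, i / samples) for i in range(samples + 1))
    return min(curve, key=lambda p: abs(p[1] - y))[0]


def classify_position(sign_xy: Point, left_ctrl: List[Point], right_ctrl: List[Point]) -> str:
    """Left of the left edge, right of the right edge, or between the two."""
    x, y = sign_xy
    if x < x_at_row(left_ctrl, y):
        return "front_left"
    if x > x_at_row(right_ctrl, y):
        return "front_right"
    return "front"
```

Comparing against the curves at the sign's own image row, rather than against fixed image columns, is what lets the same rule work on a curved road as well as a straight one.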

In one possible design, after the road identification prompt voice signal is transmitted in real time to the in-vehicle speaker for playback, the method further includes:

temporarily suspending execution of the road identification voice prompt method while the road identification prompt voice signal is being played and for a preset period after playback ends.
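The suspension rule amounts to a simple time gate. A minimal sketch, assuming a monotonic clock in seconds; `PromptGate` and its method names are hypothetical, and the preset period is passed in as `cooldown_s`.

```python
class PromptGate:
    """Blocks the prompt method during playback plus a preset period afterwards."""

    def __init__(self, cooldown_s: float) -> None:
        self.cooldown_s = cooldown_s
        self._blocked_until = float("-inf")

    def on_playback_started(self, now: float, playback_s: float) -> None:
        # Suspended until playback ends and the preset period has elapsed.
        self._blocked_until = now + playback_s + self.cooldown_s

    def may_run(self, now: float) -> bool:
        return now >= self._blocked_until
```

Calling `may_run` before each recognition cycle keeps two prompts from overlapping and gives the driver a quiet interval after each one.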

The second aspect provides a road identification voice prompt device based on image recognition, comprising a video data acquisition module, an image recognition processing module, a sign image sorting module, a sign image traversal module, a prompt text generation module, a prompt voice synthesis module and a voice signal transmission module, which are communicatively connected in sequence;

The video data acquisition module is configured to acquire front video data captured in real time by the vehicle-mounted camera, wherein the vehicle-mounted camera is mounted at the front of the vehicle body with its lens facing forward;

The image recognition processing module is configured to perform real-time road traffic sign image recognition on the front video data using a target detection algorithm to obtain a sign image recognition result, wherein the sign image recognition result includes at least one recognized road traffic sign image;

The sign image sorting module is configured to arrange the at least one road traffic sign image in order of sign-to-vehicle distance from near to far to obtain a road traffic sign image sequence, wherein the sign-to-vehicle distance is the distance from the corresponding road traffic sign to the front of the vehicle body;

The sign image traversal module is configured to traverse each road traffic sign image in the road traffic sign image sequence in order from front to back: determining whether a road sign voice prompt action has been performed on the currently traversed road traffic sign image within the latest first unit period; if so, traversing the next road traffic sign image; otherwise, acquiring the current vehicle speed value, estimating, from the current vehicle speed value and the sign-to-vehicle distance of the currently traversed road traffic sign image, the predicted duration and predicted timestamp for the vehicle to reach the corresponding road traffic sign, then estimating, from the preset text prompt template corresponding to the currently traversed road traffic sign image, the road sign prompt voice playback end timestamp for that image, and finally determining whether the predicted timestamp is earlier than the playback end timestamp; if so, traversing the next road traffic sign image; otherwise, taking the currently traversed road traffic sign image as the object to be prompted and stopping the traversal; wherein the predicted timestamp is the sum of the acquisition timestamp of the currently traversed road traffic sign image and the predicted duration, and the playback end timestamp is determined by the time required to play the road sign voice prompt content corresponding to the preset text prompt template;

The prompt text generation module is configured to fill the relative azimuth information of the object to be prompted into the preset text prompt template corresponding to the object to be prompted to obtain the prompt text for the object to be prompted, wherein the relative azimuth information is the azimuth of the corresponding road traffic sign relative to the front of the vehicle body, and the preset text prompt template contains the meaning and requirements of the corresponding road traffic sign;

The prompt voice synthesis module is configured to synthesize the prompt text into a road identification prompt voice signal;

The voice signal transmission module is configured to transmit the road identification prompt voice signal in real time to the in-vehicle speaker for playback, so as to complete the road sign voice prompt action for the object to be prompted.

In a third aspect, the present invention provides a computer device comprising a memory, a processor and a transceiver in communication connection in turn, wherein the memory is configured to store a computer program, the transceiver is configured to send and receive a message, and the processor is configured to read the computer program and execute the road identification voice prompt method according to the first aspect or any of the possible designs of the first aspect.

In a fourth aspect, the present invention provides a computer readable storage medium having instructions stored thereon which, when executed on a computer, perform the road identification voice prompt method as described in the first aspect or any of the possible designs of the first aspect.

In a fifth aspect, the invention provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the road identification voice prompt method according to the first aspect or any possible design of the first aspect.

Beneficial effects of the above scheme:

(1) The invention provides an automatic prompting scheme, based on a target detection algorithm and speech synthesis technology, that helps drivers attend to the meaning and requirements of road traffic signs. The target detection algorithm is first used to perform real-time road traffic sign image recognition on the front video data to obtain a sign image recognition result; the most suitable object to be prompted is then selected from the at least one recognized road traffic sign image, and its relative azimuth information is filled into its preset text prompt template to obtain the prompt text; finally, the prompt text is synthesized into a road identification prompt voice signal and transmitted in real time to the in-vehicle speaker for playback. This helps the driver attend to the meaning and requirements of road traffic signs while driving, strengthens the driver's road awareness and safety awareness, and improves road traffic safety. The method is particularly suitable for novice drivers or drivers with poor memory and is convenient for practical application and popularization.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a road identification voice prompt method based on image recognition according to an embodiment of the present application.

Fig. 2 is an exemplary diagram of left and right lane edge curves on a two-dimensional coordinate system according to an embodiment of the present application, where (a) in fig. 2 shows a case in a straight road, and (b) in fig. 2 shows a case in a curved road.

Fig. 3 is a schematic structural diagram of a road identification voice prompt device based on image recognition according to an embodiment of the present application.

Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and embodiments. The drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort. It should be noted that the description of these embodiments is intended to aid understanding of the present invention, not to limit it.

It should be understood that although the terms first and second, etc. may be used herein to describe various objects, these objects should not be limited by these terms. These terms are only used to distinguish one object from another. For example, a first object may be referred to as a second object, and similarly a second object may be referred to as a first object, without departing from the scope of example embodiments of the invention.

It should be understood that the term "and/or", where it appears herein, merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean A alone, B alone, or both A and B; likewise, A, B and/or C may mean any one of A, B and C or any combination of them. The term "/and", where it appears herein, describes another association and indicates that two relationships may exist; for example, A/and B may mean A and B exist independently or simultaneously. In addition, the character "/", where it appears herein, generally indicates an "or" relationship between the associated objects.

Examples

As shown in Fig. 1, the road identification voice prompt method based on image recognition provided in the first aspect of this embodiment may be executed by, but is not limited to, a computer device that has certain computing resources and is communicatively connected to both the vehicle-mounted camera and the in-vehicle speaker, for example a vehicle driving computer (also called a computer control module, or Electronic Control Unit, ECU), a platform server, a personal computer (PC, a multipurpose computer whose size, price and performance suit personal use; desktops, notebooks, netbooks, tablets and ultrabooks are all personal computers), a smartphone, a personal digital assistant (PDA), or a wearable device. As shown in Fig. 1, the method may include, but is not limited to, the following steps S1 to S7.

S1, acquiring front video data captured in real time by the vehicle-mounted camera, wherein the vehicle-mounted camera is mounted at the front of the vehicle body with its lens facing straight ahead.

In step S1, the field of view of the vehicle-mounted camera covers the area in front of the vehicle body, and the camera captures video frame images of that area in real time, yielding front video data made up of a plurality of consecutive video frame images. The vehicle-mounted camera may be, but is not limited to, the image acquisition device of an existing dashcam (driving recorder). To obtain depth information of road traffic signs ahead of the vehicle in the camera coordinate system, the vehicle-mounted camera is preferably a depth camera, for example a binocular camera. The vehicle-mounted camera can transmit the captured data to the local device in a conventional manner.

S2, performing real-time road traffic sign image recognition on the front video data using a target detection algorithm to obtain a sign image recognition result, wherein the sign image recognition result includes at least one recognized road traffic sign image.
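For the binocular case, depth is conventionally recovered from disparity with the rectified stereo relation Z = f·B/d, where f is the focal length in pixels, B the baseline in metres and d the disparity in pixels. The sketch below only applies that relation; the function name and parameters are illustrative, and the disparity matching step is not shown.

```python
def stereo_depth_m(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair; inf when there is no valid match."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

For example, with f = 700 px and B = 0.12 m, a disparity of 2 px corresponds to a depth of about 42 m. Because depth is inversely proportional to disparity, distant signs produce small disparities, so their range estimates are more sensitive to one-pixel matching errors.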

In step S2, the target detection algorithm is an existing artificial intelligence recognition algorithm that recognizes objects in a picture and marks their positions. It may specifically be, but is not limited to, Faster R-CNN (Faster Regions with Convolutional Neural Network features, proposed in 2015 by a team including Kaiming He, and the foundation of the first-place entries in several tracks of the 2015 ILSVRC and COCO competitions), SSD (Single Shot MultiBox Detector, proposed by Wei Liu et al. at ECCV 2016 and one of the currently popular mainstream detection frameworks), or YOLO (You Only Look Once, developed to at least version V4; its basic principle is to divide the input image into a 7×7 grid, predict 2 bounding boxes for each grid cell, discard target windows whose probability falls below a threshold, and finally remove redundant windows by box merging). Real-time road traffic sign image recognition can therefore be performed with such an algorithm to obtain the sign image recognition result; these target detection algorithms are very widely applied in industry.
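The final "remove redundant windows" step in YOLO-style detectors is usually greedy non-maximum suppression (NMS). The sketch below shows that standard technique in plain Python; the score and IoU thresholds are illustrative defaults, not values taken from the patent.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        score_thresh: float = 0.5, iou_thresh: float = 0.45) -> List[int]:
    """Indices of kept boxes, highest score first."""
    # Threshold out low-confidence windows, then visit the rest by descending score.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap heavily with any box already kept.
        if all(iou(boxes[i], boxes[k]) < iou_thresh for k in keep):
            keep.append(i)
    return keep
```

In practice NMS is often applied per class, so that two different signs mounted on the same pole are not suppressed against each other.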

In step S2, specifically, performing real-time road traffic sign image recognition on the front video data using the target detection algorithm to obtain the sign image recognition result includes, but is not limited to: importing the front video frame images of the front video data into a pre-trained road traffic sign image recognition model based on the YOLO v4 target detection algorithm to obtain the sign image recognition result, wherein the result includes at least one recognized road traffic sign image and the image bounding box of each of these images. The model structure of the YOLO v4 target detection algorithm consists of three parts: a backbone network (Backbone), a neck network (Neck) and a head network (Head). The Backbone may use a CSPDarknet network (CSP stands for Cross Stage Partial) to extract features. The Neck consists of an SPP (Spatial Pyramid Pooling) block, which enlarges the receptive field and isolates the most important features, and a PANet (Path Aggregation Network), which simultaneously aggregates semantic features from the high-level layers and fine-grained features from the low-level layers of the backbone. The Head is anchor-based and performs detection on three feature maps of different sizes, 13×13, 26×26 and 52×52, to detect objects from large to small (a larger feature map preserves finer spatial detail, so the 52×52 map is used to detect small objects, and vice versa).
The road traffic sign image recognition model can be trained with conventional sample-based training, so that, given a test image, it outputs the road traffic sign image recognition result together with the confidence prediction value of that result and other information. Furthermore, for different road traffic signs (for example warning signs, prohibitory signs, indication signs, guide signs, tourist area signs, road construction safety signs, auxiliary signs, and their further subdivisions), a separate road traffic sign image recognition model can be trained for the corresponding real-time recognition task, namely identifying whether the corresponding road traffic sign image is present in the front video frame image.
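The three head resolutions follow from the network's downsampling strides. Assuming the common 416×416 input, strides of 32, 16 and 8, and 3 anchors per grid cell (typical YOLOv3/v4 configurations, not stated in the patent), the grid sizes and the number of raw candidate boxes can be computed as below; the function names are hypothetical.

```python
from typing import List, Sequence


def head_grid_sizes(input_size: int = 416, strides: Sequence[int] = (32, 16, 8)) -> List[int]:
    """Side length of each square detection grid; the input must divide evenly."""
    for s in strides:
        if input_size % s:
            raise ValueError(f"input size {input_size} is not divisible by stride {s}")
    return [input_size // s for s in strides]


def raw_box_count(input_size: int = 416, anchors_per_cell: int = 3) -> int:
    """Candidate boxes before confidence filtering and NMS."""
    return sum(g * g * anchors_per_cell for g in head_grid_sizes(input_size))
```

Under these assumptions, each cell of the 52×52 grid covers an 8×8-pixel patch of the input, which is why that scale handles small, distant signs.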

S3, arranging the at least one road traffic sign image in order of sign-to-vehicle distance from near to far to obtain a road traffic sign image sequence, wherein the sign-to-vehicle distance is the distance from the corresponding road traffic sign to the front of the vehicle body.

In step S3, since the vehicle-mounted camera is mounted at the front of the vehicle body, the sign-to-vehicle distance is effectively the distance from the corresponding road traffic sign to the vehicle-mounted camera. When the vehicle-mounted camera is a depth camera, the depth information of the corresponding road traffic sign in the front video data can therefore serve as the sign-to-vehicle distance. That is, preferably, the vehicle-mounted camera is a depth camera, and arranging the at least one road traffic sign image in order of sign-to-vehicle distance from near to far to obtain a road traffic sign image sequence includes, but is not limited to: first, for each road traffic sign image in the at least one road traffic sign image, extracting depth information of the corresponding road traffic sign from the front video data and taking it as the sign-to-vehicle distance, wherein the sign-to-vehicle distance is the distance from the corresponding road traffic sign to the front of the vehicle body; and then arranging the at least one road traffic sign image in order of sign-to-vehicle distance from near to far to obtain the road traffic sign image sequence. The depth information can be extracted in an existing conventional manner.

In step S3, a closer road traffic sign is seen by the driver first and must be obeyed first, so the at least one road traffic sign image is arranged in order of sign-to-vehicle distance from near to far, giving closer signs higher voice prompt priority. In addition, a road traffic sign that has been prompted less often is less familiar to the driver and should be prompted with higher priority. To take both the sign-to-vehicle distance and the prompt frequency into account, arranging the at least one road traffic sign image to obtain the road traffic sign image sequence preferably includes, but is not limited to: for each road traffic sign image in the at least one road traffic sign image, acquiring the number of times the road sign voice prompt action was executed for it within the latest second unit period, and taking the product of its sign-to-vehicle distance and that execution count as its sorting reference value, wherein the sign-to-vehicle distance is the distance from the corresponding road traffic sign to the front of the vehicle body; and arranging the at least one road traffic sign image in ascending order of sorting reference value to obtain the road traffic sign image sequence. The latest second unit period may be, but is not limited to, the last 24 hours, and the execution count of the road sign voice prompt action can be obtained by conventional statistics.
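Both the execution count over the latest second unit period and the "already prompted within the latest first unit period" check of step S4 can be served by one timestamped prompt history. The class and function names below are hypothetical, the 24-hour window follows the example in the text, and breaking ties by distance is an added assumption: with the plain product, every sign that has never been prompted scores 0 regardless of its distance.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Detection = Tuple[str, float]  # (sign type, sign-to-vehicle distance in metres)


class PromptHistory:
    """Timestamps (seconds) at which a road sign voice prompt action was executed."""

    def __init__(self) -> None:
        self._times: Dict[str, List[float]] = defaultdict(list)

    def record(self, sign_type: str, ts: float) -> None:
        self._times[sign_type].append(ts)

    def count_since(self, sign_type: str, since: float) -> int:
        return sum(1 for t in self._times[sign_type] if t >= since)


def rank_signs(detections: List[Detection], history: PromptHistory, now: float,
               window_s: float = 24 * 3600) -> List[Detection]:
    """Ascending order of sorting reference value (distance x recent prompt count)."""
    def key(det: Detection) -> Tuple[float, float]:
        sign_type, distance = det
        count = history.count_since(sign_type, now - window_s)
        # Ties (e.g. all never-prompted signs at 0) are broken by distance: an assumption.
        return (distance * count, distance)
    return sorted(detections, key=key)
```

With a half-hour first unit period, the S4 check then reduces to `history.count_since(sign_type, now - 1800) > 0`.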

S4, traversing each road traffic sign image in the road traffic sign image sequence from front to back. For the currently traversed road traffic sign image, judging whether the road identification voice prompt action has been executed for it within the latest first unit period. If so, moving on to the next road traffic sign image. Otherwise, taking the currently traversed road traffic sign image as the object to be prompted and terminating the traversal.

In step S4, the latest first unit period may be, but is not limited to, the last half hour. If the road identification voice prompt action has been executed for the currently traversed road traffic sign image within the last half hour, the driver can be considered to be, for the time being, familiar with the meaning and requirements of the corresponding road traffic sign. The prompt need not be repeated, and the traversal moves on to the next road traffic sign image. Because only one road traffic sign image is selected for a voice prompt each time, the traversal is terminated as soon as the object to be prompted is determined.

In addition, some time elapses between the acquisition time stamp of the object to be prompted and the completion of the corresponding road identification voice prompt action, and the vehicle keeps moving during that time. To avoid the driver having already passed the prompted road traffic sign by the time the prompt finishes, the following applies when it is judged that the road identification voice prompt action has not been executed for the currently traversed image within the latest first unit period. Taking that image as the object to be prompted and terminating the traversal preferably includes, but is not limited to, the following steps S41 to S44.
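The basic traversal of step S4, before the timing refinements S41 to S44, can be sketched as follows. This is a minimal sketch assuming three things. First, "executed for this image" is tracked per sign category label. Second, prompt times are stored in a dictionary of seconds. Third, the first unit period defaults to half an hour. `select_object_to_prompt` is a hypothetical name.

```python
from typing import Dict, Optional, Sequence


def select_object_to_prompt(
    ordered_labels: Sequence[str],
    last_prompt_time: Dict[str, float],
    now: float,
    first_period_s: float = 1800.0,   # latest first unit period: half an hour
) -> Optional[str]:
    """Return the first sign not prompted within the first unit period, else None."""
    for label in ordered_labels:
        t = last_prompt_time.get(label)
        if t is not None and now - t <= first_period_s:
            continue                   # prompted recently: driver assumed familiar
        return label                   # object to be prompted; stop traversing
    return None


last = {"curve": 1000.0, "school": 0.0}
print(select_object_to_prompt(["curve", "school", "stop"], last, now=2000.0))  # school
```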

S41, acquiring the current vehicle speed value when it is judged that the road identification voice prompt action has not been executed for the currently traversed road traffic sign image within the latest first unit period.

In step S41, the vehicle speed is a routinely monitored driving parameter, so the current vehicle speed value can be obtained in a conventional manner.

S42, estimating the predicted time length and the predicted time stamp at which the vehicle reaches the corresponding road traffic sign. The estimate uses the current vehicle speed value and the target distance of the currently traversed road traffic sign image. The predicted time stamp equals the image acquisition time stamp of the currently traversed road traffic sign image plus the predicted time length.

In step S42, the predicted time length equals the target distance divided by the current vehicle speed value. For example, suppose the image acquisition time stamp is 10:00:00, the target distance is 100 meters, and the current vehicle speed value is 10 meters/second. The predicted time length is then 10 seconds and the predicted time stamp is 10:00:10.
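The arithmetic of step S42 can be sketched with the standard `datetime` module, reproducing the 10:00:00 example. Returning `(None, None)` for a stopped vehicle is an assumption of this sketch, since the method does not say how a zero speed is handled. `predict_arrival` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def predict_arrival(
    acquired_at: datetime, distance_m: float, speed_mps: float
) -> Tuple[Optional[timedelta], Optional[datetime]]:
    """Predicted time length = target distance / speed; time stamp = acquisition + length."""
    if speed_mps <= 0:
        return None, None   # vehicle stopped: no finite arrival estimate (assumption)
    length = timedelta(seconds=distance_m / speed_mps)
    return length, acquired_at + length


length, stamp = predict_arrival(datetime(2024, 1, 1, 10, 0, 0), 100.0, 10.0)
print(length, stamp.time())  # 0:00:10 10:00:10
```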

S43, estimating the road identification prompt voice playing time length and the road identification prompt voice playing end time stamp for the currently traversed road traffic sign image. The estimate is based on the preset text prompt template corresponding to that image. The playing end time stamp equals the image acquisition time stamp of the currently traversed road traffic sign image plus the playing time length. The preset text prompt template contains the meaning and requirement content of the corresponding road traffic sign.

In step S43, the meaning and requirement content differ greatly between road traffic signs, but the length of that content is essentially fixed for each sign. Adding the relative azimuth information changes the length only slightly. The road identification prompt voice playing time length can therefore be determined conventionally from the information length and the default playback speed. For example, if the image acquisition time stamp is 10:00:00 and the road identification prompt voice playing time length is 5 seconds, the playing end time stamp is 10:00:05. Processing delay is ignored in this embodiment for the time being.

In addition, to estimate the playing time length more accurately, the relative azimuth information corresponding to the currently traversed road traffic sign image can first be filled into its preset text prompt template to obtain its text prompt text. The playing time length and playing end time stamp can then be estimated from that text prompt text. The relative azimuth information is the azimuth information of the corresponding road traffic sign relative to the front of the vehicle body.
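The length-based estimate of step S43 can be sketched as follows. This is a minimal sketch. The default playback speed is modelled as a fixed number of non-whitespace characters per second, and the value of 4.0 used here is a hypothetical placeholder, not a figure from the method. As in the embodiment, processing delay is ignored. `estimate_playback` is a hypothetical name.

```python
from datetime import datetime, timedelta


def estimate_playback(text: str, acquired_at: datetime, chars_per_second: float = 4.0):
    """Estimate voice playing time length and end time stamp from text length.

    chars_per_second is a hypothetical default speaking rate; processing delay is ignored.
    """
    n_chars = sum(1 for ch in text if not ch.isspace())
    length = timedelta(seconds=n_chars / chars_per_second)
    return length, acquired_at + length


length, end = estimate_playback("x" * 20, datetime(2024, 1, 1, 10, 0, 0))
print(length, end.time())  # 0:00:05 10:00:05
```

In practice the rate would be calibrated to the speech synthesizer actually used, or the synthesized audio length could be measured directly.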

S44, judging whether the predicted time stamp is earlier than the road identification prompt voice playing end time stamp. If so, moving on to the next road traffic sign image. Otherwise, taking the currently traversed road traffic sign image as the object to be prompted and terminating the traversal.

In step S44, a predicted time stamp earlier than the road identification prompt voice playing end time stamp means the driver would already have passed the prompted road traffic sign before the road identification voice prompt action is completed. The image is therefore skipped and the traversal continues with the next road traffic sign image.
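Steps S4 and S41 to S44 can be combined into a single traversal. This is a minimal sketch with several assumptions. Time stamps are plain seconds. Recency is tracked per sign label. Playback length uses a hypothetical fixed characters-per-second rate. A stopped vehicle is treated as never reaching the sign, so its prompt is always feasible. `Candidate` and `choose_object_to_prompt` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Candidate:
    label: str
    distance_m: float    # target distance
    acquired_at: float   # image acquisition time stamp, in seconds
    prompt_text: str     # text prompt text (template already filled)


def choose_object_to_prompt(
    candidates: Sequence[Candidate],
    last_prompt_time: Dict[str, float],
    now: float,
    speed_mps: float,
    chars_per_second: float = 4.0,   # hypothetical speaking rate
    first_period_s: float = 1800.0,  # latest first unit period
) -> Optional[Candidate]:
    for c in candidates:                       # already ordered near to far
        t = last_prompt_time.get(c.label)
        if t is not None and now - t <= first_period_s:
            continue                           # S4: prompted recently
        # S41/S42: predicted time stamp at which the vehicle reaches the sign
        reach_at = c.acquired_at + c.distance_m / speed_mps if speed_mps > 0 else float("inf")
        # S43: estimated end of voice playback
        n_chars = sum(1 for ch in c.prompt_text if not ch.isspace())
        play_end = c.acquired_at + n_chars / chars_per_second
        # S44: skip if the sign would be passed before the prompt finishes
        if reach_at < play_end:
            continue
        return c                               # object to be prompted
    return None


cands = [Candidate("merge", 30.0, 0.0, "x" * 20), Candidate("curve", 100.0, 0.0, "x" * 20)]
print(choose_object_to_prompt(cands, {}, now=0.0, speed_mps=10.0).label)  # curve
```

At 10 m/s the "merge" sign is reached after 3 s, before the 5 s prompt ends, so it is skipped. The "curve" sign is reached after 10 s and is selected.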

S5, filling the relative azimuth information corresponding to the object to be prompted into the preset text prompt template corresponding to the object to be prompted, to obtain the text prompt text corresponding to the object to be prompted. The relative azimuth information is the azimuth information of the corresponding road traffic sign relative to the front of the vehicle body. The preset text prompt template contains the meaning and requirement content of the corresponding road traffic sign.

In step S5, the azimuth information includes, but is not limited to, the target distance and the sign's position relative to the vehicle body: at the left front, at the right front, or directly in front. For example, suppose the preset text prompt template is "sign indicating a curve ahead, please drive carefully!". Suppose also that the azimuth information consists of a target distance of 100 meters and the sign being at the right front of the vehicle body. The filled text prompt text is then "The sign 100 meters ahead at the right front indicates a curve ahead, please drive carefully!".

The azimuth information other than the target distance can also be acquired automatically. That is, preferably, filling the relative azimuth information corresponding to the object to be prompted into the corresponding preset text prompt template to obtain the corresponding text prompt text includes, but is not limited to, the following steps S51 to S55.

S51, extracting the video frame image of the object to be prompted from the front video data.

S52, inputting the video frame image into a pre-trained Bezier curve parameter prediction model and outputting the Bezier curve parameters of the lane edge lines. The lane edge lines include, but are not limited to, a left lane edge line and a right lane edge line. The Bezier curve parameters include, but are not limited to, the coordinates of a plurality of Bezier curve key points in the two-dimensional coordinate system of the video frame image. The plurality of Bezier curve key points include, but are not limited to, a curve start point, a curve end point and at least one curve control point.

In step S52, the Bezier curve parameters are used to fit the course of each lane edge line. The Bezier curve is an important parametric curve in computer graphics, and its general formula is:

$$B(t)=\sum_{i=0}^{n}\binom{n}{i}(1-t)^{n-i}\,t^{i}\,P_i,\qquad t\in[0,1]$$

In this formula, $t$ is the variable, $B(t)$ is the Bezier curve, $P_0$ is the curve start point, and $P_n$ is the curve end point. $n$ is a natural number greater than 1 giving the order of the Bezier curve, and $i$ is a natural number. When $i$ is a nonzero natural number not greater than $n-1$, $P_i$ is the $i$-th curve control point counted from the curve start point towards the curve end point.

When $n$ equals 2, the plurality of Bezier curve key points comprise a curve start point, a curve end point and one curve control point, so the left lane edge curve and the right lane edge curve obtained subsequently are each second-order Bezier curves. When $n$ equals 3, the key points comprise a curve start point, a curve end point and two curve control points, so the two lane edge curves are each third-order Bezier curves. Third-order curves fit the course of lane edge lines and lane lines more accurately than second-order curves. The higher the order, the more accurately the course is fitted, but the longer the Bezier curve parameter prediction model takes to predict it. To obtain the Bezier curve parameters of the lane edge lines quickly, the plurality of Bezier curve key points preferably comprise a curve start point, a curve end point and two curve control points. That is, the left and right lane edge curves obtained subsequently are each third-order Bezier curves.
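The general Bezier curve formula can be evaluated directly in its Bernstein form, as sketched below. Given the $n+1$ key points predicted for one lane edge line, the helper samples $B(t)$ at evenly spaced $t$ in $[0, 1]$. This is a minimal sketch. `bezier_curve` is a hypothetical name, and the key-point layout (start point, control points in order, end point) is assumed rather than taken from the prediction model's actual output format.

```python
from math import comb

import numpy as np


def bezier_curve(key_points, num: int = 50) -> np.ndarray:
    """Sample B(t) = sum_i C(n,i) (1-t)^(n-i) t^i P_i for t in [0, 1].

    key_points: (n+1, 2) array-like of [P0 (start), P1..P(n-1) (controls), Pn (end)].
    Returns a (num, 2) array of (x, y) points.
    """
    P = np.asarray(key_points, dtype=float)
    n = len(P) - 1                          # order of the curve
    t = np.linspace(0.0, 1.0, num)[:, None]
    B = np.zeros((num, P.shape[1]))
    for i in range(n + 1):
        B += comb(n, i) * (1 - t) ** (n - i) * t ** i * P[i]
    return B


# A cubic (n = 3) with collinear key points is a straight segment.
pts = bezier_curve([(0, 0), (1, 0), (2, 0), (3, 0)], num=3)
print(pts.tolist())  # [[0.0, 0.0], [1.5, 0.0], [3.0, 0.0]]
```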
Before the video frame image, which necessarily contains a lane image, is input into the Bezier curve parameter prediction model, it may also be preprocessed. For example, image graying reduces the data volume and improves the real-time performance of subsequent detection, and image filtering reduces interference from random noise points in the road image. The Bezier curve parameter prediction model itself is an existing model, so its network structure and training process are not described in detail.
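The two preprocessing steps can be sketched with NumPy alone. The method names graying and filtering but does not fix the formulas. This sketch assumes the common ITU-R BT.601 luma weights for graying and a 3x3 mean filter for denoising. A Gaussian or median filter would serve equally well. `to_gray` and `mean_filter_3x3` are hypothetical names.

```python
import numpy as np


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Graying: weighted sum of R, G, B (ITU-R BT.601 luma weights)."""
    return rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])


def mean_filter_3x3(gray: np.ndarray) -> np.ndarray:
    """Filtering: 3x3 box (mean) filter with edge replication at the borders."""
    h, w = gray.shape
    padded = np.pad(gray.astype(float), 1, mode="edge")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


frame = np.full((4, 5, 3), 100, dtype=np.uint8)
print(mean_filter_3x3(to_gray(frame)).shape)  # (4, 5)
```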

S53, determining a left lane edge curve on the two-dimensional coordinate system according to the Bezier curve parameters of the left lane edge line, and determining a right lane edge curve on the two-dimensional coordinate system according to the Bezier curve parameters of the right lane edge line.

In step S53, the left lane edge curve and the right lane edge curve can each be determined from the general Bezier curve formula, with t running from 0 to 1, as shown in fig. 2.

S54, determining whether the road traffic sign corresponding to the object to be prompted is at the left front of the vehicle body, at the right front of the vehicle body or directly in front of the vehicle body. The determination is based on the positional relationship between the position of the object to be prompted in the video frame image and the left and right lane edge curves. This determination result, together with the target distance corresponding to the object to be prompted, is taken as the relative azimuth information corresponding to the object to be prompted.

In step S54, the determination is made from the positional relationship between the position of the object to be prompted in the video frame image and the left and right lane edge curves. It may include, but is not limited to, the following rules: (1) if the position is directly above or to the upper left of the left lane edge curve, the road traffic sign corresponding to the object to be prompted is determined to be at the left front of the vehicle body; (2) if the position is directly above or to the upper right of the right lane edge curve, the road traffic sign is determined to be at the right front of the vehicle body; (3) if the position is above the region between the left lane edge curve and the right lane edge curve, the road traffic sign is determined to be directly in front of the vehicle body.
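The three rules of step S54 can be sketched as follows. This is a simplified sketch that operates on lane edge curves sampled as arrays of (x, y) image points, for example from the Bezier formula, with y growing downwards. The simplification is that "directly above or to the upper left/right of a curve" is tested by comparing the sign's x with the x of the curve sample nearest in height. For a sign above the lane's far end, that sample is the curve's topmost point. This test is an assumption, not the method's stated geometry. `curve_x_near` and `classify_sign_position` are hypothetical names.

```python
import numpy as np


def curve_x_near(curve: np.ndarray, y: float) -> float:
    """x of the sampled curve point whose y is closest to y (clamps to the curve's ends)."""
    idx = int(np.argmin(np.abs(curve[:, 1] - y)))
    return float(curve[idx, 0])


def classify_sign_position(sign_xy, left_curve: np.ndarray, right_curve: np.ndarray) -> str:
    """Image coordinates: x grows rightwards, y grows downwards."""
    x, y = sign_xy
    if x <= curve_x_near(left_curve, y):
        return "left front"        # directly above or to the upper left of the left edge
    if x >= curve_x_near(right_curve, y):
        return "right front"       # directly above or to the upper right of the right edge
    return "directly ahead"        # above the region between the two edges


ys = np.linspace(400, 200, 11)                               # bottom to top of the lane
left = np.column_stack([np.linspace(100, 200, 11), ys])
right = np.column_stack([np.linspace(500, 400, 11), ys])
print(classify_sign_position((50, 100), left, right))   # left front
print(classify_sign_position((300, 100), left, right))  # directly ahead
print(classify_sign_position((450, 100), left, right))  # right front
```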

S55, filling the relative azimuth information into a preset text prompt template corresponding to the to-be-prompted object to obtain a text prompt text corresponding to the to-be-prompted object.
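Template filling can be sketched with Python's `str.format`, mirroring the curve example in step S5. The `{distance}` and `{side}` placeholders are a hypothetical template convention, and rounding the distance to whole meters is an assumption of this sketch. `fill_prompt_template` is a hypothetical name.

```python
def fill_prompt_template(template: str, distance_m: float, side: str) -> str:
    """Fill relative azimuth information into a text prompt template.

    The {distance} and {side} placeholders are a hypothetical template convention.
    """
    return template.format(distance=round(distance_m), side=side)


CURVE_TEMPLATE = (
    "The sign {distance} meters ahead at the {side} "
    "indicates a curve ahead, please drive carefully!"
)
print(fill_prompt_template(CURVE_TEMPLATE, 100.0, "right front"))
# The sign 100 meters ahead at the right front indicates a curve ahead, please drive carefully!
```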

S6, synthesizing the text prompt text into a road identification prompt voice signal.

In step S6, the speech signal can be synthesized in any existing conventional manner.

S7, transmitting the road identification prompt voice signal in real time to the in-vehicle voice speaker for playback, thereby completing the road identification voice prompt action for the object to be prompted.

Executing the road identification voice prompt action too frequently may interfere with the driver's driving. Therefore, after step S7, that is, after the road identification prompt voice signal has been transmitted in real time to the in-vehicle voice speaker for playback, the method preferably further includes, but is not limited to, the following. Execution of the road identification voice prompt method (that is, steps S1 to S6) is suspended while the road identification prompt voice signal is being played and for a preset period after playback ends. The preset period may be, for example, 10 minutes, so that consecutive road identification voice prompts are separated by at least 10 minutes.
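The suspension rule can be sketched as a small gate object consulted before each run of steps S1 to S6. This is a minimal sketch assuming time is tracked in seconds and the preset period defaults to the 10-minute example. `PromptGate` and its methods are hypothetical names.

```python
class PromptGate:
    """Suspends the prompt pipeline during playback and for a preset period afterwards."""

    def __init__(self, preset_period_s: float = 600.0):   # 10 minutes
        self.preset_period_s = preset_period_s
        self.blocked_until = float("-inf")

    def can_run(self, now: float) -> bool:
        """True if steps S1-S6 may run at time `now` (seconds)."""
        return now >= self.blocked_until

    def on_playback(self, start: float, duration_s: float) -> None:
        """Record a prompt played from `start` for `duration_s` seconds."""
        self.blocked_until = start + duration_s + self.preset_period_s


gate = PromptGate()
gate.on_playback(start=0.0, duration_s=5.0)
print(gate.can_run(300.0), gate.can_run(605.0))  # False True
```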

This embodiment provides an automatic prompting scheme, based on a target detection algorithm and speech synthesis technology, that helps the driver attend to the meaning and requirements of road traffic signs. The scheme works in four stages:

- A target detection algorithm performs real-time road traffic sign image recognition on the front video data to obtain a sign image recognition result.
- The most suitable object to be prompted is selected from the at least one recognized road traffic sign image.
- The relative azimuth information corresponding to the object to be prompted is filled into its preset text prompt template to obtain the text prompt text.
- The text prompt text is synthesized into a road identification prompt voice signal and transmitted in real time to the in-vehicle voice speaker for playback.

This helps the driver attend to the meaning and requirements of road traffic signs while driving, strengthens the driver's road awareness and safety awareness, and improves road traffic safety. The scheme is particularly suitable for novice drivers or drivers with poor memory, and is convenient to apply and popularize.

As shown in fig. 3, a second aspect of the present embodiment provides a virtual device for implementing the road sign voice prompting method according to the first aspect, which includes a video data acquisition module, an image recognition processing module, a sign image ordering module, a sign image traversing module, a prompt text generating module, a prompt voice synthesizing module and a voice signal transmitting module that are sequentially connected in a communication manner;

The working process, working details and technical effects of the foregoing device provided in the second aspect of the present embodiment may refer to the road sign voice prompting method described in the first aspect, which are not described herein again.

As shown in fig. 4, a third aspect of the present embodiment provides a computer device for executing the road identification voice prompting method according to the first aspect. It includes a memory, a processor and a transceiver that are sequentially communicatively connected. The memory is configured to store a computer program, the transceiver is configured to send and receive messages, and the processor is configured to read the computer program and execute the road identification voice prompting method according to the first aspect. By way of specific example, the memory may include, but is not limited to, random-access memory (Random-Access Memory, RAM), read-only memory (Read-Only Memory, ROM), flash memory (Flash Memory), first-in first-out memory (First Input First Output, FIFO) and/or first-in last-out memory (First Input Last Output, FILO), etc. The processor may be, but is not limited to, a microprocessor of the STM32F105 series. In addition, the computer device may include, but is not limited to, a power module, a display screen and other necessary components.

The working process, working details and technical effects of the foregoing computer device provided in the third aspect of the present embodiment may refer to the road identification voice prompting method described in the first aspect, which are not described herein again.

A fourth aspect of the present embodiment provides a computer-readable storage medium storing instructions for the road identification voice prompt method according to the first aspect. That is, the computer-readable storage medium has instructions stored thereon which, when executed on a computer, perform the road identification voice prompt method according to the first aspect. The computer-readable storage medium is a carrier for storing data and may include, but is not limited to, a floppy disk, an optical disk, a hard disk, a flash memory and/or a Memory Stick. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.

The working process, working details and technical effects of the foregoing computer-readable storage medium provided in the fourth aspect of the present embodiment may refer to the road identification voice prompting method described in the first aspect, and are not described herein again.

A fifth aspect of the present embodiment provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the road identification voice prompt method of the first aspect. Wherein the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus.

Finally, it should be noted that: the foregoing description is only of the preferred embodiments of the invention and is not intended to limit the scope of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. The road identification voice prompt method based on image recognition is characterized by comprising the following steps of:

2. The road identification voice prompt method according to claim 1, wherein the vehicle-mounted camera is a depth camera, and arranging the at least one road traffic sign image in order of target distance from near to far to obtain a road traffic sign image sequence comprises:

3. The road identification voice prompt method according to claim 1, wherein arranging the at least one road traffic sign image in order of target distance from near to far to obtain a road traffic sign image sequence comprises:

4. The road identification voice prompt method according to claim 1, wherein the azimuth information includes the target distance and whether the corresponding road traffic sign is at the left front, the right front or directly in front of the vehicle body.

5. The road identification voice prompt method according to claim 1, wherein filling the relative azimuth information corresponding to the object to be prompted into the preset text prompt template corresponding to the object to be prompted to obtain the text prompt text corresponding to the object to be prompted comprises:

6. The road identification voice prompt method according to claim 1, wherein, after the road identification prompt voice signal is transmitted to the in-vehicle voice speaker for voice playing, the method further comprises:

7. The road identification voice prompt device based on image recognition is characterized by comprising a video data acquisition module, an image recognition processing module, a sign image ordering module, a sign image traversing module, a prompt text generation module, a prompt voice synthesis module and a voice signal transmission module which are sequentially communicatively connected;

8. A computer device comprising a memory, a processor and a transceiver in communication connection in sequence, wherein the memory is configured to store a computer program, the transceiver is configured to send and receive messages, and the processor is configured to read the computer program and perform the road identification voice prompt method according to any one of claims 1-6.

9. A computer-readable storage medium having instructions stored thereon which, when executed on a computer, perform the road identification voice prompt method of any one of claims 1 to 6.