{"id":487,"date":"2020-11-12T14:46:03","date_gmt":"2020-11-12T03:46:03","guid":{"rendered":"https:\/\/www.samontab.com\/web\/?p=487"},"modified":"2021-03-03T09:31:30","modified_gmt":"2021-03-02T22:31:30","slug":"align-text-images-with-opencv","status":"publish","type":"post","link":"https:\/\/www.samontab.com\/web\/2020\/11\/align-text-images-with-opencv\/","title":{"rendered":"Align text images with OpenCV"},"content":{"rendered":"\n<p>Most optical character recognition (<a rel=\"noreferrer noopener\" href=\"https:\/\/en.wikipedia.org\/wiki\/Optical_character_recognition\" target=\"_blank\">OCR<\/a>) software first aligns the image properly before detecting the text in it. Here I&#8217;ll show you how to do that with OpenCV:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/aligned_comparison.png\" alt=\"\" class=\"wp-image-499\" width=\"500\" height=\"357\" srcset=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/aligned_comparison.png 2280w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/aligned_comparison-300x214.png 300w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/aligned_comparison-1024x732.png 1024w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/aligned_comparison-768x549.png 768w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/aligned_comparison-1536x1097.png 1536w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/aligned_comparison-2048x1463.png 2048w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><figcaption>Initial image on the left, aligned image on the right<\/figcaption><\/figure>\n\n\n\n<p>First we need to include the required header files:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\n#include &lt;opencv2\/core.hpp&gt;\n#include 
&lt;opencv2\/imgcodecs.hpp&gt;\n#include &lt;opencv2\/highgui.hpp&gt;\n#include &lt;opencv2\/imgproc.hpp&gt;\n#include &lt;iostream&gt;\n#include &lt;iomanip&gt;\n#include &lt;string&gt;\n<\/pre><\/div>\n\n\n<p>Let&#8217;s read an image in. You will probably have a color image, so first we need to convert it to grayscale (one channel only).<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\ncv::Mat image = cv::imread(&quot;text_rotated.png&quot;, cv::IMREAD_COLOR);\ncv::Mat gray;\ncv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-medium\"><img loading=\"lazy\" decoding=\"async\" width=\"210\" height=\"300\" src=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/gray-2-210x300.png\" alt=\"\" class=\"wp-image-500\" srcset=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/gray-2-210x300.png 210w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/gray-2-717x1024.png 717w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/gray-2-768x1097.png 768w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/gray-2-1075x1536.png 1075w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/gray-2.png 1140w\" sizes=\"auto, (max-width: 210px) 100vw, 210px\" \/><figcaption>Initial image, converted to grayscale (one channel)<\/figcaption><\/figure><\/div>\n\n\n\n<p>We&#8217;re interested in the text itself and nothing else, so we&#8217;re going to create an image that represents this. Non-zero pixels represent what we want, and zero pixels represent the background. To do that, we will threshold the grayscale image, so that only the text has non-zero values. The explanation of the inputs is simple: <strong>gray<\/strong> is the grayscale image used as input, <strong>thresh<\/strong> is where the thresholded image will be stored. 
The next argument is the threshold value, set to 0 here, although it is ignored: since we&#8217;re using the THRESH_OTSU flag, the threshold is computed automatically by the Otsu algorithm. The next argument, 255, is the value assigned to pixels that pass the threshold. Finally, the THRESH_BINARY_INV flag indicates that we want a binary output (all or nothing), and that we want it inverted, since the text is black in the original.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\ncv::Mat thresh;\ncv::threshold(gray, thresh, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/thresh-1.png\" alt=\"\" class=\"wp-image-501\" width=\"486\" height=\"701\"\/><figcaption>By thresholding the initial image, we now have an image that has non-zero values for the pixels that represent text, although it still has some noise<\/figcaption><\/figure><\/div>\n\n\n\n<p>First we need to remove those extra white pixels outside of the text area. We can do that with the morphological Open operator, which erodes the image (makes the white areas smaller) and then dilates it back (makes the white areas larger again). By doing that, any small dots of noise in the image are removed. The larger the kernel size, the more noise will be removed. 
You can choose the kernel size either by hand, or with an iterative method that checks, for example, the ratio of non-zero to zero pixels for each candidate size, and selects the one that maximizes that metric.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\nint kernel_size = 4;\ncv::Mat thresh_filtered;\ncv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(kernel_size, kernel_size));\ncv::morphologyEx(thresh, thresh_filtered, cv::MORPH_OPEN, kernel);\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/thresh2.png\" alt=\"\" class=\"wp-image-502\" width=\"496\" height=\"708\" srcset=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/thresh2.png 1140w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/thresh2-210x300.png 210w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/thresh2-717x1024.png 717w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/thresh2-768x1097.png 768w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/thresh2-1075x1536.png 1075w\" sizes=\"auto, (max-width: 496px) 100vw, 496px\" \/><figcaption>We now have an image that only represents text, without any external noise<\/figcaption><\/figure>\n\n\n\n<p>Now that we have an image that represents the text, we&#8217;re interested in knowing the angle by which it is rotated. There are a few different ways of doing this. Since OpenCV has a convenient function that gives you the minimum rectangle that contains all non-zero values, we could use that. 
Let&#8217;s see how it works:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\ncv::Mat nonZeroCoordinates;\ncv::findNonZero(thresh_filtered, nonZeroCoordinates);\ncv::Mat imageCopy = image.clone();\n\nfor (int i = 0; i &lt; (int)nonZeroCoordinates.total(); i++)\n{\n    cv::Point pt = nonZeroCoordinates.at&lt;cv::Point&gt;(i);\n    cv::circle(imageCopy, pt, 1, cv::Scalar(255, 0, 0), 1);\n}\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero-2.png\" alt=\"\" class=\"wp-image-503\" width=\"526\" height=\"753\" srcset=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero-2.png 1140w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero-2-210x300.png 210w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero-2-1075x1536.png 1075w\" sizes=\"auto, (max-width: 526px) 100vw, 526px\" \/><figcaption>In blue are all the pixels selected as text in the original image. 
We don&#8217;t need to be exact here and select every text pixel, but we don&#8217;t want any non-text pixel to be blue.<\/figcaption><\/figure>\n\n\n\n<p>We can now use those coordinates and ask OpenCV to give us the minimum rectangle that contains them:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\ncv::RotatedRect box = cv::minAreaRect(nonZeroCoordinates);\ncv::Point2f vertices&#x5B;4];\nbox.points(vertices);\nfor (int i = 0; i &lt; 4; i++)\n{\n    cv::line(imageCopy, vertices&#x5B;i], vertices&#x5B;(i+1)%4], cv::Scalar(0, 255, 0), 2);\n}\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero_rectangle-2.png\" alt=\"\" class=\"wp-image-504\" width=\"497\" height=\"709\" srcset=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero_rectangle-2.png 1140w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero_rectangle-2-210x300.png 210w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero_rectangle-2-717x1024.png 717w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero_rectangle-2-768x1097.png 768w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/nonzero_rectangle-2-1075x1536.png 1075w\" sizes=\"auto, (max-width: 497px) 100vw, 497px\" \/><figcaption>In green is the minimum rectangle that contains all the blue points, which represents the text in this case<\/figcaption><\/figure>\n\n\n\n<p>The estimated angle can then simply be retrieved from the returned rectangle. <strong>Note<\/strong>: Remember to double-check the returned angle, as it might be different from what you&#8217;re expecting. 
The function <strong>minAreaRect<\/strong> returns angles between -90 and 0 degrees here (note that more recent versions of OpenCV changed this convention, so verify it on your build).<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/out-1.png\" alt=\"\" class=\"wp-image-494\" width=\"400\" height=\"200\" srcset=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/out-1.png 510w, https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/out-1-300x150.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/figure>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\nfloat angle = box.angle;\nif (angle &lt; -45.0f)\n{\n    angle = 90.0f + angle;\n}\n<\/pre><\/div>\n\n\n<p>Once you have the estimated angle, it&#8217;s time to rotate the image back. First we calculate a rotation matrix based on the angle, and then we apply it. The rotation angle is expressed in degrees, and positive values mean a counter-clockwise rotation.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\ncv::Point2f center(image.cols \/ 2.0f, image.rows \/ 2.0f);\ndouble scale = 1.0;\ncv::Mat M = cv::getRotationMatrix2D(center, angle, scale);\ncv::Mat rotated;\ncv::warpAffine(image, rotated, M, image.size(), cv::INTER_CUBIC, cv::BORDER_REPLICATE);\n<\/pre><\/div>\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.samontab.com\/web\/wp-content\/uploads\/2020\/11\/aligned-1.png\" alt=\"\" class=\"wp-image-505\" width=\"500\" height=\"712\"\/><\/figure><\/div>\n\n\n\n<p>And that&#8217;s it. 
You now have an aligned image, ready to be parsed for OCR, or any other application.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Most optical character recognition(OCR) software first aligns the image properly before detecting the text in it. Here I&#8217;ll show you how to do that with OpenCV: First we need to include the required header files: Let&#8217;s read an image in. You probably will have a color image, so first we need to convert it to [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[29,30,4],"tags":[41,44,45,40,42,39,43],"class_list":["post-487","post","type-post","status-publish","format-standard","hentry","category-computer-vision","category-opencv","category-programming","tag-alignment","tag-findnonzero","tag-minarearect","tag-ocr","tag-otsu","tag-text","tag-threshold"],"_links":{"self":[{"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/posts\/487","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/comments?post=487"}],"version-history":[{"count":0,"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/posts\/487\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/media?parent=487"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/categories?post=487"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.samontab.com\/web\/wp-json\/wp\/v2\/tags?post=487"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}