mmdlabeltext - Segmenting letters, words and paragraphs.
Description
- In this example, a digitized text is processed to identify the letters, words and paragraphs. This demonstration uses only the mmlabel function with different connectivity parameters.
See Also
- mmlabel - Label a binary image.
Reading
The text image is read.
First, label the letters.
The letters are the main connected components in the image, so the classical 8-connectivity criterion is used to identify each letter.
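The call for this step does not appear in the text. A minimal sketch of what it might look like, assuming `f` holds the binary text image and that a 3-by-3 box of ones, built with mmimg2se in the same way as the structuring element of the word step, expresses 8-connectivity. The variable names `se8` and `fl` are illustrative only.

```matlab
% Assumption: f is the binary text image read in the previous step.
% Assumption: a 3x3 box of ones as structuring element means 8-connectivity.
se8 = mmimg2se(logical(uint8(ones(3,3))));
fl = mmlabel(f, se8);   % one label per connected component (letter)
mmlblshow(fl);          % display the labeled image
```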
Second, label the words.
The words are made of closely spaced letters. In this case, the connectivity is specified by a rectangular structuring element 7 pixels high and 11 pixels wide, so any two pixels that can be hit by this rectangle belong to the same connected component. The values 7 and 11 were chosen experimentally and depend on the font size.
sew = mmimg2se(logical(uint8(ones(7,11))));
mmseshow(sew)
ans =
1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1
fw=mmlabel(f,sew);
mmlblshow(fw);
[Figure: fw]
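To make the idea of connectivity defined by a rectangle concrete, the following toy sketch labels a one-row image in plain MATLAB, without the toolbox. It assumes that the neighbors of a pixel are the foreground pixels covered by the rectangle centered on it; this is one reading of the description above and may not match what mmlabel does internally. The image, the window size and all variable names are made up for illustration.

```matlab
% Toy image: two "letters" 2 pixels apart, then a third one 5 pixels away.
f = logical([1 1 0 0 1 1 0 0 0 0 0 1 1]);
hr = 0; hc = 3;           % half-height and half-width: a 1x7 window
[R, C] = size(f);
lab = zeros(R, C);        % label image, 0 = background
n = 0;                    % number of components found so far
for s = find(f(:))'       % visit every foreground pixel
    if lab(s) > 0, continue; end      % already labeled
    n = n + 1;
    lab(s) = n;
    stack = s;                        % flood fill from this seed
    while ~isempty(stack)
        p = stack(end); stack(end) = [];
        [r, c] = ind2sub([R C], p);
        % Window centered on (r,c), clipped to the image borders.
        [rr, cc] = ndgrid(max(r-hr,1):min(r+hr,R), max(c-hc,1):min(c+hc,C));
        q = sub2ind([R C], rr(:), cc(:));
        keep = f(q) & (lab(q) == 0);  % unlabeled foreground neighbors
        q = q(keep);
        lab(q) = n;
        stack = [stack, q(:)'];
    end
end
disp(lab)   % lab is [1 1 0 0 1 1 0 0 0 0 0 2 2], so n is 2
```

With hc = 1 (a 1x3 window, plain adjacency) the same image would give three components, one per "letter"; widening the window merges the first two into one "word" while the distant third stays separate.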
Finally, label the paragraphs.
Similarly, paragraphs are made of closely spaced words. In this case, the connectivity is given by a rectangle of 35 by 20 pixels.
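The paragraph step can be sketched by analogy with the word step. The text does not say which dimension of the rectangle is the height, so the orientation below (20 rows by 35 columns) is an assumption; swap the two arguments of `ones` if the other reading is intended. The names `sep` and `fp` are illustrative.

```matlab
% Assumption: f is the binary text image used in the previous steps.
% Assumption: "35 by 20" is read as 35 pixels wide and 20 pixels high.
sep = mmimg2se(logical(uint8(ones(20,35))));
fp = mmlabel(f, sep);   % one label per paragraph
mmlblshow(fp);          % display the labeled image
```

As with the word step, suitable dimensions depend on the font size and line spacing and would need to be tuned experimentally.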