[mmdasp] [Up] [mmdbeef] Demonstrations

mmdlabeltext
Segmenting letters, words and paragraphs.

Description

In this example, a digitized text is processed to identify the letters, words and paragraphs. This demonstration uses only the mmlabel function with different connectivity parameters.

Illustrated Source Code

Reading

The text image is read.

f = mmreadgray('stext.tif');
mmshow(f);

First, label the letters.

The letters are the main connected components in the image, so we use the classical 8-connectivity criterion to identify each letter.

fl=mmlabel(f,mmsebox);
mmlblshow(fl);

Second, label the words.

Words are made of letters that are close to one another. In this case we use a connectivity specified by a rectangular structuring element 7 pixels high and 11 pixels wide, so any two pixels that can be hit by this rectangle belong to the same connected component. The values 7 and 11 were chosen experimentally and depend on the font size.

sew = mmimg2se(logical(uint8(ones(7,11))));
mmseshow(sew)
ans =
     1     1     1     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1     1     1     1
     1     1     1     1     1     1     1     1     1     1     1
fw=mmlabel(f,sew);
mmlblshow(fw);
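
To make the idea of connectivity defined by a structuring element more concrete, here is a minimal sketch in plain MATLAB that does not use the toolbox. It is not the implementation of mmlabel. It assumes that the neighbors of a pixel are the foreground pixels at the offsets covered by the structuring element centered on that pixel, which may differ in detail from the convention mmlabel uses. The tiny test image, the 3-by-7 rectangle and all variable names are made up for illustration.

```matlab
% Tiny binary "text": three letters in columns 1, 4 and 10 (hypothetical data).
f = logical([1 0 0 1 0 0 0 0 0 1;
             1 0 0 1 0 0 0 0 0 1]);

% Two connectivities: the 3x3 box (8-connectivity) and a wider 3x7 rectangle.
ses = {ones(3,3), ones(3,7)};
counts = zeros(1, numel(ses));

for k = 1:numel(ses)
    se = ses{k};
    % Offsets of the structuring element relative to its center.
    [dr, dc] = find(se);
    dr = dr - ceil(size(se,1)/2);
    dc = dc - ceil(size(se,2)/2);

    lbl = zeros(size(f));            % label image, 0 = background or unvisited
    n = 0;                           % number of components found so far
    for seed = find(f)'              % visit foreground pixels one by one
        if lbl(seed) ~= 0, continue; end
        n = n + 1;                   % unlabeled pixel: start a new component
        lbl(seed) = n;
        stack = seed;
        while ~isempty(stack)        % flood fill over the SE-defined neighbors
            p = stack(end); stack(end) = [];
            [r, c] = ind2sub(size(f), p);
            for j = 1:numel(dr)
                rr = r + dr(j); cc = c + dc(j);
                if rr >= 1 && rr <= size(f,1) && cc >= 1 && cc <= size(f,2) ...
                        && f(rr,cc) && lbl(rr,cc) == 0
                    lbl(rr,cc) = n;
                    stack(end+1) = sub2ind(size(f), rr, cc);
                end
            end
        end
    end
    counts(k) = n;
end

disp(counts)   % 3 components with the box (letters), 2 with the rectangle (words)
```

With the 3-by-3 box each letter is its own component. The wider rectangle bridges the two-pixel gap between the first two letters but not the five-pixel gap before the third, so the first two letters merge into one "word".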

Finally, label the paragraphs.

Similarly, paragraphs are groups of words that are close to one another. In this case the connectivity is given by a rectangle 35 pixels wide and 20 pixels high.

sep = mmimg2se(logical(uint8(ones(20,35))));
fp=mmlabel(f,sep);
mmlblshow(fp);
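
Because the paragraph rectangle contains the word rectangle, every word component lies inside exactly one paragraph component, so the two label images can be combined into a word-to-paragraph table. The following is a minimal sketch in plain MATLAB under stated assumptions: the label images use 0 for background and consecutive positive integers for components, and the small matrices below are hypothetical stand-ins for fw and fp, not results from stext.tif.

```matlab
% Hypothetical stand-ins for the word and paragraph label images.
% 0 is background; positive integers are component labels.
fw = [1 1 0 2 2 0 0 0;
      0 0 0 0 0 0 0 0;
      0 0 0 0 0 0 0 0;
      3 3 0 4 0 0 0 0];
fp = [1 1 0 1 1 0 0 0;
      0 0 0 0 0 0 0 0;
      0 0 0 0 0 0 0 0;
      2 2 0 2 0 0 0 0];

nwords = double(max(fw(:)));   % number of word components
nparas = double(max(fp(:)));   % number of paragraph components

% Pair the word label of every foreground pixel with its paragraph label,
% then keep one paragraph label per word.
mask = fw > 0;
para_of_word = accumarray(double(fw(mask)), double(fp(mask)), [nwords 1], @max);

% Count how many words each paragraph contains.
words_per_para = accumarray(para_of_word, 1, [nparas 1]);

disp(para_of_word')     % paragraph index of each word
disp(words_per_para')   % number of words in each paragraph
```

The casts to double are there because a label image may be stored in an integer class, which accumarray does not accept as subscripts in every MATLAB version.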

See also

mmlabel Label a binary image.