[mmdasp] [Up] [mmdbeef] Demonstrations

mmdlabeltext
Segmenting letters, words and paragraphs.

Description

In this example, a digitized text is processed to identify the letters, words and paragraphs. This demonstration uses only the mmlabel function with different connectivity parameters.

Demo Script

Reading

The text image is read.

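# Assumes the toolbox functions (mmreadgray, mmshow, mmlabel, ...) and
# numpy's ones are already imported into the session.
f = mmreadgray('stext.tif')
mmshow(f)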
f

First, label the letters.

The letters are the main connected components in the image, so we use the classical 8-connectivity criterion to identify each letter.

fl=mmlabel(f,mmsebox())
mmlblshow(fl)
fl
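To make the 8-connectivity criterion concrete, here is a minimal pure-Python sketch of connected-component labeling with a 3x3 box neighborhood. It is not the toolbox's implementation of mmlabel; the function name label_8 and the toy image are hypothetical and used only for illustration.

```python
from collections import deque

def label_8(img):
    """Label truthy pixels of a 2-D list using 8-connectivity (flood fill)."""
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    # The 3x3 box: every offset in -1..1 in both directions, except (0, 0).
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or labels[r][c]:
                continue  # background, or already labeled
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                pr, pc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and img[nr][nc] and not labels[nr][nc]):
                        labels[nr][nc] = current
                        queue.append((nr, nc))
    return labels

# Hypothetical toy image: a diagonal pair, a vertical pair, a horizontal pair.
toy = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
toy_labels = label_8(toy)
num_components = max(max(row) for row in toy_labels)
print(num_components)
```

The diagonal pair in the top-left corner receives a single label because diagonal neighbors count as connected under 8-connectivity; under 4-connectivity they would be two separate components.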

Second, label the words.

Words are groups of letters that lie close together. Here the connectivity is specified by a rectangular structuring element 7 pixels high and 11 pixels wide, so any two pixels that can be hit by this rectangle belong to the same connected component. The values 7 and 11 were chosen experimentally and depend on the font size.

sew = mmimg2se(mmbinary(ones((7,11))))
Warning: downcasting image from double to int32 (may lose precision)
mmseshow(sew)
array([[ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True,
         True,  True]], dtype=bool)
fw=mmlabel(f,sew)
mmlblshow(fw)
fw
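The idea of labeling with a rectangular connectivity can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the toolbox's code: it assumes odd box sizes with the origin at the center of the rectangle, and the name label_box and the one-row test image are hypothetical.

```python
def label_box(img, height, width):
    """Label truthy pixels of a 2-D list. Two pixels are neighbors when one
    lies inside a height x width box centered on the other (odd sizes)."""
    rows, cols = len(img), len(img[0])
    half_h, half_w = height // 2, width // 2
    labels = [[0] * cols for _ in range(rows)]
    current = 0
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or labels[r][c]:
                continue  # background, or already labeled
            current += 1
            labels[r][c] = current
            stack = [(r, c)]
            while stack:
                pr, pc = stack.pop()
                # Visit every pixel covered by the box centered at (pr, pc),
                # clipped to the image bounds.
                for nr in range(max(0, pr - half_h), min(rows, pr + half_h + 1)):
                    for nc in range(max(0, pc - half_w), min(cols, pc + half_w + 1)):
                        if img[nr][nc] and not labels[nr][nc]:
                            labels[nr][nc] = current
                            stack.append((nr, nc))
    return labels, current

# One hypothetical text line: two "letters" one pixel apart, a wide gap,
# then two more "letters" one pixel apart.
line = [[1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1]]
_, n_letters = label_box(line, 3, 3)  # 3x3 box: plain 8-connectivity
_, n_words = label_box(line, 3, 7)    # wider box bridges the small gaps
print(n_letters, n_words)
```

With the 3x3 box each letter stays separate; with the 3x7 box the one-pixel gaps inside a word are bridged while the wide gap between words is not, which is the same effect the 7 by 11 rectangle has on the text image.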

Finally, label the paragraphs.

Similarly, paragraphs are groups of words that lie close together. Here the connectivity is given by a rectangle 35 pixels wide and 20 pixels high.

sep = mmimg2se(mmbinary(ones((20,35))))
Warning: downcasting image from double to int32 (may lose precision)
fp=mmlabel(f,sep)
mmlblshow(fp)
fp
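Why the rectangle size selects the level of grouping can be seen in a one-dimensional sketch. Along one axis, labeling with a centered segment that reaches a given number of pixels to each side amounts to splitting the foreground positions wherever the gap between consecutive positions exceeds that reach. The function count_groups and the row indices below are hypothetical, chosen only to illustrate the effect.

```python
def count_groups(positions, reach):
    """Count 1-D connected groups: consecutive foreground positions stay in
    the same group when they differ by at most `reach`."""
    positions = sorted(positions)
    if not positions:
        return 0
    groups = 1
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev > reach:
            groups += 1  # gap too wide to bridge: a new group starts
    return groups

# Hypothetical row indices that contain ink: text lines are 3 rows tall,
# lines inside a paragraph are 3 blank rows apart, and the two paragraphs
# are 15 blank rows apart.
ink_rows = [0, 1, 2, 6, 7, 8, 12, 13, 14, 30, 31, 32, 36, 37, 38]

n_lines = count_groups(ink_rows, 1)        # only adjacent rows are joined
n_paragraphs = count_groups(ink_rows, 10)  # inter-line gaps are bridged
print(n_lines, n_paragraphs)
```

A small reach separates individual text lines, while a reach larger than the inter-line gap but smaller than the inter-paragraph gap groups the lines into paragraphs. This is why the rectangle dimensions have to be tuned to the font size and layout of the scanned text.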

See also

mmlabel: Label a binary image.