Saad Bin Ahmed , Muhammad Imran Razzak and Rubiyah Yusof
Cursive Script Text Recognition in Natural Scene Images
Arabic Text Complexities
Saad Bin Ahmed
King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
Malaysia-Japan International Institute of Technology (M-JIIT), University of Technology Malaysia, Kuala Lumpur, Malaysia
Muhammad Imran Razzak
School of Information Technology, Deakin University, Geelong, VIC, Australia
Rubiyah Yusof
Malaysia-Japan International Institute of Technology (M-JIIT), University of Technology Malaysia, Kuala Lumpur, Malaysia
ISBN 978-981-15-1296-4 e-ISBN 978-981-15-1297-1
https://doi.org/10.1007/978-981-15-1297-1
Springer Nature Singapore Pte Ltd. 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
Highlighting the complexities of cursive scene text is an important step toward the solutions that researchers have proposed. Contemporary researchers in the natural language processing (NLP) field need foundational knowledge of these complexities, which in turn helps in defining solutions designed explicitly for Arabic scene text recognition systems. This book is intended for researchers who want an elementary knowledge of Arabic scene text and its relevant issues. A successful learning system depends on how good the features extracted from camera-captured text images are. Text analysis emphasizes the essential skills for building reliable systems that recognize the provided samples with higher accuracy.
When the dynamics of cursive scene text are analyzed with an emphasis on Arabic, it becomes clear that traditional OCR approaches are not suitable for this type of text pattern. The problems addressed by traditional OCR systems bear little similarity to the issues that arise in scene text analysis, and the reader should clearly understand the difference between the two. Context-based LSTM approaches designed for Arabic scene text recognition constitute state-of-the-art solutions and need to be discussed in a comprehensive manner.
This book is a result of the difficulties encountered during the proposed research. The rationale behind this effort is to compile the solutions that have been suggested to support research in cursive scene text analysis. It is pertinent to discuss the deep learning architectures that are believed to be suitable for learning complex scripts appearing in natural images. This book places a clear emphasis on context learning classification methods, which are the result of rigorous research in scene text analysis.
Saad Bin Ahmed
Muhammad Imran Razzak
Rubiyah Yusof
Kuala Lumpur, Malaysia
July 2019
Whatever you can do, or dream you can, begin it. Boldness has genius, power, and magic in it.
Goethe
Acknowledgements
First and foremost, praise and thanks to the Almighty for His showers of blessings throughout this research work and the successful completion of this book.
Writing a book is harder than one might think and more rewarding than anyone could imagine. None of this would have been possible without the valuable discussions among group mates, which shaped the flow of contents and the overall organization of this book. As the first author of this book, I am grateful to all of those with whom I have had the pleasure to work during this and other related projects. Dr. Muhammad Imran Razzak and Prof. Rubiyah Yusof have provided me with extensive personal and professional guidance and taught me a great deal about both scientific research and life in general.
Nobody has been more important to me in the pursuit of this project than the members of my family. I would like to thank my parents, whose love and guidance are with me in whatever I pursue. They are the ultimate role models. Most importantly, I wish to thank my loving and supportive wife, Tayyaba, and my three wonderful children, Aiza, Rameen, and Aleyan, who provide unending inspiration.
Finally, we would like to thank Springer for giving us the opportunity to write this book, which we hope will be beneficial for newcomers to the fields of document image analysis and natural language processing.
Acronyms
ANN
Artificial neural network
ASTR
Arabic scene text recognition
ConvNets
Convolutional neural networks
CRF
Conditional random fields
CTC
Connectionist temporal classification
DSLR
Digital single-lens reflex
EASTR
English-Arabic scene text recognition
ESTR
English scene text recognition
FCN
Fully convolutional networks
HOG
Histogram of oriented gradients
ISRI
Information Science Research Institute
LDA
Linear discriminant analysis
LSTM
Long short-term memory networks
MDLSTM
Multidimensional long short-term memory networks
MKL
Multiple kernel learning
ML
Machine learning
MNIST
Modified National Institute of Standards and Technology
MSER
Maximally stable extremal regions
MSRA-TD
MSRA Text Detection
NLP
Natural language processing
NN
Nearest neighbor
OCR
Optical character recognition
RNN
Recurrent neural network
SIFT
Scale-invariant feature transform
STC
Scene text character
STR
Scene text recognition
SVM
Support vector machine
SVT
Street View Text
UNHD
Urdu Nastaliq Handwritten Dataset
UNLV
University of Nevada, Las Vegas
UPTI
Urdu Printed Text Images
About the Authors
Dr. Saad Bin Ahmed
is a lecturer at King Saud bin Abdulaziz University for Health Sciences (KSAU-HS), Riyadh, Saudi Arabia. He is also associated with the Center of Artificial Intelligence and Robotics (CAIRO) research lab at the Malaysia-Japan International Institute of Technology (M-JIIT), Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia. He completed his Ph.D. in Intelligent Systems at the Universiti Teknologi Malaysia in 2019. Prior to that, he completed his Master of Computer Science in Intelligent Systems at the Technische Universität Kaiserslautern, Germany, and was a research assistant in the Image Understanding and Pattern Recognition Research Group at the same university. His areas of interest are document image analysis, machine learning, computer vision, and optical character recognition. He has authored more than 25 research articles in leading journals and conferences, as well as book chapters.