Publications

2025
Conference SANTOS, A.; OLIVEIRA, L.; SANTOS, W.; DUARTE, A.; Addressing Class Imbalance in Renal Amyloidosis Classification: A Comparative Study of Few-Shot Learning and Conventional Machine Learning Techniques. In: International Conference on Image Processing and Vision Engineering (IMPROVE), 2025. Class imbalance presents a significant challenge in Computational Pathology, particularly in classifying rare diseases such as renal amyloidosis. This paper investigates the effectiveness of Few-Shot Learning (FSL), specifically through prototypical networks, alongside conventional methods to enhance the automatic classification of renal glomeruli from biopsy images. A novel multi-stain dataset is introduced, comprising 11,674 annotated images across nine glomerular lesion classes, including amyloidosis, stained with four different dyes. The study compared baseline CNN models with FSL approaches, both with and without Cost-Sensitive Learning (CSL). The FSL-CSL-Ensemble achieved the highest F1-Score of 93.8%, surpassing the performance of related studies that addressed datasets with less severe imbalance ratios. This study underscores the potential of FSL in classifying renal amyloidosis, especially when combined with CSL, and suggests the possibility of eliminating the need for Congo red staining, the current gold standard for diagnosis. The findings highlight the necessity of developing innovative approaches like FSL to improve outcomes in medical image analysis, where data scarcity is prevalent.
Journal SOUZA, L.; SILVA, J.; MENDONÇA, M.; NATHAN, J.; DUARTE, A.; SARDER, P.; DOS-SANTOS, W. L. C.; OLIVEIRA, L.; The problem of segmenting global glomerulosclerosis in gigapixel histopathological images: the borderless glomeruli. In: BMC Nephrology, 2025. Background: Accurately segmenting glomeruli in kidney whole slide images (WSIs) is essential for advancing automation in renal pathology but remains challenging in cases of global glomerulosclerosis, where Bowman’s capsule boundaries are often unclear. Conventional machine learning (ML) models perform well on normal glomeruli but struggle with sclerotic cases due to the lack of distinct structural cues. This study investigates the use of the foundation model segmentation generative pre-trained transformer (SegGPT) to address this limitation. Methods: We conducted experiments at both the patch and WSI levels on a private dataset to evaluate the performance of the SegGPT foundation model against three non-foundation architectures, U-Net, U-Net3+, and SwinTransformer+U-Net, trained with and without fine-tuning. Results: The study revealed high segmentation performance for normal glomeruli, with non-foundation models achieving mean Dice similarity coefficient (mDice) scores of up to 0.94. For segmental sclerosis, performance was moderate, with scores reaching up to 0.73. In contrast, the segmentation of globally sclerotic glomeruli proved substantially more challenging: Models trained only on normal samples yielded mDice scores below 0.03, and even with fine-tuning on mixed datasets, WSI-level performance remained limited (mDice<0.16). With only a few annotated examples, SegGPT demonstrated markedly superior performance in this scenario, achieving up to 0.43 at the WSI level and 0.74 at the patch level. However, its performance under idealized conditions also reveals limitations in clinical generalization. Conclusion: While conventional models perform well on normal and segmentally sclerotic glomeruli, their performance declines sharply in globally sclerotic cases, even with fine-tuning. SegGPT showed better generalization in these challenging scenarios, particularly at the patch level. However, its limited performance at the WSI level underscores the difficulty of translating patch-level accuracy to full-slide inference, where contextual ambiguity is greater. These results expose a persistent gap between controlled experimental setups and real-world conditions, reinforcing the need for more realistic evaluation protocols to advance clinical applicability. Keywords: Automatic segmentation, Glomerulus, Sclerosis
Conference ANDRADE, F.; CARIGÉ, R.; SCHINEIDER, A.; OLIVEIRA, L.; When text and image meet face emotion retrieval: Benchmark and a variegated dataset. In: Conference on Graphics, Patterns and Images (SIBGRAPI), 2025. Automatic facial emotion recognition, despite deep learning advancements, faces challenges with large manually annotated datasets and demographic imbalance. While visual-language models (VLMs) show promise in aligning visual and textual emotion data, content-based facial emotion retrieval is underexplored, and larger models exhibit demographic sensitivity. This study introduces Facial INtensity of Emotions (FINE), the first large-scale dataset balancing gender, race, and Ekman’s six emotions across four levels of emotion intensity. We propose a retrieval protocol and baseline using fine-tuned contrastive language-image pre-training-based VLMs, demonstrating that fine-tuning on FINE consistently improves accuracy and reduces performance variance across demographics and emotion classes, thereby mitigating representation bias. However, misclassification rates still increase at the highest intensity levels even after fine-tuning, indicating that expression magnitude remains an open challenge. This work confirms fine-tuning’s value in enhancing generalization and reducing variability in emotion retrieval, establishing FINE as a robust benchmark.
Conference LIMA, D.; OLIVEIRA, G.; DUARTE, A.; SANTOS, W.; OLIVEIRA, L.; Quo vadis pathology? Advancing glomerular lesion classification with foundation models. In: Conference on Graphics, Patterns and Images (SIBGRAPI), 2025. Computational pathology is undergoing a significant transformation with the emergence of foundation models (FMs). These models leverage self-supervised learning on extensive histopathological datasets with the aim of extracting robust feature representations. FMs hold potential to automate advanced diagnostic pipelines, encompassing segmentation, classification, and biomarker discovery. This study evaluates the effectiveness of embeddings from four SOTA FMs (UNI, UNI2, Phikon, and Phikon2) for one-versus-all glomerular lesion classification. We propose here a comparative framework in which a multilayer perceptron (MLP) and a support vector machine (SVM), each trained exclusively on FM-derived embeddings, are benchmarked against EfficientNet, a fully supervised end-to-end image classifier. By varying the number of cross-validation folds (from k=2, representing minimal training data, to k=5, representing maximal training data) on a proprietary histopathology dataset, we assess classifier robustness under differing data regimes. Our results demonstrate that, even without any FM fine-tuning, the UNI/SVM pipeline outperforms EfficientNet by 3.4 percentage points in average F1-score, considering all values of k.
Conference SANTOS, M.; BORJA, I.; LIMA, D.; OLIVEIRA, G.; DUARTE, A.; SANTOS, W.; OLIVEIRA, L.; Toward linear representations of foundation models for histopathology image retrieval. In: Conference on Graphics, Patterns and Images (SIBGRAPI), 2025. While domain-specific foundation models have accelerated advancements by leveraging large-scale histopathological datasets, they often struggle to generalize across diverse clinical scenarios and datasets. In this paper, we address this limitation with a linear-prototype framework: A 128-D projection head trained by a modified prototypical loss on a few labelled slides. Evaluated on proprietary glomerular biopsies, dermoscopic skin cancer, and ovarian-cancer WSIs, our proposed approach consistently boosts image retrieval. Across nine backbones (UNI/UNI2-h, Phikon/Phikon-v2, Virchow2, DINO/DINOv2, ViT, and a ResNet-50 baseline), few-shot tuning raises mAP by up to 15 percentage points. These gains show that our lightweight, prototype-based layer can reconcile the breadth of foundation pre-training with the depth of task-specific discrimination, enabling improvement on medical-image retrieval across diverse pathologies.
Conference LIANG, J.; BORJA, I.; LIMA, D.; JÚNIOR, M.; CURY, P.; OLIVEIRA, L.; SESA-KAN: Simultaneous estimation of sex and age from dental panoramic radiographs. In: International Joint Conference on Neural Networks (IJCNN), 2025. This paper proposes a novel multi-task learning framework for joint sex classification and age estimation from panoramic dental radiographs. Our proposed method combines masked autoencoders for self-supervised pretraining on large-scale unlabeled data, a Kolmogorov-Arnold network (KAN) to model nonlinear relationships between dental features and demographic labels, and a dynamic logarithmic loss function to balance sex classification and age regression tasks within a Vision Transformer (ViT) architecture. The proposed framework, called Simultaneous Estimation of Sex and Age via KAN (SESA-KAN), achieved a mean absolute error of 3.39 years for age estimation and an F1-score of 94.2% for sex classification on real-world datasets. Comparative evaluations showed average improvements over existing methods, with a 0.5-year reduction in age estimation error and a 7.5 percentage point increase in sex classification accuracy. The results highlight the effectiveness of integrating self-supervised pretraining, KAN-based feature decomposition, and adaptive task balancing for multi-task medical image analysis. This work advances automated demographic analysis in dental radiography, with potential applications in forensic dentistry.
Journal SILVA, B.; FONTINELE, J.; VIEIRA, C.; TAVARES, J.; CURY, P.; OLIVEIRA, L.; A holistic approach for classifying dental conditions from textual reports and panoramic radiographs. In: Medical Image Analysis (MEDIA), 2025. Dental panoramic radiographs offer vast diagnostic opportunities, but the shortage of labeled data hampers the training of supervised deep-learning networks for the automatic analysis of these images. To address this issue, we introduce a holistic learning approach to classify dental conditions on panoramic radiographs, exploring tooth segmentation and textual reports, without a direct tooth-level annotated dataset. Large language models were used to identify the prevalent dental conditions in these reports, acting as an auto-labeling procedure. After an instance segmentation network segments the teeth, a linkage approach is in charge of matching each tooth with the corresponding condition found in the textual report. The proposed framework was validated using two of the most extensive datasets in the literature, specially gathered for this study, consisting of 8,795 panoramic radiographs and 8,029 paired reports and images. Encouragingly, the results consistently exceeded the baseline for the Matthews correlation coefficient. A comparative analysis against specialist and dental student ratings, supported by statistical evaluation, highlighted its effectiveness. Using specialist consensus as the ground truth, the system achieved precision comparable to final-year undergraduate students and was within 8.1 percentage points of specialist performance.
Conference NASCIMENTO, Débora B.; OLIVEIRA, Luciano; SANTOS, Washington; DUARTE, Angelo; AIRES, Kelson R. T.; VERAS, Rodrigo M. S. Segmentation of the Glomerular Region in Pathological Kidney Slides. In: Proceedings of the National Meeting on Artificial and Computational Intelligence (ENIAC), 2025. This study proposes a method for segmenting the glomerular region in renal histological images using convolutional neural networks (CNNs) based on U-Net and Sharp U-Net architectures with pretrained backbones. A total of 643 images stained with hematoxylin-eosin (HE), periodic acid-Schiff (PAS), and periodic acid-methenamine silver (PAMS) were evaluated, applying stratified 5-fold cross-validation. The U-Net with VGG-19 achieved the highest mean Dice score (95.45%), followed by the Sharp U-Net with DenseNet201. The results were consistent across staining techniques, with a slight advantage for PAMS and PAS. The method demonstrated accuracy and robustness, highlighting its potential as a diagnostic support tool in nephropathology.
2024
Conference PRADO, I.; LIMA, D.; LIANG, J.; HOUGAZ, A.; PETERS, B.; REBOUÇAS, L.; Multi-task learning based on log dynamic loss weighting for sex classification and age estimation on panoramic radiographs. In: Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), 2024. This paper introduces a multi-task learning (MTL) approach for simultaneous sex classification and age estimation in panoramic radiographs, aligning with the tasks pertinent to forensic dentistry. We dynamically optimize the logarithm of task-specific weights during the loss training. Our results demonstrate the superior performance of our proposed MTL network compared to the individual task-based networks, particularly evident across a diverse data set comprising 7,666 images, spanning ages from 1 to 90 years and encompassing significant sex variability. Our network achieved an F1-score of 90.37% and a mean absolute error of 5.66 years through a cross-validation assessment procedure, which resulted in a gain of 1.69 percentage points and 1.15 years with respect to the individual sex classification and age estimation procedures. To the best of our knowledge, it is the first successful MTL-based network for these two tasks.
Journal BARROS, G.; DA SILVA, J.; LIANG, J.; PROENÇA, H.; ARAÚJO, S.; OLIVEIRA, L.; DOS SANTOS, W.; DUARTE, A.; VIDAL, F.; Enhancing Podocyte Degenerative Changes Identification With Pathologist Collaboration: Implications for Improved Diagnosis in Kidney Diseases. In: IEEE Journal of Translational Engineering in Health and Medicine, 2024. Podocyte degenerative changes are common in various kidney diseases, and their accurate identification is crucial for pathologists to diagnose and treat such conditions. However, this can be a difficult task, and previous attempts to automate the identification of podocytes have not been entirely successful. To address this issue, this study proposes a novel approach that combines pathologists’ expertise with an automated classifier to enhance the identification of podocytopathies. The study involved building a new dataset of renal glomeruli images, some with and others without podocyte degenerative changes, and developing a convolutional neural network (CNN) based classifier. The results showed that our automated classifier achieved an F-score of 90.9%. When the pathologists used it as an auxiliary tool to classify a second set of images, the medical group’s average F-score increased significantly, from 91.4 ± 12.5% to 96.1 ± 2.9%. Fleiss’ kappa agreement among the pathologists also increased from 0.59 to 0.83. Conclusion: These findings suggest that automating this task can help pathologists correctly identify images of glomeruli with podocyte degeneration, leading to improved individual accuracy while raising agreement in diagnosing podocytopathies. This approach could have significant implications for the diagnosis and treatment of kidney diseases. Clinical impact: The approach presented in this study has the potential to enhance the accuracy of medical diagnoses for detecting podocyte abnormalities in glomeruli, which serve as biomarkers for various glomerular diseases.
2023
Conference MENDONÇA, M.; FONTINELE, J.; OLIVEIRA, L.; SHLS: Superfeatures learned from still images for self-supervised VOS. In: British Machine Vision Conference (BMVC'2023), 2023. Self-supervised video object segmentation (VOS) aims at eliminating the need for manual annotations to learn VOS. However, existing methods often require extensive training data consisting of hours of videos. In this paper, we introduce a novel approach that combines superpixels and deep learning features through metric learning, enabling us to learn VOS from a small dataset of unlabeled still images. Our method, called superfeatures in a highly compressed latent space (SHLS), embeds convolutional features into the corresponding superpixel areas, resulting in ultra-compact image representations. This allowed us to construct an efficient memory mechanism to store and retrieve past information throughout a frame sequence to support current frame segmentation. We evaluate our method on the popular DAVIS dataset and achieve competitive results compared to state-of-the-art self-supervised methods, which were trained with much larger video-based datasets. We have made our code and trained model publicly available at: https://github.com/IvisionLab/SHLS.
Conference LIANG, J.; CURY, P.; OLIVEIRA, L.; Revisiting age estimation on panoramic dental images. In: Conference on Graphics, Patterns and Images (SIBGRAPI'2023), 2023. Forensic dentistry has traditionally relied on bone or dental indicators, primarily utilizing dental radiographs, for age estimation. However, limited research has been conducted on automatic age estimation on panoramic images, which calls for a reevaluation of the existing methodologies to assess the performance of computer-based methods. This study proposes to revisit the analysis of age estimation methods using panoramic dental radiographs. We have curated the largest publicly available dataset of panoramic dental images, encompassing diverse dental conditions and age ranges. Specifically, our study focuses on evaluating three distinct classes of deep-learning architectures: ViT, ConvNeXt-V2, and EfficientNets, employing a comprehensive methodology to assess their performance in a way that better favors reproducibility. By comparing our approach with existing studies in the literature, we offer valuable insights for forensic investigations in the field of age estimation.
Conference HOUGAZ, A. B.; LIMA, D.; PETERS, B.; CURY, P.; OLIVEIRA, L.; Sex estimation on panoramic dental radiographs: A methodological approach. In: Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS), 2023. Estimating sex using tooth radiographs requires knowledge of a comprehensive spectrum of maxillary anatomy, which ultimately demands specialization in the anatomical structures of the oral cavity. In this paper, we propose a more effective methodological study than others present in the literature for the problem of automatic sex estimation. Our methodology uses the largest publicly available data set in the literature, raises statistical significance in the performance assessment, and explains which parts of the images influence the classification. Our findings showed that although EfficientNetV2-Large reached an average F1-score of 91.43% ± 0.67, an EfficientNet-B0 could be more beneficial, with a very close F1-score and a much lighter architecture.
Journal SOUZA, L.; SILVA, J.; CHAGAS, P.; DUARTE, A.; SANTOS, W. L.; OLIVEIRA, L.; Mouse-to-human transfer learning for glomerulus segmentation. In: Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2023. Mice and humans share many features of internal organs. Therefore, mice are often used in experimental models of human diseases. Although this is commonplace in medicine, it remains largely unexplored in computational pathology, where digital whole-slide images (WSIs) are the main objects of investigation. Considering the absence of research about knowledge transfer between mice and humans in machine learning modelling, we propose investigating the possibility of segmenting glomeruli in human WSIs by training deep learning models on mouse data only. A set of semantic segmenters was evaluated on two data sets comprising 18 mouse WSIs and 42 human WSIs. Results demonstrated that U-Net 3+ achieved superior results in the intra-data-set evaluation: On the mouse data set, it reached the highest average score on HE-stained images, while on the human data set, this network achieved the highest average on all stains. U-Net 3+ also obtained the best results after being trained only on the mouse data set and predicting on the entire (train and test) human data set. Although all networks proved to be capable of segmenting intra-stain images, it was not possible to confirm the same results on inter-stain ones.
Journal CALUMBY, R. T.; DUARTE, A. A.; ANGELO, M. F.; SANTOS, E.; SARDER, P.; DOS-SANTOS, W. L.; OLIVEIRA, L. R.; Towards Real-World Computational Nephropathology. In: Clinical Journal of the American Society of Nephrology: CJASN, 2023. In pathology, defining consistent criteria for disease diagnosis and prognostication requires ever increasing effort. Nephropathology, in particular, demands greater attention to the correspondence between distinctive patterns of kidney lesions and their associated clinical diagnoses. Some examples include changes in the glomerular basement membrane and nephrotic syndrome, endocapillary hypercellularity and nephritic syndrome, and extensive glomerular crescents and renal dysfunction. […]
Journal ANDRADE, K.; SILVA, B.; OLIVEIRA, L.; and CURY, P. Automatic dental biofilm detection based on deep learning. In: Journal of Clinical Periodontology, 2023. Aim: To estimate the automated biofilm detection capacity of the U-Net neural network on tooth images. Materials and Methods: Two datasets of intra-oral photographs taken in the frontal and lateral views of permanent and deciduous dentitions were employed. The first dataset consisted of 96 photographs taken before and after applying a disclosing agent and was used to validate the domain expert’s biofilm annotation (intra-class correlation coefficient = .93). The second dataset comprised 480 photos, with or without orthodontic appliances, and without disclosing agents, and was used to train the neural network to segment the biofilm. Dental biofilm labelled by the dentist (without disclosing agents) was considered the ground truth. Segmentation performance was measured using accuracy, F1 score, sensitivity, and specificity. Results: The U-Net model achieved an accuracy of 91.8%, F1 score of 60.6%, specificity of 94.4%, and sensitivity of 67.2%. The accuracy was higher in the presence of orthodontic appliances (92.6%). Conclusions: Visually segmenting dental biofilm employing a U-Net is feasible and can assist professionals and patients in identifying dental biofilm, thus improving oral hygiene and health.
Journal SILVA, B.; PINHEIRO, L.; SOBRINHO, B.; LIMA, F.; SOBRINHO, B.; LIMA, K.; PITHON, M.; CURY, P.; and OLIVEIRA, L. Boosting research on dental panoramic radiographs: a challenging data set, baselines, and a task central online platform for benchmark. In: Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2023. We address in this study the construction of a public data set of dental panoramic radiographs. Our objects of interest are the teeth, which are segmented and numbered. We benefited from the human-in-the-loop concept to expedite the labelling procedure, using predictions from deep neural networks as provisional labels, later verified by human annotators. Our results demonstrated a 51% labelling time reduction using HITL, saving us more than 390 continuous working hours. In a novel online platform, called OdontoAI, created to work as task central for this novel data set, we released 4,000 images, from which 2,000 have their labels publicly available for model fitting. The labels of the other 2,000 images are private and used for model evaluation on different tasks. To the best of our knowledge, this is the largest-scale publicly available data set for panoramic radiographs, and the OdontoAI is the first platform of its kind in dentistry.
2022
Journal SANTOS, J.; SILVA, R.; OLIVEIRA, L.; SANTOS, W.; ALDEMAN, N.; DUARTE, A.; and VERAS, R. Glomerulosclerosis detection with pre-trained CNNs ensemble. In: Computational Statistics, 2022. Glomerulosclerosis characterizes many conditions of primary kidney disease in advanced stages. Its accurate diagnosis relies on histological analysis of renal cortex biopsy, and it is paramount to guide the appropriate treatment and minimize the chances of the disease progressing to chronic stages. This article presents an ensemble approach composed of five convolutional neural networks (CNNs) - VGG-19, Inception-V3, ResNet-50, DenseNet-201, and EfficientNet-B2 - to detect glomerulosclerosis in glomerulus images. We fine-tuned the CNNs and evaluated several configurations for the fully connected layers. In total, we analyzed 25 different models. These CNNs, individually, demonstrated effectiveness in the task; however, we verified that the union of these five well-known CNNs improved the detection rate while decreasing the standard deviations of current techniques. The experiments were carried out in a data set comprised of 1,028 images, on which we applied data-augmentation techniques in the training set. The proposed CNNs ensemble achieved a near-perfect accuracy of 99.0% and kappa of 98.0%.
Chapter SILVA, B.; PINHEIRO, L.; ANDRADE, K.; CURY, P.; and OLIVEIRA, L. Dental image analysis: Where deep learning meets dentistry. In: Convolutional Neural Networks for Medical Image Processing Applications, 2022. Abstract: The rise in living standards increases the expectation of people in almost every field. At the forefront is health. Over the past few centuries, there have been major developments in healthcare. Medical device technology and developments in artificial intelligence (AI) are among the most important ones. The improving technology and our ability to harness the technology effectively by means such as AI have led to unprecedented advances, resulting in early diagnosis of diseases. AI algorithms enable the fast and early evaluation of images from medical devices to maximize the benefits. While developments in the field of AI were quickly adapted to the field of health, in some cases this contributed to the formation of innovative artificial intelligence algorithms. Today, the most effective artificial intelligence method is accepted as deep learning. Convolutional neural network (CNN) architectures are deep learning algorithms used for image processing. This book contains applications of CNN methods. The content is quite extensive, including the application of different CNN methods to various medical image processing problems. Readers will be able to analyze the effects of CNN methods presented in the book in medical applications.
Journal L. C. DOS-SANTOS, W.; A. R. DE FREITAS, L.; DUARTE, A.; ANGELO, M.; and OLIVEIRA, L. Computational pathology, new horizons and challenges for anatomical pathology. In: Surgical and Experimental Pathology, 2022. Abstract: The emergence of digital pathology environments and the application of computer vision to the analysis of histological sections have given rise to a new area of Anatomical Pathology, termed Computational Pathology. Advances in Computational Pathology may substantially change the routine of Anatomical Pathology laboratories and the work profile of the pathologist.
Journal SILVA, J.; SOUZA, L.; CHAGAS, P.; CALUMBY, R.; SOUZA, B.; PONTES, I.; DUARTE, A.; LC-DOS-SANTOS, W.; and OLIVEIRA, L. Boundary-aware glomerulus segmentation: Toward one-to-many stain generalization. In: Computerized Medical Imaging and Graphics, 2022. Abstract: The growing availability of scanned whole-slide images (WSIs) has allowed nephropathology to open new possibilities for medical decision-making over high-resolution images. Diagnosis of renal WSIs includes locating and identifying specific structures in the tissue. Considering the glomerulus as one of the first structures analyzed by pathologists, we propose here a novel convolutional neural network for glomerulus segmentation. Our end-to-end network, named DS-FNet, combines the strengths of semantic segmentation and semantic boundary detection networks via an attention-aware mechanism. Although we trained the proposed network on periodic acid-Schiff (PAS)-stained WSIs, we found that our network was capable of segmenting glomeruli on WSIs stained with different techniques, such as periodic acid-methenamine silver (PAMS), hematoxylin-eosin (HE), and Masson trichrome (TRI). To assess the performance of the proposed method, we used three public data sets: HuBMAP, available in a Kaggle competition; a subset of the NEPTUNE data set; and a novel challenging data set, called WSI_Fiocruz. Results showed that DS-FNet achieved superior results on all data sets: On HuBMAP, reaching a dice score of 95.05%, very close to the first place (95.15%); on the subset of the NEPTUNE and WSI_Fiocruz, achieving the highest average dice scores when compared with different versions of U-Net, taking into account images with the PAS staining: 92.00% and 86.00%, or images stained with other techniques: 84.00% and 80.00%. To the best of our knowledge, this is the first work to show consistently high performance in a one-to-many-stain glomerulus segmentation following a thorough protocol on data sets from different medical labs.
Conference SOUZA, L.; SILVA, J.; CHAGAS, P.; DUARTE, A.; LC-DOS-SANTOS, W.; and OLIVEIRA, L. How feasible is it to segment human glomerulus with a model trained on mouse histology images?. In: Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS), 2022. Abstract: Many genetic, physiological and structural characteristics of internal organs are shared by mice and humans. Hence, mice are frequently used in experimental models of human diseases. Although this is well established in medicine, it remains largely unexplored in computational pathology, where digital images are the main objects of investigation. Considering the lack of studies on knowledge transfer between mice and humans concerning machine learning models, we propose investigating whether it is possible to segment glomeruli in human WSIs by training deep learning models on mouse data only. Three different semantic segmenters were evaluated on two data sets comprising 18 mouse WSIs and 30 human WSIs. The results corroborate our hypothesis.
Journal CHAGAS, P.; SOUZA, L.; PONTES, I.; CALUMBY, R.; ANGELO, M.; DUARTE, A.; LC-DOS-SANTOS, W.; and OLIVEIRA, L. Uncertainty-aware membranous nephropathy classification: A Monte-Carlo dropout approach to detect how certain is the model. In: Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2022. Abstract: Membranous nephropathy (MN) is among the most common glomerular diseases that cause nephrotic syndrome in adults. To aid pathologists in performing the MN classification task, we propose here a pipeline consisting of two steps. First, we assessed four deep-learning-based architectures, namely, ResNet-18, MobileNet, DenseNet, and Wide-ResNet. To achieve more reliable predictions, we adopted and extensively evaluated a Monte-Carlo dropout approach for uncertainty estimation. Using a 10-fold cross-validation setup, all models achieved average F1-scores above 92%, where the highest average value of 93.2% was obtained by using Wide-ResNet. Regarding uncertainty estimation with Wide-ResNet, high uncertainty scores were more associated with erroneous predictions, demonstrating that our approach can assist pathologists in interpreting the predictions with high reliability. We show that uncertainty-based thresholds for decision referral can greatly improve classification performance, increasing the accuracy up to 96%. Finally, we investigated how the uncertainty scores relate to complexity scores defined by pathologists.
Conference BARROS, G.; WANDERLEY, D.; REBOUÇAS, L.; SANTOS, W.; DUARTE, A. and VIDAL, F. PodNet: Ensemble-based Classification of Podocytopathy on Kidney Glomerular Images. In: Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, 2022. Abstract: Podocyte lesions in renal glomeruli are identified by pathologists using visual analyses of kidney tissue sections (histological images). By applying automatic visual diagnosis systems, one may reduce the subjectivity of analyses, accelerate the diagnosis process, and improve medical decision accuracy. In this direction, we present here a new data set of renal glomeruli histological images for podocytopathy classification and a deep neural network model. The data set consists of 835 digital images (374 with podocytopathy and 430 without podocytopathy), annotated by a group of pathologists. Our proposed method (called here PodNet) is a classification method based on deep neural networks (pre-trained VGG19) used as feature extractors for images in different color spaces. We compared PodNet with six other state-of-the-art models in two data set versions (RGB and gray level) and two different training contexts: pre-trained models (transfer learning from ImageNet) and from-scratch, both with hyperparameter tuning. The proposed method achieved an F1-score of 90.9%, a precision of 88.9%, and a recall of 93.2% on the final validation sets.
Chapter OLIVEIRA, L.; CHAGAS, P.; DUARTE, A.; CALUMBY, R.; SANTOS, E.; ANGELO, M.; and L. C. DOS-SANTOS, W. PathoSpotter: Computational Intelligence Applied to Nephropathology. In: SPRINGER NATURE, 2022. Evidence-based medicine has received increasing attention. This type of medicine would have the benefit of using large data sets to investigate clinical–laboratory associations and validate hypotheses grounded on data. Pathology is one area that has benefited from large data sets of images, with advances leveraged by computational pathology, which in turn relies on advances in the methods conceived by the computational intelligence and computer vision fields. Considering kidney biopsies in particular, computational nephropathology seeks to identify renal lesions from primary computer vision tasks that involve classification and segmentation of renal structures on histology images. In this context, this chapter discusses some advances in computational nephropathology, contextualizing them in the scope of the PathoSpotter project. We also address current achievements and challenges, and explore future prospects for the field.
2021
Journal ESTRELA, L.; OLIVEIRA, L.; A prototype of a termography equipment. In: IEEE Latin America Transactions, 2021. Abstract: Thermography is a technique for graphically recording the temperature of bodies above -273 °C that are capable of emitting infrared radiation. This feature allows for the study of temperature behavior in different objects, structures, and surfaces over time. This work describes the conception of open-source thermographic equipment. The paper covers the stages of construction of the physical structure, data acquisition, movement system, electronics, control, electrical system, and graphical interface for equipment control and image formation. The proposed equipment has an accuracy of ±1 °C at temperatures between 0 °C and 50 °C and a usable reading area of 20 x 22 cm, produces images with 32 x 36 pixels, and is capable of reading objects with temperatures between 0 °C and 300 °C. The equipment proposed here is suited to thermography studies of small bodies, such as the human hand, small objects with heat variation, electronic circuits and components, and portable devices (e.g. smartphones, lithium-ion batteries, tablets), just to cite a few. The thermal images produced by the proposed equipment have well-defined contours and uniform thermal characteristics.
Conference CHAGAS, P., G.; SOUZA, L.; CALUMBY, R.; PONTES I.; ARAÚJO S.; DUARTE A.; PINHEIRO N.; SANTOS W.; OLIVEIRA L. Toward unbounded open-set recognition to say "I don't know" for glomerular multi-lesion classification. In: International Symposium on Medical Information Processing and Analysis (SIPAIM), Campinas, 2021. Abstract: Glomeruli are histological structures located at the beginning of the nephrons in the kidney, having primary importance in the diagnosis of many renal diseases. Classifying glomerular lesions is time-consuming and requires experienced pathologists. Hence automatic classification methods can support pathologists in the diagnosis and decision-making scenarios. Recently, most state-of-the-art medical imaging classification methods have been based on deep-learning approaches, which are prone to return overconfident scores, even for out-of-distribution (OOD) inputs. Determining whether inputs are OOD samples is of underlying importance so as to ensure the safety and robustness of critical machine learning applications. Bearing this in mind, we propose a unified framework comprised of unbounded open-set recognition and multi-lesion glomerular classification (membranous nephropathy, glomerular hypercellularity, and glomerular sclerosis). Our proposed framework classifies the input into in- or OOD data: If the sample is an OOD image, the input is disregarded, indicating that the model “doesn’t know” the class; otherwise, if the sample is classified as in-distribution, an uncertainty method based on Monte-Carlo dropout is used for multi-lesion classification. We explored an energy-based approach that allows open-set recognition without fine-tuning the in-distribution weights to specific OOD data. Ultimately, our results suggest that uncertainty estimation methods (Monte-Carlo dropout, test-time data augmentation, and ensemble) combined with energy scores slightly improved our open-set recognition for in-out classification. Our results also showed that this improvement was achieved without decreasing the 4-lesion classification performance, with an F1-score of 0.923. Toward an unbounded open-set glomerular multi-lesion recognition, the proposed method also kept a competitive performance.
Conference LEFUNDES, G.; OLIVEIRA, L.; Gaze estimation via self-attention augmented convolutions. In: Conference on Graphics, Patterns and Images (SIBGRAPI), Gramado, 2021. Abstract: Although deep learning methods have recently boosted the accuracy of appearance-based gaze estimation, there is still room for improvement in the network architectures for this particular task. Hence we propose here a novel network architecture grounded on self-attention augmented convolutions to improve the quality of the learned features during the training of a shallower residual network. The rationale is that the self-attention mechanism can help outperform deeper architectures by learning dependencies between distant regions in full-face images. This mechanism can also create better and more spatially-aware feature representations derived from the face and eye images before gaze regression. We dubbed our framework ARes-gaze, which explores our Attention-augmented ResNet (ARes-14) as twin convolutional backbones. In our experiments, results showed a decrease of the average angular error by 2.38% when compared to state-of-the-art methods on the MPIIFaceGaze data set, while achieving second place on the EyeDiap data set. It is noteworthy that our proposed framework was the only one to reach high accuracy simultaneously on both data sets.
Conference CERQUEIRA, S.; AGUIAR, E.; DUARTE, A.; DOS SANTOS, W.; OLIVEIRA, L.; ÂNGELO, M. PathoSpotter Classifier: Uma Serviço Web para Auxílio à Classificação de Lesões em Glomérulos Renais. In: Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS), 2021. Abstract: In recent years, the PathoSpotter Project has developed and perfected classifiers to aid in the diagnosis of lesions in digital images of renal biopsies. Among the goals of the project is the availability of these classifiers so that pathologists can use them to facilitate their medical practice and also contribute to the improvement of the system. This work presents the architecture of the PathoSpotter Classifier, the Web service created by the PathoSpotter Project, and how the challenges faced in distributing the system for use by pathologists were overcome.
Conference CHAGAS, P.; SOUZA, L.; CALUMBY, R.; DUARTE, A.; ANGELO, M.; SANTOS, W.; OLIVEIRA, L. Deep-learning-based membranous nephropathy classification and Monte-Carlo dropout uncertainty estimation. In: Simpósio Brasileiro de Computação Aplicada à Saúde (SBCAS), 2021. Abstract: Membranous Nephropathy (MN) is one of the most common glomerular diseases that cause adult nephrotic syndrome. To assist pathologists in MN classification, we evaluated three deep-learning-based architectures, namely, ResNet-18, DenseNet and Wide-ResNet. In addition, to obtain more reliable results, we applied Monte-Carlo Dropout for uncertainty estimation. We achieved average F1-Scores above 92% for all models, with Wide-ResNet obtaining the highest average F1-Score (93.2%). For uncertainty estimation on Wide-ResNet, the uncertainty scores were strongly related to incorrect classifications, indicating that these uncertainty estimates can support pathologists in the analysis of model predictions.
Conference BARROS, J.; OLIVEIRA, L. Deep Speed Estimation from Synthetic and Monocular Data. In: IEEE International Symposium on Intelligent Vehicle, 2021. Abstract: The current state of the art in speed measurement technologies includes magnetic inductive loop detectors, Doppler radar, infrared sensors, and laser sensors. Many of these systems rely on intrusive methods that require intricate installation and maintenance processes that hinder traffic while leading to high acquisition and maintenance costs. Speed measurement from monocular videos appears as an alternative in this context. However, most of these systems have the drawback of requiring camera calibration – a fundamental step to convert the vehicle speed from pixels per frame to some real-world unit of measurement (e.g. km/h). Considering that, we propose a speed measurement system based on monocular cameras with no need for calibration. Our proposed system was trained from a synthetic data set containing 12,290 instances of vehicle speeds. We extract the motion information of the vehicles that pass in a specific region of the image by using dense optical flow, using it as input to a regressor based on a customized VGG-16 network. The performance of our method was evaluated over the Luvizon’s data set, which contains real-world scenarios with 7,766 vehicle speeds, groundtruthed by a high precision system based on properly calibrated and approved inductive loop detectors. Our proposed system was able to measure 85.4% of the speed instances within an error range of [-3, +2] km/h, which is ideally defined by the regulatory authorities in several countries. Our proposed system does not rely on any distance measurements in the real world as input, eliminating the need for camera calibration.
Conference PINHEIRO L.; SILVA B.; SOBRINHO B.; LIMA F.; CURY P.; OLIVEIRA L. Numbering permanent and deciduous teeth via deep instance segmentation in panoramic X-rays. In: International Symposium on Medical Information Processing and Analysis (SIPAIM), Campinas, 2021. Abstract: Panoramic X-rays are an essential tool to assist dentistry experts in their diagnostic procedures. Dentists can analyze the anatomical and pathological structures while planning orthodontic, periodontal, and surgical treatments. Even though detecting, numbering, and segmenting teeth are essential tasks to leverage automatic analysis on panoramic X-rays, the literature lacks a study and a data set that consider deciduous and permanent teeth at the same time in a wide variety of panoramic X-rays. To fill this gap, this work introduces a novel, challenging, and highly variable public data set labeled from scratch. This data set incorporates new elements such as instance overlapping and deciduous teeth, supporting our study on tooth numbering and segmentation. Our efforts aim to improve the segmentation on the boundaries because they are the main hurdle of the instance segmentation methods. For that, we investigate and compare (quantitatively and qualitatively) two Mask R-CNN-based solutions: the standard one, with a fully convolutional network, and another one that employs the PointRend module on the top. Our findings attest to the feasibility of extending segmentation and numbering to deciduous teeth through end-to-end deep learning architectures, as well as the higher performance of the Mask R-CNN with PointRend on both instance segmentation (mAP of +2 percentage points) and numbering (mAP of +1.2 percentage points) on the test data set. We hope that our findings and our new data set support the development of new tools to assist professionals in making faster diagnoses based on panoramic X-rays.
Conference CARDOZO, J.; DOS-SANTOS, WL; DUARTE, A.; OLIVEIRA, L.; ANGELO, M. Automatic Glomerulus Detection in Renal Histological Images. In: Proceedings Volume 11603, SPIE Medical Imaging 2021: Digital Pathology. Abstract: Glomeruli are microscopic structures of the kidney affected in many renal diseases. The diagnosis of these diseases depends on the study by a pathologist of each glomerulus sampled by renal biopsy. To help pathologists with the image analysis, we propose a glomerulus detection method on renal histological images. For that, we evaluated two state-of-the-art deep-learning techniques: single shot multibox detector with Inception V2 (SI2) and faster region-based convolutional neural network with Inception V2 (FRI2). As a result, we reached 0.88 of mAP and 0.94 of F1-score when using SI2, and 0.87 of mAP and 0.97 of F1-score when using FRI2. On average, to process each image, FRI2 required 30.91s, while SI2 required just 0.79s. In our experiments, we found that the SI2 model is the best detection method for our task since it is 64% faster in the training stage and 98% faster to detect the glomeruli in each image.
2020
Journal CHAGAS, P.; SOUZA, L.; ARAÚJO, I.; ALDEMAN, N.; DUARTE, A.; ANGELO, M.; DOS-SANTOS, WL; OLIVEIRA, L. Classification of glomerular hypercellularity using convolutional features and support vector machine. Artificial Intelligence in Medicine. 2020 Mar 1;103:101808. Abstract: Glomeruli are histological structures of the kidney cortex formed by interwoven blood capillaries, and are responsible for blood filtration. Glomerular lesions impair kidney filtration capability, leading to protein loss and metabolic waste retention. One example of a lesion is glomerular hypercellularity, which is characterized by an increase in the number of cell nuclei in different areas of the glomeruli. Glomerular hypercellularity is a frequent lesion present in different kidney diseases. Automatic detection of glomerular hypercellularity would accelerate the screening of scanned histological slides for the lesion, enhancing clinical diagnosis. Having this in mind, we propose a new approach for classification of hypercellularity in human kidney images. Our proposed method introduces a novel architecture of a convolutional neural network (CNN) along with a support vector machine, achieving near perfect average results on FIOCRUZ data set in a binary classification (lesion or normal). Additionally, classification of hypercellularity sub-lesions was also evaluated, considering mesangial, endocapillary and both lesions, reaching an average accuracy of 82%. In both the binary task and the multi-class one, our proposed method outperformed Xception, ResNet50 and InceptionV3 networks, as well as a traditional handcrafted-based method. To the best of our knowledge, this is the first study on deep learning over a data set of glomerular hypercellularity images of human kidney.
Conference SILVA, B.; PINHEIRO, L.; OLIVEIRA, L.; PITHON, M. A study on tooth segmentation and numbering using end-to-end deep neural networks. In: Conference on Graphics, Patterns and Images (SIBGRAPI), Porto de Galinhas, 2020. Abstract: Shape, number, and position of teeth are the main targets of a dentist when screening for patient’s problems on X-rays. Rather than solely relying on the trained eyes of dentists, computational tools have been proposed to aid specialists as decision support for better diagnoses. When applied to X-rays, these tools are especially grounded on object segmentation and detection and have the goal of highlighting the teeth in the images to facilitate other automatic methods in further processing steps. Although research on tooth segmentation and detection is not recent, the application of deep learning techniques in the field is new and has not reached maturity yet. To fill some gaps in the area of dental image analysis, we present a thorough study on tooth segmentation and numbering on panoramic X-ray images through the use of end-to-end deep neural networks. For that, we analyze the performance of four network architectures, namely, Mask R-CNN, PANet, HTC, and ResNeSt, over a challenging data set. These networks were chosen for their high performance on other data sets for instance segmentation and detection. To the best of our knowledge, this is the first study on instance segmentation, detection, and numbering of teeth on panoramic dental X-rays. We found that (i) it is completely feasible to detect, segment and number teeth through any of the analyzed architectures, (ii) performance can be significantly boosted with the proper choice of a neural network architecture, and (iii) the PANet had the best results on our evaluations with an mAP of 71.3% on segmentation and 74.0% on numbering, raising by 4.9 and 3.5 percentage points the results obtained with Mask R-CNN.
Journal CERQUEIRA, R.; TROCOLI, T.; ALBIEZ, J.; OLIVEIRA, L. A rasterized ray-tracer pipeline for real-time, multi-device sonar simulation. In: Elsevier Graphical Models, 2020.
Conference ASCENSAO, N.; AFONSO, L.; COLOMBO, D.; OLIVEIRA, L.; PAPA, J. P. Information Ranking Using Optimum-Path Forest. In: IEEE World Congress on Computational Intelligence, 2020, Glasgow. International Joint Conference on Neural Networks, 2020.
Conference FONTINELE, J.; MENDONÇA, M.; RUIZ, M.; PAPA, J.; OLIVEIRA, L. Faster α-expansion via dynamic programming and image partitioning. In: IEEE World Congress on Computational Intelligence, 2020, Glasgow. International Joint Conference on Neural Networks, 2020. Abstract: Image segmentation is the task of assigning a label to each image pixel. When the number of labels is greater than two (multi-label), the segmentation can be modelled as a multi-cut problem in graphs. In the general case, finding the minimum cut in a graph is an NP-hard problem, in which improving the results concerning time and quality is a major challenge. This paper addresses the multi-label problem applied in interactive image segmentation. The proposed approach makes use of dynamic programming to initialize an α-expansion, thus reducing its runtime, while keeping the Dice-score measure in an interactive segmentation task. Over the BSDS data set, the proposed algorithm was approximately 51.2% faster than its standard counterpart, 36.2% faster than Fast Primal-Dual (FastPD) and 10.5 times faster than quadratic pseudo-boolean optimization (QPBO) optimizers, while preserving the same segmentation quality.
2019
Journal ARAÚJO, POMPÍLIO; FONTINELE, JEFFERSON; OLIVEIRA, L. Multi-perspective object detection for remote criminal analysis using drones. In: IEEE Geoscience and Remote Sensing Letters, 2019. Abstract: When a crime is committed, the associated site must be preserved and reviewed by a criminal expert. Some tools are commonly used to ensure the total registration of the crime scene with minimal human interference. As a novel tool, we propose here an intelligent system that remotely recognizes and localizes objects considered as important evidence at a crime scene. Starting from a general viewpoint of the scene, a drone system defines trajectories through which the aerial vehicle performs a detailed search to record evidence. A multiperspective detection approach is introduced by analyzing several images of the same object in order to improve the reliability of the object recognition. To our knowledge, it is the first work on remote autonomous sensing of crime scenes. Experiments showed an accuracy increase of 18.2 percentage points when using multiperspective detection.
Conference BARBOSA, L.; DAHIA, G.; PAMPLONA, M. Expression removal in 3D faces for recognition purposes. In: Brazilian Conference on Intelligent Systems, 2019. Abstract: We present an encoder-decoder neural network to remove deformations caused by expressions from 3D face images. It receives a 3D face with or without expressions as input and outputs its neutral form. Our objective is not to obtain the most realistic results but to enhance the accuracy of 3D face recognition systems. To this end, we propose using a recognition-based loss function during training so that our network can learn to maintain important identity cues in the output. Our experiments using the Bosphorus 3D Face Database show that our approach successfully reduces the difference between face images from the same subject affected by different expressions and increases the gap between intraclass and interclass difference values. They also show that our synthetic neutral images improved the results of four different well-known face recognition methods.
Conference Emeršič et al. The Unconstrained Ear Recognition Challenge 2019. In: IAPR International Conference on Biometrics, 2019. Abstract: This paper presents a summary of the 2019 Unconstrained Ear Recognition Challenge (UERC), the second in a series of group benchmarking efforts centered around the problem of person recognition from ear images captured in uncontrolled settings. The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze performance of the technology from various viewpoints, such as generalization abilities to unseen data characteristics, sensitivity to rotations, occlusions and image resolution and performance bias on sub-groups of subjects, selected based on demographic criteria, i.e. gender and ethnicity. Research groups from 12 institutions entered the competition and submitted a total of 13 recognition approaches ranging from descriptor-based methods to deep-learning models. The majority of submissions focused on ensemble-based methods combining either representations from multiple deep models or hand-crafted with learned image descriptors. Our analysis shows that methods incorporating deep learning models clearly outperform techniques relying solely on hand-crafted descriptors, even though both groups of techniques exhibit similar behaviour when it comes to robustness to various covariates, such as the presence of occlusions, changes in (head) pose, or variability in image resolution. The results of the challenge also show that there has been considerable progress since the first UERC in 2017, but that there is still ample room for further research in this area.
Journal MINETTO, R.; PAMPLONA, M.; SARKAR, S. Hydra: An Ensemble of Convolutional Neural Networks for Geospatial Land Classification. In: IEEE Transactions on Geoscience and Remote Sensing, 2019. Abstract: In this paper, we describe Hydra, an ensemble of convolutional neural networks (CNNs) for geospatial land classification. The idea behind Hydra is to create an initial CNN that is coarsely optimized but provides a good starting point for further optimization, which will serve as the Hydra’s body. Then, the obtained weights are fine-tuned multiple times with different augmentation techniques, crop styles, and class weights to form an ensemble of CNNs that represent the Hydra’s heads. By doing so, we prompt convergence to different endpoints, which is a desirable aspect for ensembles. With this framework, we were able to reduce the training time while maintaining the classification performance of the ensemble. We created ensembles for our experiments using two state-of-the-art CNN architectures, residual network (ResNet) and dense convolutional networks (DenseNet). We have demonstrated the application of our Hydra framework in two data sets, functional map of world (FMOW) and NWPU-RESISC45, achieving results comparable to the state-of-the-art for the former and the best-reported performance so far for the latter. Code and CNN models are available at https://github.com/maups/hydra-fmow.
Journal NEVES, G.; RUIZ, M.; FONTINELE, J.; OLIVEIRA, L. Rotated object detection with forward-looking sonar in underwater applications. In: Elsevier Expert Systems with Applications, 2019. Abstract: Autonomous underwater vehicles (AUVs) are often used to inspect the condition of submerged structures in oil and gas fields. Because the use of global positioning systems to aid AUV navigation is not feasible, object detection is an alternative method of supporting underwater inspection missions by detecting landmarks. Objects are detected not only to plan the trajectory of the AUVs, but their inspection can be the ultimate goal of the mission. In both cases, detecting an object’s distance and orientation with respect to the AUV provides clues for the vehicle’s navigation. Accordingly, we introduce a novel multi-object detection system that outputs object position and rotation from sonar images to support AUV navigation. To achieve this aim, two novel convolutional neural network-based architectures are proposed to detect and estimate rotated bounding boxes: an end-to-end network (RBoxNet), and a pipeline comprised of two networks (YOLOv2+RBoxDNet). Both proposed networks are structured from one of three novel representations of rotated bounding boxes regressed deep inside. Experimental analyses were performed by comparing several configurations of our proposed methods (by varying the backbone, regression representation, and architecture) with state-of-the-art methods using real sonar images. Results showed that RBoxNet presents the optimum trade-off between accuracy and speed, reaching an averaged mAP@[.5,.95] of 90.3% at 8.58 frames per second (FPS), while YOLOv2+RBoxDNet is the fastest solution, running at 16.19 FPS but with a lower averaged mAP@[.5,.95] of 77.5%. Both proposed methods are robust to additive Gaussian noise variations, and can detect objects even when the noise level is up to 0.10.
Journal ARAUJO JR., P.; MENDONÇA, M; OLIVEIRA, L. Towards Autonomous Investigation of Crime Scene by Using Drones. In: Sensors & Transducers, 2019. Abstract: A location associated with a committed crime must be preserved, even before criminal experts start collecting and analyzing evidence. Indeed, crime scenes should be recorded with minimal human interference. In order to help specialists to accomplish this task, we propose an autonomous system for investigation of a crime scene using a drone. Our proposed autonomous system recognizes objects considered as important evidence at a crime scene, defining the trajectories through which a drone performs a detailed search. We used our previously proposed method, called Air-SSLAM, to estimate the drone’s pose, as well as proportional-integral-derivative controllers for aircraft stabilization. The goal is to make the drone fly through the paths defined by the objects recognized across the scene. At the end, the proposed system outputs a report containing a list of evidence, sketches, images and videos collected during the investigation. The performance of our system is assessed from a simulator, and a real-life drone system is being prepared to reach the goal.
Journal ABDALLA, K; MENEZES, I.; OLIVEIRA, L. Modelling perceptions on the evaluation of video summarization. In: Elsevier Expert Systems with Applications, 2019. Abstract: Hours of video are uploaded to streaming platforms every minute, with recommender systems suggesting popular and relevant videos that can help users save time in the searching process. Recommender systems regularly require video summarization as an expert system to automatically identify suitable video entities and events. Since there is no well-established methodology to evaluate the relevance of summarized videos, some studies have made use of user annotations to gather evidence about the effectiveness of summarization methods. Aimed at modelling the user’s perceptions, which ultimately form the basis for testing video summarization systems, this paper proposes: (i) a guideline to collect unrestricted user annotations, (ii) a novel metric called compression level of user annotation (CLUSA) to gauge the performance of video summarization methods, and (iii) a study on the quality of annotated video summaries collected from different assessment scales. These contributions lead to benchmarking video summarization methods with no constraints, even if user annotations are collected from different assessment scales for each method. Our experiments showed that CLUSA is less susceptible to unbalanced compression data sets in comparison to other metrics, hence achieving higher reliability estimates. CLUSA also allows results from different video summarization approaches to be compared.
Conference ARAUJO JR., P.; MENDONÇA, M; OLIVEIRA, L. AirCSI – Remotely Criminal Investigator. In: International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI'2019), Barcelona, Spain, 2019. Abstract: Since a location associated with a committed crime must be preserved, even before criminal experts start collecting and analyzing evidence, the crime scene should be recorded with minimal human interference. In this work, we introduce an autonomous system for investigation of a crime scene using a drone. Our proposed intelligent system recognizes objects considered as important evidence of the crime scene, and defines the trajectories through which the drone performs a detailed search to record evidence of the scene. We used our own method, called Air-SSLAM, to estimate the drone’s pose, as well as proportional–integral–derivative (PID) controllers for aircraft stabilization, while flying through the paths defined by the environment recognition step. We evaluated the performance of our system in a simulator, and are also preparing a real-drone system to work in a real environment.
Conference RUIZ, M.; FONTINELE, J.; PERRONE, R.; SANTOS, M.; OLIVEIRA, L. A Tool for Building Multi-purpose and Multi-pose Synthetic Data Sets. In: ECCOMAS THEMATIC CONFERENCE ON COMPUTATIONAL VISION AND MEDICAL IMAGE PROCESSING, Lecture Notes in Computational Vision and Biomechanics, 2019. Abstract: Modern computer vision methods typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to propose a novel approach of designing and generating large-scale multi-purpose image data sets from 3D object models directly, captured from multiple categorized camera viewpoints and controlled environmental conditions. The set of rendered images provides data for geometric computer vision problems such as depth estimation, camera pose estimation, 3D box estimation, 3D reconstruction, and camera calibration, and also pixel-perfect ground truth for scene understanding problems such as semantic and instance segmentation and object detection, just to cite a few. In this paper, we also survey the most well-known synthetic data sets used in computer vision tasks, pointing out the relevance of rendering images for training deep neural networks. When compared to similar tools, our generator contains a wide set of features that are easy to extend, and it allows for building sets of images in the MSCOCO format, ready for use in deep learning work. To the best of our knowledge, the proposed tool is the first one to generate large-scale, multi-pose, synthetic data sets automatically, allowing for training and evaluation of supervised methods for all of the covered features.
2018
Conference ROTICH, G.; AAKUR, S.; MINETTO, R.; PAMPLONA, M.; SARKAR, S. Continuous Biometric Authentication using Possibilistic C-Means. In: IEEE International Conference on Fuzzy Systems, 2018. Abstract: We propose a continuous biometric authentication framework that uses the Possibilistic C-Means (PCM) algorithm to guarantee that only authorized users can access a protected system. PCM is employed to cluster a history of biometric samples into two classes: genuine and impostor. The degree of membership of the current biometric sample in those classes is then used as a score, which is fused over time to reach a decision regarding the safety of the system. The main advantage of our approach is that it is training-free, and thus is applicable without modification to any biometric feature that can be captured continuously. We evaluated our system using 2D, 3D and NIR videos of faces and achieved results comparable to a training-based state-of-the-art work.
Conference ROTICH, G.; AAKUR, S.; MINETTO, R.; PAMPLONA, M.; SARKAR, S. Using Semantic Relationships among Objects for Geospatial Land Use Classification. In: IEEE Applied Imagery Pattern Recognition Workshop, 2019. Abstract: Geospatial land recognition is often cast as a local-region based classification problem. We show in this work that prior knowledge, in terms of global semantic relationships among detected regions, allows us to leverage semantics and visual features to enhance land use classification in aerial imagery. To this end, we first estimate the top-k labels for each region using an ensemble of CNNs called Hydra. Twelve different models based on two state-of-the-art CNN architectures, ResNet and DenseNet, compose this ensemble. Then, we use Grenander’s canonical pattern theory formalism coupled with the common-sense knowledge base, ConceptNet, to impose context constraints on the labels obtained by deep learning algorithms. These constraints are captured in a multi-graph representation involving generators and bonds with a flexible topology, unlike an MRF or Bayesian networks, which have fixed structures. Minimizing the energy of this graph representation results in a graphical representation of the semantics in the given image. We show our results on the recent fMoW challenge dataset. It consists of 1,047,691 images with 62 different classes of land use, plus a false detection category. The biggest improvement in performance with the use of semantics was for false detections. Other categories with significantly improved performance were: zoo, nuclear power plant, park, police station, and space facility. For the subset of fMoW images with multiple bounding boxes the accuracy is 72.79% without semantics and 74.06% with semantics. Overall, without semantic context, the classification performance was 77.04%. With semantics, it reached 77.98%. Considering that less than 20% of the dataset contained more than one ROI for context, this is a significant improvement that shows the promise of the proposed approach.
Conference HANSLEY, EE.; PAMPLONA, M.; SARKAR, S. Employing fusion of learned and handcrafted features for unconstrained ear recognition. In: IET Biometrics, 2018. Abstract: We present an unconstrained ear recognition framework that outperforms state-of-the-art systems in different publicly available image databases. To this end, we developed CNN-based solutions for ear normalization and description, we used well-known handcrafted descriptors, and we fused learned and handcrafted features to improve recognition. We designed a two-stage landmark detector that successfully worked under untrained scenarios. We used the results generated to perform a geometric image normalization that boosted the performance of all evaluated descriptors. Our CNN descriptor outperformed other CNN-based works in the literature, especially in more difficult scenarios. The learned and handcrafted matchers appear to be complementary, as their fusion achieved the best performance in all experiments. The obtained results outperformed all other reported results for the UERC challenge, which contains the most difficult database currently available.
Journal SANTOS, M.; OLIVEIRA, L. ISEC: Iterative over-Segmentation via Edge Clustering. In: Elsevier Image and Vision Computing, 2018. Abstract: Several image pattern recognition tasks rely on superpixel generation as a fundamental step. Image analysis based on superpixels facilitates domain-specific applications, also speeding up the overall processing time of the task. Recent superpixel methods have been designed to fit boundary adherence, usually regulating the size and shape of each superpixel in order to mitigate the occurrence of undersegmentation failures. Superpixel regularity and compactness sometimes impose an excessive number of segments in the image, which ultimately decreases the efficiency of the final segmentation, especially in video segmentation. We propose here a novel method to generate superpixels, called iterative over-segmentation via edge clustering (ISEC), which addresses the over-segmentation problem from a different perspective in contrast to recent state-of-the-art approaches. ISEC iteratively clusters edges extracted from the image objects, providing superpixels that are adaptive in size, shape and quantity, while preserving suitable adherence to the real object boundaries. All this is achieved at a very low computational cost. Experiments show that ISEC stands out from existing methods, striking a favorable balance between segmentation stability and accurate representation of motion discontinuities, which are features especially suitable for video segmentation.
Conference JADER, G.; FONTINELE, J.; RUIZ, M.; ABDALLA, K.; PITHON, M.; OLIVEIRA, L. Deep instance segmentation of teeth in panoramic X-ray images. In: Conference on Graphics, Patterns and Images (SIBGRAPI'2018), Foz do Iguaçu, 2018. Abstract: In dentistry, radiological examinations help specialists by showing the structure of the tooth bones with the goal of screening embedded teeth, bone abnormalities, cysts, tumors, infections, fractures, and problems in the temporomandibular regions, just to cite a few. Sometimes, relying solely on the specialist’s opinion can bring differences in the diagnoses, which can ultimately hinder the treatment. Although tools for complete automatic diagnosis are not yet expected, image pattern recognition has evolved towards decision support, mainly starting with the detection of teeth and their components in X-ray images. Tooth detection has been an object of research for at least the last two decades, mainly relying on threshold and region-based methods. Following a different direction, this paper proposes to explore a deep learning method for instance segmentation of the teeth. To the best of our knowledge, it is the first system that detects and segments each tooth in panoramic X-ray images. It is noteworthy that this image type is the most challenging one in which to isolate teeth, since it shows other parts of the patient’s body (e.g., chin, spine and jaws). We propose a segmentation system based on the mask region-based convolutional neural network to accomplish instance segmentation. Performance was thoroughly assessed on a challenging data set of 1500 images, with high variation and containing 10 categories of different types of buccal image. By training the proposed system with only 193 images of mouths containing 32 teeth on average, using transfer learning strategies, we achieved 98% accuracy, 88% F1-score, 94% precision, 84% recall and 99% specificity over 1224 unseen images, results far superior to those of 10 other unsupervised methods.
Journal SOUZA, L.; OLIVEIRA, L.; PAMPLONA, M.; PAPA, J. How far did we get in face spoofing detection?. In: Elsevier Engineering Applications of Artificial Intelligence, 2018. Abstract: The growing use of control access systems based on face recognition shed light over the need for even more accurate systems to detect face spoofing attacks. In this paper, an extensive analysis on face spoofing detection works published in the last decade is presented. The analyzed works are categorized by their fundamental parts, i.e., descriptors and classifiers. This structured survey also brings a comparative performance analysis of the works considering the most important public data sets in the field. The methodology followed in this work is particularly relevant to observe temporal evolution of the field, trends in the existing approaches, to discuss still opened issues, and to propose new perspectives for the future of face spoofing detection.
Journal SILVA, G.; OLIVEIRA, L.; PITHON, M. Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives. In: Elsevier Expert Systems with Applications, 2018. Abstract: This review presents an in-depth study of the literature on segmentation methods applied in dental imaging. Several works on dental image segmentation were studied and categorized according to the type of method (region-based, threshold-based, cluster-based, boundary-based or watershed-based), type of X-ray images analyzed (intra-oral or extra-oral), and characteristics of the data set used to evaluate the methods in each state-of-the-art work. We found that the literature has primarily focused on threshold-based segmentation methods (54%). 80% of the reviewed articles have used intra-oral X-ray images in their experiments, demonstrating a preference for performing segmentation on images of already isolated parts of the teeth, rather than using extra-oral X-rays, which also show tooth structure of the mouth and bones of the face. To fill a scientific gap in the field, a novel data set based on extra-oral X-ray images, presenting high variability and with a large number of images, is introduced here. A statistical comparison of the results of 10 pixel-wise image segmentation methods over our proposed data set comprised of 1,500 images is also carried out, providing a comprehensive source of performance assessment. Limitations of the benchmarked methods, as well as future perspectives on exploiting learning-based segmentation methods to improve performance, are also discussed. Finally, we present a preliminary application of the Mask region-based convolutional neural network to demonstrate the power of a deep learning method to segment images from our data set.
2017
Journal CERQUEIRA, R.; TROCOLI, T.; NEVES, G.; JOYEUX, S.; ALBIEZ, J.; OLIVEIRA, L. A novel GPU-based sonar simulator for real-time applications. In: Elsevier Computers and Graphics, 2017. Abstract: Mainly when applied in the underwater environment, sonar simulation requires great computational effort due to the complexity of acoustic physics. Simulation of sonar operation allows evaluating algorithms and control systems without going to the real underwater environment, which reduces the costs and risks of in-field experiments. This paper tackles the problem of real-time underwater imaging sonar simulation by using the OpenGL shading language chain on GPU. Our proposed system is able to simulate two main types of acoustic devices: mechanical scanning imaging sonars and forward-looking sonars. The underwater scenario simulation is performed based on three frameworks: (i) OpenSceneGraph reproduces the ocean visual effects, (ii) Gazebo deals with physical forces, and (iii) the Robot Construction Kit controls the sonar in underwater environments. Our system exploits the rasterization pipeline in order to simulate the sonar devices, which are simulated by means of three parameters: the pulse distance, the echo intensity and the sonar field-of-view, all calculated over the shapes of observable objects in the 3D rendered scene. Sonar-intrinsic operational parameters, speckle noise and object material properties are also considered as part of the acoustic image. Our evaluation demonstrated that the proposed system is able to operate close to or faster than the real-world devices. Also, our method generates visually realistic sonar images when compared with real-world sonar images of the same scenes.
Journal ARAÚJO, POMPÍLIO; MIRANDA, RODOLFO; CARMO, DIEDRE; ALVES, RAUL; OLIVEIRA, L. Air-SSLAM: A visual stereo indoor SLAM for aerial quadrotors. In: IEEE Geoscience and Remote Sensing Letters, 2017. Abstract: In this letter, we introduce a novel method for visual simultaneous localization and mapping (SLAM), called Air-SSLAM, which exploits a stereo camera configuration. In contrast to monocular SLAM, scale definition and 3D information are issues that can be more easily dealt with in stereo cameras. Air-SSLAM starts by computing keypoints and the corresponding descriptors over the pair of images, using good features-to-track and rotated-binary robust independent elementary features, respectively. Then a map is created by matching each pair of right and left frames. The long-term map maintenance is continuously performed by analyzing the quality of each matching, as well as by inserting new keypoints into uncharted areas of the environment. Three main contributions can be highlighted in our method: (i) a novel method to match keypoints efficiently, (ii) three quality indicators with the aim of speeding up the mapping process, and (iii) map maintenance with uniform distribution performed by image zones. By using a drone equipped with a stereo camera, flying indoors, the average translational error with respect to a marked ground truth was computed, demonstrating promising results.
Conference CARMO, DIEDRE; ALVES, RAUL; OLIVEIRA, L. Face identification based on synergism of classifiers in rectified stereo images. In: Workshop of Undergraduate Works (SIBGRAPI'2017), Niteroi, 2017. Abstract: This paper proposes a method to identify faces from a stereo camera. Our approach tries to avoid common problems that arise when using only one camera to detect faces from a relatively unstable view in real-world applications. The proposed approach uses a local binary pattern (LBP) to describe the faces in each image of the stereo camera, after detecting the face using the Viola-Jones method. The LBP histogram then feeds multilayer perceptron (MLP) and support vector machine (SVM) classifiers to identify the faces detected in each stereo image, considering a database of target faces. Computational cost problems due to the use of dual cameras are alleviated with the use of co-planar rectified images, achieved through calibration of the stereo camera. Performance is assessed on the well-established Yale face dataset, using either one or both camera images.
Conference DAHIA, G.; SANTOS, M. M. B.; PAMPLONA SEGUNDO, M. A study of CNN outside of training conditions. In: IEEE International Conference on Image Processing (ICIP2017), Beijing, 2017. Abstract: Convolutional neural networks (CNN) are the main development in face recognition in recent years. However, their description capacities have been somewhat understudied. In this paper, we show that training CNN only with color images is enough to properly describe depth and near infrared face images by assessing the performance of three publicly available CNN models on these other modalities. Furthermore, we find that, despite displaying results comparable to human performance on LFW, not all CNN behave like humans when recognizing faces in other scenarios.
2016
Conference TROCOLI, T.; OLIVEIRA, L. Using the scene to calibrate the camera. In: XIX Conference on Graphics, Patterns and Images (SIBGRAPI), Sao Jose dos Campos, 2016. 7 p. Abstract: Surveillance cameras are used in public and private security systems. Typical systems may contain a large number of different cameras, which are installed in different locations. Manual calibration of each single camera in the network becomes an exhausting task. Although we can find methods that semiautomatically calibrate a static camera, to the best of our knowledge, there is no fully automatic calibration procedure so far. To fill this gap, we propose here a novel framework for fully automatic calibration of static surveillance cameras, based on information from the scene (environment and walkers). Characteristics of the method include robustness to walkers’ pose and to camera location (pitch, roll, yaw and height), and rapid camera parameter convergence. For a thorough evaluation of the proposed method, the walkers’ foot-head projection, the length of the lines projected on the ground plane and the walkers’ real heights were analyzed over public and private data sets, demonstrating the potential of the proposed method.
Conference CERQUEIRA, R.; TROCOLI, T.; NEVES, G.; OLIVEIRA, L.; JOYEUX, S.; ALBIEZ, J. Custom Shader and 3D Rendering for computationally efficient Sonar Simulation. In: XIX Conference on Graphics, Patterns and Images (SIBGRAPI), Sao Jose dos Campos, 2016. 4 p. Abstract: This paper introduces a novel method for simulating underwater sonar sensors by vertex and fragment processing. The virtual scenario used is composed of the integration between the Gazebo simulator and the Robot Construction Kit (ROCK) framework. A 3-channel matrix with depth and intensity buffers and angular distortion values is extracted from OpenSceneGraph 3D scene frames by shader rendering, and subsequently fused and processed to generate the synthetic sonar data. To export and display simulation resources, this approach was written in C++ as ROCK packages. The method is evaluated on two use cases: the virtual acoustic images from a mechanical scanning sonar and forward-looking sonar simulations.
Journal FRANCO, A.; OLIVEIRA, L. Convolutional covariance features: Conception, integration and performance in person re-identification. In: Pattern Recognition, 2016. Abstract: This paper introduces a novel type of features based on covariance descriptors – the convolutional covariance features (CCF). Unlike the traditional, handcrafted way of obtaining covariance descriptors, CCF is computed from adaptive and trainable features, which come from a coarse-to-fine transfer learning (CFL) strategy. CFL provides generic-to-specific knowledge and noise-invariant information for person re-identification. After training the deep features, convolutional and flat features are extracted from, respectively, intermediate and top layers of a hybrid deep network. Intermediate layer features are then wrapped in covariance matrices, composing the so-called CCF, which are integrated with the top layer features, called here flat features. Integration of CCF and flat features was shown to improve the proposed person re-identification in comparison with the use of the component features alone. Our person re-identification method achieved the best top 1 performance when compared with 18 other state-of-the-art methods over the VIPeR, i-LIDS, CUHK01 and CUHK03 data sets. The compared methods are based on deep learning, covariance descriptors, or handcrafted features and similarity functions.
Conference FRANCO, A.; OLIVEIRA, L. A coarse-to-fine deep learning for person re-identification. In: IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, New York, 2017. Abstract: This paper proposes a novel deep learning architecture for person re-identification. The proposed network is based on a coarse-to-fine learning (CFL) approach, attempting to acquire generic-to-specific knowledge throughout a transfer learning process. The core of the method relies on a hybrid network composed of a convolutional neural network and a deep belief network denoising autoencoder. This hybrid network is in charge of extracting features invariant to illumination variation, certain image deformations, horizontal mirroring and image blurring, and is embedded in the CFL architecture. The proposed network achieved the best results when compared with other state-of-the-art methods on the i-LIDS, CUHK01 and CUHK03 data sets, and also a competitive performance on the VIPeR data set.
2015
Conference CARMO, D.; JOVITA, R.; FERRARI, R.; OLIVEIRA, L. A study on multi-view calibration methods for RGB-D cameras. In: Workshop of Undergraduate Works (SIBGRAPI'2015), Salvador, 2015. 6 p. Abstract: RGB-D cameras have become part of our daily life in applications such as human-computer interfaces and game interaction, just to cite a few. Because of their easy programming interface and response precision, such cameras have also been increasingly used for 3D reconstruction and movement analysis. In view of that, calibration of multiple cameras is an essential task. On that account, the goal of this paper is to present a preliminary study of methods that tackle the problem of multi-view geometry computation using RGB-D cameras. A brief overview of camera geometry is presented, some methods of calibration are discussed and one of them is evaluated in practice; finally, some important points are addressed about practical issues involving the problem.
Conference CANÁRIO, J. P.; OLIVEIRA, L. Recognition of Facial Expressions Based on Deep Conspicuous Net. In: Iberoamerican Congress on Pattern Recognition, Salvador, 2015. 8 p. Abstract: Facial expression has an important role in human interaction and non-verbal communication. Hence more and more applications that automatically detect facial expressions are becoming pervasive in various fields, such as education, entertainment, psychology, human-computer interaction and behavior monitoring, just to cite a few. In this paper, we present a new approach for facial expression recognition using a so-called deep conspicuous neural network. The proposed method builds a conspicuous map of face regions, training it via a deep network. Experimental results achieved an average accuracy of 90% over the extended Cohn-Kanade data set for seven basic expressions, demonstrating the best performance against four state-of-the-art methods.
Conference PAMPLONA SEGUNDO, M.; LEMES, P. R. Pore-based ridge reconstruction for fingerprint recognition. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, 2015. 6 p. Abstract: The use of sweat pores in fingerprint recognition is becoming increasingly popular, mostly because of the wide availability of pores, which provides complementary information for matching distorted or incomplete images. In this work we present a fully automatic pore-based fingerprint recognition framework that combines both pores and ridges to measure the similarity of two images. To obtain the ridge structure, we propose a novel pore-based ridge reconstruction approach by considering a connect-the-dots strategy. To this end, Kruskal’s minimum spanning tree algorithm is employed to connect consecutive pores and form a graph representing the ridge skeleton. We evaluate our framework on the PolyU HRF database, and the obtained results are favorably compared to previous results in the literature.
Conference SANTOS, M.; OLIVEIRA, L. Context-supported Road Information for Background Modeling. In: XVIII Conference on Graphics, Patterns and Images (SIBGRAPI), Salvador, 2015. 8 p. Abstract: Background subtraction methods commonly suffer from incompleteness and instability in many situations. If a method updates quickly to handle fast-moving objects, it does not reliably model the background when objects stop in the scene; it is easy to find examples where the contrary is also true. In this paper we propose a novel method, designated Context-supported ROad iNformation (CRON), for unsupervised background modeling, which deals with stationary foreground objects while providing fast background updating. Unlike general-purpose methods, our method was specially conceived for traffic analysis, being stable in several challenging circumstances in urban scenarios. To assess the performance of the method, a thorough analysis was carried out, comparing the proposed method with many others and demonstrating promising results in our favor.
Journal VIEIRA, J. P.; CARMO, D.; JOVITA, Y.; OLIVEIRA, L. A proposal of a non-intrusive, global movement analysis of hemiparesis treatment. In: Journal of Communication and Information Systems (Online), v. 30, n. 1, 2015. 11 p. Abstract: Hemiparesis is the most disabling condition after a stroke. Hemiparetic individuals suffer from a loss of muscle strength on one side of the body, resulting in a decreased capacity for performing movements. To assess the quality of Physiotherapy treatment, rating scales are commonly used but with the shortcoming of being subjective. With the aim of developing a system that objectively measures how a hemiparetic individual is responding to a Physiotherapy treatment, this paper proposes a method to analyze human functional movement by means of an apparatus comprised of multiple low-cost RGB-D cameras. After extrinsically calibrating the cameras, the system should be able to build a composite skeleton of the target patient, to globally analyze the patient’s movement according to a reachable workspace and specific energy. Both of the latter are proposed to be carried out by tracking the hand movements of the patient and the movement volume produced. Here we present the concept of the proposed system, as well as the idea of its parts. Index Terms: Movement volume; Hemiparesis; RGB-D cameras; kinect; specific energy; reachable workspace.
Conference NOBRE, T.; OLIVEIRA, L. Finger phalanx detection and tracking by contour analysis on RGB-D images. In: Workshop of Works in Progress (SIBGRAPI'2015), Salvador, 2015. 4 p. Abstract: In this paper we propose a method for identification of the finger phalanges based on the analysis of hand contour in RGB-D sensors. The proposed method is able to partially identify and track the kinematic structure of the fingers. The tracking was performed using the ORB algorithm to match points between a template with some hand images (in different poses) and the image captured. The principal component analysis was performed to compute the hand orientation relative to the image plane. The system will be used as a starting point for a full tracking of the fingers articulated movement.
2014
Conference LEMES, R. P.; PAMPLONA SEGUNDO, M.; BELLON, O. R. P.; SILVA, L. Dynamic Pore Filtering for Keypoint Detection Applied to Newborn Authentication. In: 22nd International Conference on Pattern Recognition (ICPR), Stockholm, 2014. 6 p. Abstract: We present a novel method for newborn authentication that matches keypoints in different interdigital regions from palmprints or footprints. Then, the method hierarchically combines the scores for authentication. We also present a novel pore detector for keypoint extraction, named Dynamic Pore Filtering (DPF), that does not rely on expensive processing techniques and adapts itself to different sizes and shapes of pores. We evaluated our pore detector using four different datasets. The obtained results of the DPF when using newborn dermatoglyphic patterns (2400ppi) are comparable to the state-of-the-art results for adult fingerprint images with 1200ppi. For authentication, we used four datasets acquired by two different sensors, achieving true acceptance rates of 91.53% and 93.72% for palmprints and footprints, respectively, with a false acceptance rate of 0%. We also compared our results to our previous approach on newborn identification, and we considerably outperformed its results, increasing the true acceptance rate from 71% to 98%.
Conference VIEIRA, J. P.; CARMO, D.; FERREIRA, R.; MIRANDA, J. G.; OLIVEIRA, L. Analysis of Human Activity By Specific Energy of Movement Volume in Hemiparetic Individuals. In: XVII Conference on Graphics, Patterns and Images (SIBGRAPI), Workshop on Vision-based Human Activity Recognition, Rio de Janeiro, 2014. 7 p. Abstract: Hemiparesis is the most disabling condition after a stroke. Hemiparetic individuals suffer from a loss of muscle strength on one side of the body, resulting in a decreased capacity for performing movements. To assess the quality of Physiotherapy treatment, rating scales are commonly used but with the drawback of being subjective. With the aim of developing a system that objectively measures how a hemiparetic individual is responding to a Physiotherapy treatment, this paper proposes a method to analyze human functional movement by means of an apparatus comprised of multiple low-cost RGB-D cameras. The idea is to first reconstruct the human body from multiple points of view, stitching them all together, and, by isolating the movement of interest, track a movement volume and its specific energy in order to compare the same activity “before” and “after”. With that, we intend to avoid common problems related to errors in the calculation of joints and angles. Here we present the concept of our system, as well as the idea of its parts.
Journal OLIVEIRA, L.; COSTA, V.; NEVES, G.; OLIVEIRA, T.; JORGE, E.; LIZARRAGA, M. A mobile, lightweight, poll-based food identification system. In: Pattern Recognition, v. 47, i. 5, p. 1941-1952, 2014. Abstract: Even though there are many reasons that can lead to people being overweight, experts agree that ingesting more calories than needed is one of them. But besides the appearance issue, being overweight is actually a medical concern because it can seriously affect a person’s health. Losing weight then becomes an important goal, and one way to achieve it is to burn more calories than ingested. The present paper addresses the problem of food identification based on image recognition as a tool for dietary assessment. To the best of our knowledge, this is the first system totally embedded into a camera-equipped mobile device, capable of identifying and classifying meals – that is, pictures which have multiple types of food placed on a plate. Considering the variability of the environmental conditions in which the camera will be used, the identification process must be robust. It must also be fast, sustaining very low wait times for the user. In this sense, we propose a novel approach, which integrates segmentation and learning in a multi-ranking framework. The segmentation is based on a modified region-growing method which runs over multiple feature spaces. These multiple segments feed support vector machines, which rank the most probable segment corresponding to a type of food. Experimental results demonstrate the effectiveness of the proposed method on a cellphone.
2013
Conference OLIVEIRA, L.; NUNES, U. Pedestrian detection based on LIDAR-driven sliding window and relational parts-based detection. In: IEEE Intelligent Vehicles Symposium, Gold Coast City, 2013. 6 p. Abstract: The most standard image object detectors are usually comprised of one or multiple feature extractors or classifiers within a sliding window framework. Nevertheless, this type of approach has demonstrated very limited performance on datasets of cluttered scenes and real-life situations. To tackle these issues, LIDAR space is exploited here in order to detect 2D objects in 3D space, avoiding all the inherent problems of regular sliding window techniques. Additionally, we propose a relational parts-based pedestrian detection in a probabilistic non-iid framework. With the proposed framework, we have achieved state-of-the-art performance on a pedestrian dataset gathered in a challenging urban scenario. The proposed system demonstrated superior performance in comparison with pure sliding-window-based image detectors.
Conference ANDREWS, S.; OLIVEIRA, L.; SCHNITMAN, L.; SOUZA, F. (Best Paper) Highway Traffic Congestion Classification Using Holistic Properties. In: 15th International Conference on Signal Processing (ICSP), Pattern Recognition and Applications, Amsterdam, 2013. 8 p. Abstract: This work proposes a holistic method for highway traffic video classification based on vehicle crowd properties. The method classifies the traffic congestion into three classes: light, medium and heavy. This is done using average crowd density and crowd speed. The crowd density is estimated by background subtraction, and the crowd speed is estimated by the pyramidal Kanade-Lucas-Tomasi (KLT) tracker algorithm. Classifying these features with neural networks yields 94.50% accuracy in experiments on 254 highway traffic videos from the UCSD data set.
Conference FRANCO, A.; LIMA, R.; OLIVEIRA, L. Person Classification in Images: An Unbiased Analysis from Multiple Poses and Situations. In: Simposio Brasileiro de Automacao Inteligente (SBAI), Fortaleza, 2013. 6 p. Abstract: Person classification is one of the most important study topics in the field of image pattern recognition. Over the past decades, novel methods have evolved, and new object features and classifiers have been created. Applications such as person detection and tracking, in intelligent transportation systems or video surveillance, benefit from person classification. Nevertheless, for such systems to be employed, their performance must be assessed to ensure that they will be effective in practice. From plots of classification performance to real-life applications, there seems to be a gap not yet bridged, since a near-perfect performance curve is not a guarantee of a flawless detection system. In this paper, we present a thorough study toward comprehending why person classifiers are so perfect in plots but not yet completely successful in practice. For that, several features (histogram of oriented gradients (HOG), pyramid HOG, local binary pattern, local phase quantization and Haar-like) and two of the most applied classifiers (support vector machine and adaptive boosting) are analyzed over the 2012 person classification Pascal VOC dataset with 27647 cropped images, grouped into 8 person poses and situations. By relying on receiver operating characteristic and precision-recall tools, it was observed that person classification, in several poses and situations, exhibited two different dominant performances, or even different variances between those two performance tools. One main conclusion drawn from the present study was that there is an inherent bias in the analysis when assessing the performance of a novel proposed method. Important guesses are given toward explaining why most classification performance analyses are somewhat biased.
Journal GRIMALDO, J.; SCHNITMAN, L.; OLIVEIRA, L. Constraining image object search by multi-scale spectral residue analysis. In: Pattern Recognition Letters, v. 39, p. 31-18, 2013. Abstract: Using an object detector over a whole image can require significant processing time. This is because most images, in common scenarios, are composed of non-trivial amounts of background information, such as sky, ground and water. To alleviate this computational load, image search space reduction methods can make the detection procedure focus on more distinctive image regions. In this sense, we propose here the use of saliency information to organize regions based on their probability of containing objects. The proposed method is grounded on a multi-scale spectral residue (MSR) analysis for saliency detection. For better search space reduction, our method enables fine control of search scale, is more robust to variations in saliency intensity along an object's length, and offers a straightforward way to control the balance between search space reduction and false negatives, both being a consequence of region selection. MSR was capable of making object detection three to five times faster compared to the same detector without MSR. A thorough analysis was carried out to demonstrate the effectiveness of the proposed method using a custom LabelMe dataset of person images, and also a Pascal VOC 2007 dataset, containing several distinct object classes.
Conference DUARTE, C.; SOUZA, T.; ALVES, R.; SCHWARTZ, W. R.; OLIVEIRA, L. Re-identifying People based on Indexing Structure and Manifold Appearance Modeling. In: XVI Conference on Graphics, Patterns and Images (SIBGRAPI), Arequipa, 2013. 8 p. Abstract: The role of person re-identification has increased in recent years due to the large camera networks employed in surveillance systems. The goal in this case is to identify individuals that have been previously identified in a different camera. Even though several approaches have been proposed, there are still challenges to be addressed, such as illumination changes, pose variation, low acquisition quality, appearance modeling and the management of the large number of subjects being monitored by the surveillance system. The present work tackles the last problem by developing an indexing structure based on inverted lists and a predominance filter descriptor with the aim of ranking the candidates most likely to be the target person. With this initial ranking, a stronger classification is done by means of a mean Riemann covariance method, which is based on an appearance strategy. Experimental results show that the proposed indexing structure returns an accurate shortlist containing the most likely candidates, and that the manifold appearance model is able to place the correct candidate among the initial ranks in the identification process. The proposed method is comparable to other state-of-the-art approaches.
Conference SANTOS, M.; LINDER, M.; SCHNITMAN, L.; NUNES, U.; OLIVEIRA, L. Learning to segment roads for traffic analysis in urban images. In: IEEE Intelligent Vehicles Symposium, Gold Coast City, 2013. 6 p. Abstract: Road segmentation plays an important role in many computer vision applications, either for in-vehicle perception or traffic surveillance. In camera-equipped vehicles, road detection methods are being developed for advanced driver assistance, lane departure, and aerial incident detection, just to cite a few. In traffic surveillance, segmenting road information brings special benefits: to automatically wrap regions of traffic analysis (consequently, speeding up flow analysis in videos), to help with the detection of driving violations (to improve contextual information in videos of traffic), and so forth. Methods and techniques can be used interchangeably for both types of application. In particular, we are interested in segmenting road regions from the remainder of an image, aiming to support traffic flow analysis tasks. In our proposed method, road segmentation relies on a superpixel detection based on a novel edge density estimation method; in each superpixel, priors are extracted from features of gray-amount, texture homogeneity, traffic motion and horizon line. A feature vector with all those priors feeds a support vector machine classifier, which ultimately decides, superpixel by superpixel, whether a region is road or not. A dataset of challenging scenes was gathered from traffic video surveillance cameras, in our city, to demonstrate the effectiveness of the method.
2012
Conference SILVA, G.; SCHNITMAN, L.; OLIVEIRA, L. Multi-Scale Spectral Residual Analysis to Speed up Image Object Detection. In: XV Conference on Graphics, Patterns and Images (SIBGRAPI), Ouro Preto, 2012. 8 p. Abstract: Accuracy in image object detection has usually been achieved at the expense of a heavy computational load. Therefore, a trade-off between detection performance and fast execution commonly represents the ultimate goal of an object detector in real-life applications. In the present work, we propose a novel method toward that goal. The proposed method is grounded on a multi-scale spectral residual (MSR) analysis for saliency detection. Compared to a regular sliding window search over the images, in our experiments, MSR was able to reduce by 75% (on average) the number of windows to be evaluated by an object detector. The proposed method was thoroughly evaluated over a subset of the LabelMe dataset (person images), improving detection performance in most cases.
Conference SILVA, C.; SCHNITMAN, L.; OLIVEIRA, L. Detecção de Landmarks em Imagens Faciais Baseada em Informações Locais. In: XIX Congresso Brasileiro de Automática (CBA), Campina Grande, 2012. Abstract: This paper proposes a method for the detection of 19 facial points of interest (landmarks). Most methods available in the art for detecting facial points fall into two main categories: global and local. Global methods are usually able to detect various landmarks simultaneously and robustly, while local methods can often detect landmarks more quickly. The method presented is based on local information and is composed of several processing stages for the detection of landmarks that describe the eyes, eyebrows and mouth. The experimental results demonstrate that the proposed method achieves results comparable to those of the ASM technique.
Conference SOUZA, T.; SCHNITMAN, L.; OLIVEIRA, L. Eigen analysis and gray alignment for shadow detection applied to urban scene image. In: IEEE International Conference on Intelligent Robots and Systems (IROS) Workshop on Planning, Perception and Navigation for Intelligent Vehicles, Vilamoura, 2012. Abstract: Urban scene analysis is very useful for many intelligent transportation systems (ITS), such as advanced driver assistance, lane departure control and traffic flow analysis. All these systems are prone to many kinds of noise, which ultimately harm system performance. Considering shadow as a noise problem, it may represent a critical line between the success or failure of an ITS framework. Therefore, shadow detection usually provides benefits for further stages of machine vision applications in ITS, although its practical use usually depends on the computational load of the detection system. To cope with those issues, a novel shadow detection method, applied to urban scenes, is proposed in this paper. This method is based on a measure of the energy defined by the summation of the eigenvalues of image patches. The final decision on whether an image region contains a shadow is made according to a new metric for unsupervised classification, called here gray alignment. The characteristics of the proposed method include no supervision, very low computational cost and mathematical background unification, which make the method very effective. Our proposed approach was evaluated on two public datasets.