

Journal SANTOS, M.; OLIVEIRA, L. ISEC: Iterative over-Segmentation via Edge Clustering. In: Elsevier Image and Vision Computing, 2018.

Abstract: Several image pattern recognition tasks rely on superpixel generation as a fundamental step. Image analysis based on superpixels facilitates domain-specific applications and speeds up the overall processing time of the task. Recent superpixel methods have been designed for boundary adherence, usually regulating the size and shape of each superpixel to mitigate undersegmentation failures. Superpixel regularity and compactness sometimes impose an excessive number of segments in the image, which ultimately decreases the efficiency of the final segmentation, especially in video segmentation. We propose here a novel method to generate superpixels, called iterative over-segmentation via edge clustering (ISEC), which addresses the over-segmentation problem from a different perspective than recent state-of-the-art approaches. ISEC iteratively clusters edges extracted from the image objects, providing superpixels that adapt in size, shape and quantity, while preserving suitable adherence to the real object boundaries. All this is achieved at a very low computational cost. Experiments show that ISEC stands out from existing methods, striking a favorable balance between segmentation stability and accurate representation of motion discontinuities, features especially suitable for video segmentation.

Conference JADER, G.; FONTINELE, J.; RUIZ, M.; ABDALLA, K.; PITHON, M.; OLIVEIRA, L. Deep instance segmentation of teeth in panoramic X-ray images. In: Conference on Graphics, Patterns and Images (SIBGRAPI'2018), Foz do Iguaçu, 2018.

Abstract: In dentistry, radiological examinations help specialists by showing the structure of teeth and bones, with the goal of screening embedded teeth, bone abnormalities, cysts, tumors, infections, fractures and problems in the temporomandibular regions, just to cite a few. Relying solely on the specialist’s opinion can sometimes lead to differences in diagnoses, which can ultimately hinder treatment. Although tools for fully automatic diagnosis are not yet expected, image pattern recognition has evolved towards decision support, mainly starting with the detection of teeth and their components in X-ray images. Tooth detection has been an object of research for at least the last two decades, mainly relying on threshold- and region-based methods. Following a different direction, this paper proposes to explore a deep learning method for instance segmentation of teeth. To the best of our knowledge, this is the first system that detects and segments each tooth in panoramic X-ray images. It is noteworthy that this image type is the most challenging one for isolating teeth, since it also shows other parts of the patient’s body (e.g., chin, spine and jaws). We propose a segmentation system based on a mask region-based convolutional neural network to accomplish instance segmentation. Performance was thoroughly assessed on a challenging data set of 1,500 images, with high variation and containing 10 categories of different types of buccal images. By training the proposed system with only 193 images of mouths containing 32 teeth on average, using transfer learning strategies, we achieved 98% accuracy, 88% F1-score, 94% precision, 84% recall and 99% specificity over 1,224 unseen images, results far superior to those of 10 unsupervised methods.

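The pixel-wise metrics reported above (accuracy, F1-score, precision, recall, specificity) can be computed from binary segmentation masks as sketched below; this is a generic illustration with hypothetical masks, not the authors' evaluation code.

```python
import numpy as np

def pixelwise_metrics(pred, truth):
    """Accuracy, precision, recall, F1 and specificity for binary
    segmentation masks (numpy arrays of the same shape)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)        # true positives
    tn = np.sum(~pred & ~truth)      # true negatives
    fp = np.sum(pred & ~truth)       # false positives
    fn = np.sum(~pred & truth)       # false negatives
    accuracy = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return dict(accuracy=accuracy, precision=precision,
                recall=recall, f1=f1, specificity=specificity)
```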

Journal SOUZA, L.; OLIVEIRA, L.; PAMPLONA, M.; PAPA, J. How far did we get in face spoofing detection? In: Elsevier Engineering Applications of Artificial Intelligence, 2018.

Abstract: The growing use of access control systems based on face recognition sheds light on the need for even more accurate systems to detect face spoofing attacks. In this paper, an extensive analysis of face spoofing detection works published in the last decade is presented. The analyzed works are categorized by their fundamental parts, i.e., descriptors and classifiers. This structured survey also brings a comparative performance analysis of the works, considering the most important public data sets in the field. The methodology followed in this work is particularly relevant for observing the temporal evolution of the field and trends in the existing approaches, for discussing still-open issues, and for proposing new perspectives for the future of face spoofing detection.

Journal SILVA, G.; OLIVEIRA, L.; PITHON, M. Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives. In: Elsevier Expert Systems with Applications, 2018.

Abstract: This review presents an in-depth study of the literature on segmentation methods applied to dental imaging. Several works on dental image segmentation were studied and categorized according to the type of method (region-based, threshold-based, cluster-based, boundary-based or watershed-based), the type of X-ray images analyzed (intra-oral or extra-oral), and the characteristics of the data set used to evaluate the methods in each state-of-the-art work. We found that the literature has primarily focused on threshold-based segmentation methods (54%). 80% of the reviewed articles used intra-oral X-ray images in their experiments, demonstrating a preference for performing segmentation on images of already isolated parts of the teeth, rather than on extra-oral X-rays, which also show the tooth structure of the mouth and the bones of the face. To fill a scientific gap in the field, a novel data set based on extra-oral X-ray images, presenting high variability and a large number of images, is introduced here. A statistical comparison of the results of 10 pixel-wise image segmentation methods over our proposed data set of 1,500 images is also carried out, providing a comprehensive source of performance assessment. Limitations of the benchmarked methods, as well as future perspectives on exploiting learning-based segmentation methods to improve performance, are also discussed. Finally, we present a preliminary application of the Mask R-CNN (mask region-based convolutional neural network) to demonstrate the power of a deep learning method in segmenting images from our data set.
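Since threshold-based methods dominate the reviewed literature (54%), a minimal NumPy sketch of Otsu's method, the classic threshold-based segmentation baseline, illustrates the category; it is a generic implementation, not taken from any of the reviewed works.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the threshold that maximizes between-class variance
    (Otsu's method) over a grayscale array."""
    hist, edges = np.histogram(image, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                 # probability of class 0
    w1 = 1.0 - w0                     # probability of class 1
    mu = np.cumsum(p * centers)       # cumulative mean
    mu_t = mu[-1]                     # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        # between-class variance: (mu_t*w0 - mu)^2 / (w0*w1)
        var_between = (mu_t * w0 - mu) ** 2 / (w0 * w1)
    var_between[~np.isfinite(var_between)] = 0.0
    return centers[np.argmax(var_between)]
```

Applied to a bimodal image histogram, the returned threshold separates the two intensity populations, e.g. tooth versus background pixels.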


Conference DAHIA, G. ; SANTOS, M. M. B. ; PAMPLONA SEGUNDO, M. A study of CNN outside of training conditions. In: IEEE International Conference on Image Processing (ICIP2017), Beijing, 2017.

Abstract: Convolutional neural networks (CNNs) are the main development in face recognition in recent years. However, their description capacities have been somewhat understudied. In this paper, we show that training a CNN only with color images is enough to properly describe depth and near-infrared face images, by assessing the performance of three publicly available CNN models on these other modalities. Furthermore, we find that, despite displaying results comparable to human performance on LFW, not all CNNs behave like humans when recognizing faces in other scenarios.

Keywords: CNN; training.

Journal CERQUEIRA, R.; TROCOLI, T.; NEVES, G.; JOYEUX, S.; ALBIEZ, J.; OLIVEIRA, L. A novel GPU-based sonar simulator for real-time applications. In: Elsevier Computers and Graphics, 2017.

Abstract: Sonar simulation, mainly when applied in the underwater environment, requires great computational effort due to the complexity of acoustic physics. Simulating sonar operation allows evaluating algorithms and control systems without going to the real underwater environment, which reduces the costs and risks of in-field experiments. This paper tackles the problem of real-time underwater imaging sonar simulation by using the OpenGL shading language chain on the GPU. Our proposed system is able to simulate the two main types of acoustic devices: mechanical scanning imaging sonars and forward-looking sonars. The underwater scenario simulation is performed based on three frameworks: (i) OpenSceneGraph reproduces the ocean visual effects, (ii) Gazebo deals with physical forces, and (iii) the Robot Construction Kit controls the sonar in underwater environments. Our system exploits the rasterization pipeline to simulate the sonar devices by means of three parameters: the pulse distance, the echo intensity and the sonar field-of-view, all calculated over observable object shapes in the 3D rendered scene. Sonar-intrinsic operational parameters, speckle noise and object material properties are also considered as part of the acoustic image. Our evaluation demonstrated that the proposed system is able to operate close to, or faster than, the real-world devices. Moreover, our method generates visually realistic sonar images when compared with real-world sonar images of the same scenes.

Keywords: Sonar simulation; Robot Construction Kit (ROCK); AUV.

Journal ARAÚJO, POMPÍLIO; MIRANDA, RODOLFO; CARMO, DIEDRE; ALVES, RAUL; OLIVEIRA, L. Air-SSLAM: A visual stereo indoor SLAM for aerial quadrotors. In: IEEE Geoscience and Remote Sensing Letters, 2017.

Abstract: In this letter, we introduce a novel method for visual simultaneous localization and mapping (SLAM), so-called Air-SSLAM, which exploits a stereo camera configuration. In contrast to monocular SLAM, scale definition and 3D information are issues that can be more easily dealt with by stereo cameras. Air-SSLAM starts by computing keypoints and the corresponding descriptors over the pair of images, using good features-to-track and rotated BRIEF (binary robust independent elementary features), respectively. A map is then created by matching each pair of right and left frames. Long-term map maintenance is continuously performed by analyzing the quality of each match, as well as by inserting new keypoints into uncharted areas of the environment. Three main contributions can be highlighted in our method: (i) a novel method to match keypoints efficiently, (ii) three quality indicators aimed at speeding up the mapping process, and (iii) map maintenance with uniform distribution performed by image zones. Using a drone equipped with a stereo camera, flying indoors, the translational average error with respect to a marked ground truth was computed, demonstrating promising results.

Keywords: Stereo SLAM; drone; indoor.
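As an illustration of the descriptor-matching step, the sketch below brute-force matches binary descriptors (such as rBRIEF) by Hamming distance with a mutual-consistency check; this is a generic baseline, not the efficient matching method contributed by the paper.

```python
import numpy as np

def hamming_match(desc_left, desc_right, max_dist=64):
    """Brute-force match binary descriptors (rows of uint8 bytes)
    by Hamming distance, keeping only mutual best matches."""
    # pairwise Hamming distances via XOR + popcount
    xor = desc_left[:, None, :] ^ desc_right[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    best_lr = dist.argmin(axis=1)     # best right match per left
    best_rl = dist.argmin(axis=0)     # best left match per right
    matches = []
    for i, j in enumerate(best_lr):
        if best_rl[j] == i and dist[i, j] <= max_dist:
            matches.append((i, int(j), int(dist[i, j])))
    return matches
```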

Conference CARMO, DIEDRE; ALVES, RAUL; OLIVEIRA, L. Face identification based on synergism of classifiers in rectified stereo images. In: Workshop of Undergraduate Works (SIBGRAPI'2017), Niteroi, 2017.

Abstract: This paper proposes a method to identify faces from a stereo camera. Our approach tries to avoid common problems of using only one camera that arise when detecting from a relatively unstable view in real-world applications. The proposed approach exploits local binary patterns (LBP) to describe the faces in each image of the stereo camera, after detecting the face using the Viola-Jones method. The LBP histogram then feeds multilayer perceptron (MLP) and support vector machine (SVM) classifiers to identify the faces detected in each stereo image, considering a database of target faces. Computational cost problems due to the use of dual cameras are alleviated by the use of co-planar rectified images, achieved through calibration of the stereo camera. Performance is assessed on the well-established Yale face data set, using either one or both camera images.

Keywords: stereo; face identification; image rectification.
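A minimal NumPy sketch of the LBP description step, assuming the basic 8-neighbour operator (the paper may use a different LBP variant):

```python
import numpy as np

def lbp_histogram(gray):
    """Basic 8-neighbour local binary pattern histogram of a 2D
    grayscale array (256 bins, normalized)."""
    g = gray.astype(int)
    c = g[1:-1, 1:-1]                 # center pixels
    # top-left offsets of the 8 neighbours, clockwise
    shifts = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        n = g[dy:dy + c.shape[0], dx:dx + c.shape[1]]
        code |= (n >= c).astype(int) << bit   # set bit if neighbour >= center
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

In the paper's setting, one such histogram per detected face region would feed the MLP and SVM classifiers.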


Conference CERQUEIRA, R.; TROCOLI, T.; NEVES, G.; OLIVEIRA, L.; JOYEUX, S.; ALBIEZ, J. Custom Shader and 3D Rendering for computationally efficient Sonar Simulation. In: XIX Conference on Graphics, Patterns and Images (SIBGRAPI), Sao Jose dos Campos, 2016. 4 p.

Abstract: This paper introduces a novel method for simulating underwater sonar sensors by vertex and fragment processing. The virtual scenario used is composed of the integration between the Gazebo simulator and the Robot Construction Kit (ROCK) framework. A 3-channel matrix with depth and intensity buffers and angular distortion values is extracted from OpenSceneGraph 3D scene frames by shader rendering, and subsequently fused and processed to generate the synthetic sonar data. To export and display simulation resources, this approach was written in C++ as ROCK packages. The method is evaluated on two use cases: the virtual acoustic images from a mechanical scanning sonar and forward-looking sonar simulations.

Keywords: Synthetic Sensor Data; Sonar Imaging; Robot Construction Kit (ROCK); Underwater Robotics.

Journal FRANCO, A.; OLIVEIRA, L. Convolutional covariance features: Conception, integration and performance in person re-identification. In: Pattern Recognition, 2016.

Abstract: This paper introduces a novel type of feature based on covariance descriptors: the convolutional covariance features (CCF). Differently from the traditional, handcrafted way to obtain covariance descriptors, CCF are computed from adaptive and trainable features, which come from a coarse-to-fine transfer learning (CFL) strategy. CFL provides generic-to-specific knowledge and noise-invariant information for person re-identification. After training the deep features, convolutional and flat features are extracted from, respectively, intermediate and top layers of a hybrid deep network. Intermediate-layer features are then wrapped in covariance matrices, composing the so-called CCF, which are integrated with the top-layer features, called here flat features. The integration of CCF and flat features demonstrably improves the proposed person re-identification method in comparison with the use of either component feature alone. Our person re-identification method achieved the best top-1 performance when compared with 18 other state-of-the-art methods over the VIPeR, i-LIDS, CUHK01 and CUHK03 data sets. The compared methods are based on deep learning, covariance descriptors, or handcrafted features and similarity functions.

Keywords: Person re-identification; Covariance features; Deep learning; Transfer learning.
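The covariance-wrapping step can be illustrated as follows: each spatial location of a stack of feature maps contributes one sample per feature channel, and the covariance of the channels forms the descriptor. This is a generic sketch of a covariance descriptor, not the paper's full CCF pipeline:

```python
import numpy as np

def covariance_descriptor(feature_maps):
    """Wrap a stack of feature maps (d, h, w) into a d x d covariance
    matrix, the building block of covariance-based descriptors."""
    d = feature_maps.shape[0]
    f = feature_maps.reshape(d, -1)   # one row per feature channel
    return np.cov(f)                  # d x d, symmetric, PSD
```

Such matrices live on a Riemannian manifold, which is why covariance-based re-identification methods compare them with manifold-aware metrics rather than Euclidean distance.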

Conference FRANCO, A.; OLIVEIRA, L. A coarse-to-fine deep learning for person re-identification. In: IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, New York, 2017.

Abstract: This paper proposes a novel deep learning architecture for person re-identification. The proposed network is based on a coarse-to-fine learning (CFL) approach, attempting to acquire generic-to-specific knowledge throughout a transfer learning process. The core of the method relies on a hybrid network composed of a convolutional neural network and a deep belief network denoising autoencoder. This hybrid network is in charge of extracting features invariant to illumination variation, certain image deformations, horizontal mirroring and image blurring, and is embedded in the CFL architecture. The proposed network achieved the best results when compared with other state-of-the-art methods on the i-LIDS, CUHK01 and CUHK03 data sets, and also a competitive performance on the VIPeR data set.

Keywords: Deep network; person identification; CNN.

Conference TROCOLI, T.; OLIVEIRA, L. Using the scene to calibrate the camera. In: XIX Conference on Graphics, Patterns and Images (SIBGRAPI), Sao Jose dos Campos, 2016. 7 p.

Abstract: Surveillance cameras are used in public and private security systems. Typical systems may contain a large number of different cameras installed in different locations, and manual calibration of each single camera in the network becomes an exhausting task. Although there are methods that semi-automatically calibrate a static camera, to the best of our knowledge there is no fully automatic calibration procedure so far. To fill this gap, we propose here a novel framework for completely automatic calibration of static surveillance cameras, based on information from the scene (environment and walkers). Characteristics of the method include robustness to walkers’ pose and to camera location (pitch, roll, yaw and height), and rapid camera parameter convergence. For a thorough evaluation of the proposed method, the walkers’ foot-head projection, the length of lines projected on the ground plane and the walkers’ real heights were analyzed over public and private data sets, demonstrating the potential of the proposed method.

Keywords: camera calibration; surveillance camera; auto calibration.


Conference CARMO, D.; JOVITA, R.; FERRARI, R.; OLIVEIRA, L. A study on multi-view calibration methods for RGB-D cameras. In: Workshop of Undergraduate Works (SIBGRAPI'2015), Salvador, 2015. 6 p.

Abstract: RGB-D cameras have become part of our daily life in applications such as human-computer interfaces and game interaction, just to cite a few. Because of their easy programming interface and response precision, such cameras have also been increasingly used for 3D reconstruction and movement analysis. In view of that, calibration of multiple cameras is an essential task. On that account, the goal of this paper is to present a preliminary study of methods that tackle the problem of multi-view geometry computation using RGB-D cameras. A brief overview of camera geometry is presented, some calibration methods are discussed and one of them is evaluated in practice; finally, some important points are addressed regarding practical issues involving the problem.

Keywords: RGB-D cameras; multi-view calibration; cross-talk interference.
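Extrinsic calibration between two RGB-D views is often reduced to estimating the rigid transform aligning corresponding 3D points; below is a standard Kabsch/SVD sketch of that step (a generic method, not necessarily the one evaluated in the paper):

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) with dst ~ R @ src + t,
    via the Kabsch/SVD algorithm on 3D correspondences (n, 3)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)     # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t
```

With the transform in hand, point clouds from one camera can be expressed in the other camera's coordinate frame.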

Conference NOBRE, T.; OLIVEIRA, L. Finger phalanx detection and tracking by contour analysis on RGB-D images. In: Workshop of Works in Progress (SIBGRAPI'2015), Salvador, 2015. 4 p.

Abstract: In this paper we propose a method for identifying the finger phalanges based on analysis of the hand contour in RGB-D images. The proposed method is able to partially identify and track the kinematic structure of the fingers. Tracking is performed using the ORB algorithm to match points between a template with several hand images (in different poses) and the captured image. Principal component analysis is used to compute the hand orientation relative to the image plane. The system will be used as a starting point for full tracking of the articulated movement of the fingers.

Keywords: Phalanges; hand kinematics; RGB-D cameras.

Conference CANÁRIO, J. P.; OLIVEIRA, L. Recognition of Facial Expressions Based on Deep Conspicuous Net. In: Iberoamerican Congress on Pattern Recognition, Salvador, 2015. 8 p.

Abstract: Facial expression has an important role in human interaction and non-verbal communication. Hence, applications that automatically detect facial expressions are becoming pervasive in various fields, such as education, entertainment, psychology, human-computer interaction and behavior monitoring, just to cite a few. In this paper, we present a new approach for facial expression recognition using a so-called deep conspicuous neural network. The proposed method builds a conspicuous map of face regions, training it via a deep network. Experimental results achieved an average accuracy of 90% over the extended Cohn-Kanade data set for seven basic expressions, demonstrating the best performance against four state-of-the-art methods.

Keywords: Conspicuity; Facial expression; Deep learning.

Conference PAMPLONA SEGUNDO, M.; LEMES, P. R. Pore-based ridge reconstruction for fingerprint recognition. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, 2015. 6 p.

Abstract: The use of sweat pores in fingerprint recognition is becoming increasingly popular, mostly because of the wide availability of pores, which provides complementary information for matching distorted or incomplete images. In this work we present a fully automatic pore-based fingerprint recognition framework that combines both pores and ridges to measure the similarity of two images. To obtain the ridge structure, we propose a novel pore-based ridge reconstruction approach by considering a connect-the-dots strategy. To this end, Kruskal's minimum spanning tree algorithm is employed to connect consecutive pores and form a graph representing the ridge skeleton. We evaluate our framework on the PolyU HRF database, and the obtained results are favorably compared to previous results in the literature.
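The connect-the-dots strategy can be illustrated with a plain Kruskal's minimum spanning tree over detected pore coordinates; this is a generic sketch of the algorithm named in the abstract, not the authors' implementation:

```python
import math
from itertools import combinations

def mst_edges(points):
    """Kruskal's algorithm over the complete graph of 2D pore
    coordinates; returns the edges of the minimum spanning tree."""
    parent = list(range(len(points)))

    def find(i):                      # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # accept edge only if it adds no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

Connecting consecutive pores this way yields a graph approximating the ridge skeleton.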

Conference SANTOS, M.; OLIVEIRA, L. Context-supported Road Information for Background Modeling. In: XVIII Conference on Graphics, Patterns and Images (SIBGRAPI), Salvador, 2015. 8 p.

Abstract: Background subtraction methods commonly suffer from incompleteness and instability in many situations. A model tuned for fast updating when objects move quickly tends to be unreliable when objects stop in the scene, and it is easy to find examples where the contrary is also true. In this paper we propose a novel method, designated Context-supported ROad iNformation (CRON), for unsupervised background modeling, which deals with stationary foreground objects while presenting fast background updating. Differently from general-purpose methods, our method was specially conceived for traffic analysis, remaining stable in several challenging circumstances in urban scenarios. To assess the performance of the method, a thorough analysis was accomplished, comparing the proposed method with many others and demonstrating promising results in our favor.

Keywords: Background modeling; traffic analysis; surveillance videos.
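For context, a common baseline that exhibits exactly the trade-off described above is the exponential running-average background model: a high learning rate absorbs stopped objects into the background, while a low one updates too slowly. A minimal sketch of that baseline (not CRON itself):

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Exponential running-average background update; alpha controls
    how quickly scene changes are absorbed into the model."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25):
    """Flag pixels differing from the model by more than `thresh`."""
    return np.abs(frame.astype(float) - bg) > thresh
```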

Journal VIEIRA, J. P.; CARMO, D.; JOVITA, Y.; OLIVEIRA, L. A proposal of a non-intrusive, global movement analysis of hemiparesis treatment. In: Journal of Communication and Information Systems (Online), v. 30, n. 1, 2015. 11 p.

Abstract: Hemiparesis is the most disabling condition after a stroke. Hemiparetic individuals suffer from a loss of muscle strength on one side of the body, resulting in a decreased capacity to perform movements. To assess the quality of physiotherapy treatment, rating scales are commonly used, but with the shortcoming of being subjective. With the aim of developing a system that objectively reports how a hemiparetic individual is responding to physiotherapy treatment, this paper proposes a method to analyze human functional movement by means of an apparatus comprised of multiple low-cost RGB-D cameras. After extrinsically calibrating the cameras, the system should be able to build a composite skeleton of the target patient, to globally analyze the patient’s movement according to a reachable workspace and specific energy. Both of the latter are proposed to be carried out by tracking the patient’s hand movements and the movement volume produced. Here we present the concept of the proposed system, as well as the ideas behind its parts.

Index Terms: Movement volume; Hemiparesis; RGB-D cameras; kinect; specific energy; reachable workspace.


Conference LEMES, R. P.; PAMPLONA SEGUNDO, M.; BELLON, O. R. P.; SILVA, L. Dynamic Pore Filtering for Keypoint Detection Applied to Newborn Authentication. In: 22nd International Conference on Pattern Recognition (ICPR), Stockholm, 2014. 6 p.

Abstract: We present a novel method for newborn authentication that matches keypoints in different interdigital regions from palmprints or footprints. Then, the method hierarchically combines the scores for authentication. We also present a novel pore detector for keypoint extraction, named Dynamic Pore Filtering (DPF), that does not rely on expensive processing techniques and adapts itself to different sizes and shapes of pores. We evaluated our pore detector using four different datasets. The obtained results of the DPF when using newborn dermatoglyphic patterns (2400ppi) are comparable to the state-of-the-art results for adult fingerprint images with 1200ppi. For authentication, we used four datasets acquired by two different sensors, achieving true acceptance rates of 91.53% and 93.72% for palmprints and footprints, respectively, with a false acceptance rate of 0%. We also compared our results to our previous approach on newborn identification, and we considerably outperformed its results, increasing the true acceptance rate from 71% to 98%.

Keywords: Newborn recognition; dermatoglyphic patterns; pore detection.

Conference VIEIRA, J. P.; CARMO, D.; FERREIRA, R.; MIRANDA, J. G.; OLIVEIRA, L. Analysis of Human Activity By Specific Energy of Movement Volume in Hemiparetic Individuals. In: XVII Conference on Graphics, Patterns and Images (SIBGRAPI), Workshop on Vision-based Human Activity Recognition, Rio de Janeiro, 2014. 7 p.

Abstract: Hemiparesis is the most disabling condition after a stroke. Hemiparetic individuals suffer from a loss of muscle strength on one side of the body, resulting in a decreased capacity to perform movements. To assess the quality of physiotherapy treatment, rating scales are commonly used, but with the drawback of being subjective. With the aim of developing a system that objectively reports how a hemiparetic individual is responding to physiotherapy treatment, this paper proposes a method to analyze human functional movement by means of an apparatus comprised of multiple low-cost RGB-D cameras. The idea is to first reconstruct the human body from multiple points of view, stitching them all together, and, by isolating the movement of interest, track a movement volume and its specific energy in order to compare the same activity “before” and “after” treatment. With that, we intend to avoid common problems related to errors in the calculation of joints and angles. Here we present the concept of our system, as well as the ideas behind its parts.

Keywords: movement volume; Hemiparesis; RGB-D cameras; specific energy.

Journal OLIVEIRA, L.; COSTA, V.; NEVES, G.; OLIVEIRA, T.; JORGE, E.; LIZARRAGA, M. A mobile, lightweight, poll-based food identification system. In: Pattern Recognition, v. 47, i. 5, p. 1941-1952, 2014.

Abstract: Even though there are many reasons that can lead to people being overweight, experts agree that ingesting more calories than needed is one of them. Beyond the matter of appearance, being overweight is a medical concern, because it can seriously affect a person's health. Losing weight then becomes an important goal, and one way to achieve it is to burn more calories than are ingested. The present paper addresses the problem of food identification based on image recognition as a tool for dietary assessment. To the best of our knowledge, this is the first system totally embedded in a camera-equipped mobile device capable of identifying and classifying meals, that is, pictures with multiple types of food placed on a plate. Considering the variability of the environmental conditions in which the camera will be used, the identification process must be robust. It must also be fast, sustaining very low wait times for the user. In this sense, we propose a novel approach which integrates segmentation and learning in a multi-ranking framework. The segmentation is based on a modified region-growing method which runs over multiple feature spaces. These multiple segments feed support vector machines, which rank the most probable segments corresponding to a type of food. Experimental results demonstrate the effectiveness of the proposed method on a cellphone.

Keywords: Food identification; Multi-hypothesis segmentation; Multi-ranking classification; Mobile device.
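A minimal sketch of region growing, the family of methods the paper's segmentation modifies (this generic version uses a single feature, the running region mean, whereas the paper runs over multiple feature spaces):

```python
from collections import deque
import numpy as np

def region_grow(image, seed, tol=10):
    """Grow a region from `seed` over 4-connected pixels whose value
    stays within `tol` of the running region mean."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    total, count = float(image[seed]), 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(image[ny, nx] - total / count) <= tol:
                    mask[ny, nx] = True
                    total += float(image[ny, nx])
                    count += 1
                    queue.append((ny, nx))
    return mask
```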


Journal GRIMALDO, J.; SCHNITMAN, L.; OLIVEIRA, L. Constraining image object search by multi-scale spectral residue analysis. In: Pattern Recognition Letters, v. 39, p. 31-38, 2013.

Abstract: Running an object detector over a whole image can require significant processing time, since in common scenarios most of an image consists of background information, such as sky, ground and water. To alleviate this computational load, image search space reduction methods can make the detection procedure focus on more distinctive image regions. In this sense, we propose the use of saliency information to organize regions based on their probability of containing objects. The proposed method is grounded on a multi-scale spectral residue (MSR) analysis for saliency detection. For better search space reduction, our method enables fine control of the search scale, is more robust to variations in saliency intensity along an object's length, and offers a straightforward way to control the balance between search space reduction and false negatives, both consequences of region selection. MSR was capable of making object detection three to five times faster compared to the same detector without MSR. A thorough analysis demonstrates the effectiveness of the proposed method on a custom LabelMe dataset of person images, and also on the Pascal VOC 2007 dataset, which contains several distinct object classes.

Keywords: Fast object detection; Saliency; Multi-scale spectral residue.
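The single-scale building block of MSR, the spectral residual saliency of Hou and Zhang, can be sketched in NumPy as follows (the paper's multi-scale analysis and region selection are not reproduced here):

```python
import numpy as np

def spectral_residual_saliency(gray):
    """Single-scale spectral residual saliency map of a 2D grayscale
    array, normalized to [0, 1]."""
    f = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    # residual = log spectrum minus its local average (3x3 box blur)
    k = np.ones((3, 3)) / 9.0
    pad = np.pad(log_amp, 1, mode="edge")
    blur = sum(pad[i:i + gray.shape[0], j:j + gray.shape[1]] * k[i, j]
               for i in range(3) for j in range(3))
    residual = log_amp - blur
    # back to the spatial domain with the original phase
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()
```

Thresholding such a map yields the candidate regions that a detector would then scan, instead of the whole image.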

Conference FRANCO, A.; LIMA, R.; OLIVEIRA, L. Person Classification in Images: An Unbiased Analysis from Multiple Poses and Situations. In: Simposio Brasileiro de Automacao Inteligente (SBAI), Fortaleza, 2013. 6 p.

Abstract: Person classification is one of the most important study topics in the field of image pattern recognition. Over the past decades, novel methods have evolved, and object features and classifiers have been created. Applications such as person detection and tracking, in intelligent transportation systems or video surveillance, benefit from person classification in real-life applications. Nevertheless, for such systems to be deployed, there is a need to assess their performance to ensure that they will be effective in practice. Between plots of classification performance and real-life applications there seems to be a gap not yet closed, since a near-perfect performance curve is no guarantee of a flawless detection system. In this paper, we present a thorough study toward comprehending why person classifiers look so perfect in plots but are not yet completely successful in practice. For that, several features (histogram of oriented gradients (HOG), pyramid HOG, local binary patterns, local phase quantization and Haar-like features) and two of the most widely applied classifiers (support vector machines and adaptive boosting) are analyzed over the 2012 person classification Pascal VOC dataset, with 27,647 cropped images grouped into 8 person poses and situations. By relying on receiver operating characteristic and precision-recall tools, we observed that person classification, across several poses and situations, exhibits two different dominant performances, and even different variances between those two performance tools. One main conclusion drawn from the present study is that there is an inherent bias in how the performance of a newly proposed method is assessed. Insights are offered toward explaining why most classification performance analyses are somewhat biased.

Keywords: computer vision and pattern recognition, ROC curve, precision-recall curve, person classification performance.

Conference DUARTE, C.; SOUZA, T.; ALVES, R.; SCHWARTZ, W. R.; OLIVEIRA, L. Re-identifying People based on Indexing Structure and Manifold Appearance Modeling. In: XVI Conference on Graphics, Patterns and Images (SIBGRAPI), Arequipa, 2013. 8 p.

Abstract: The role of person re-identification has grown in recent years due to the large camera networks employed in surveillance systems. The goal in this case is to identify individuals that have previously been identified by a different camera. Even though several approaches have been proposed, there are still challenges to be addressed, such as illumination changes, pose variation, low acquisition quality, appearance modeling and the management of the large number of subjects being monitored by the surveillance system. The present work tackles the last problem by developing an indexing structure based on inverted lists and a predominance filter descriptor, with the aim of ranking the candidates most likely to be the target person. Given this initial ranking, a stronger classification is done by means of a mean Riemannian covariance method, which is based on an appearance-modeling strategy. Experimental results show that the proposed indexing structure returns an accurate shortlist containing the most likely candidates, and that the manifold appearance model is able to place the correct candidate among the initial ranks in the identification process. The proposed method is comparable to other state-of-the-art approaches.

Keywords: Person re-identification; bag-of-features; predominance filter; inverted lists; mean Riemann covariance.
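The inverted-list shortlist idea can be sketched in a few lines: gallery identities are indexed by quantized appearance features (visual words), and a query votes for identities that share words with it. This is a simplified illustration assuming a bag-of-features representation; the class name and toy word IDs are not from the paper, and the predominance filter and Riemannian re-ranking stages are omitted.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted-list index: maps quantized appearance features
    (visual words) to the gallery identities that exhibit them."""
    def __init__(self):
        self.lists = defaultdict(set)

    def add(self, person_id, words):
        for w in words:
            self.lists[w].add(person_id)

    def shortlist(self, query_words, k=3):
        votes = defaultdict(int)
        for w in query_words:
            for pid in self.lists[w]:
                votes[pid] += 1
        # Rank gallery identities by the number of shared visual words.
        return sorted(votes, key=lambda pid: -votes[pid])[:k]

index = InvertedIndex()
index.add("p1", {1, 2, 3})
index.add("p2", {2, 4})
index.add("p3", {5})
print(index.shortlist({1, 2}))  # ['p1', 'p2']: "p1" shares two words, ranked first
```

Only identities in the shortlist would then be passed to the heavier appearance-based classifier, which is what keeps the cost manageable for large galleries.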

Conference SANTOS, M.; LINDER, M.; SCHNITMAN, L.; NUNES, U.; OLIVEIRA, L. Learning to segment roads for traffic analysis in urban images. In: IEEE Intelligent Vehicles Symposium, Gold Coast City, 2013. 6 p.

Abstract: Road segmentation plays an important role in many computer vision applications, either for in-vehicle perception or traffic surveillance. In camera-equipped vehicles, road detection methods are being developed for advanced driver assistance, lane departure, and aerial incident detection, just to cite a few. In traffic surveillance, segmenting road information brings special benefits: automatically delimiting the regions of traffic analysis (consequently, speeding up flow analysis in videos), helping with the detection of driving violations (improving contextual information in videos of traffic), and so forth. Methods and techniques can be used interchangeably for both types of application. In particular, we are interested in segmenting road regions from the rest of the image, aiming to support traffic flow analysis tasks. In our proposed method, road segmentation relies on superpixel detection based on a novel edge density estimation method; in each superpixel, priors are extracted from features of gray amount, texture homogeneity, traffic motion and horizon line. A feature vector with all those priors feeds a support vector machine classifier, which ultimately makes the superpixel-wise decision of road or not. A dataset of challenging scenes was gathered from traffic video surveillance cameras in our city to demonstrate the effectiveness of the method.
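The superpixel-wise decision can be sketched as a linear decision on the prior vector. This is a minimal illustration: the weights and bias below are made-up placeholders, whereas in the paper they would come from a trained SVM, and the prior values would be computed from the image.

```python
def road_decision(priors, w, b):
    """Linear decision on a superpixel prior vector, ordered here as
    [gray_amount, texture_homogeneity, traffic_motion, below_horizon]."""
    score = sum(wi * xi for wi, xi in zip(w, priors)) + b
    return score > 0.0

# Illustrative weights and bias (a trained SVM would supply w and b).
w = [0.8, 0.5, 1.2, 1.0]
b = -1.5
asphalt = [0.9, 0.8, 0.7, 1.0]   # gray, homogeneous, moving traffic, below horizon
facade  = [0.2, 0.1, 0.0, 0.0]   # colorful, textured, static, above horizon
print(road_decision(asphalt, w, b))  # True
print(road_decision(facade, w, b))   # False
```

Deciding per superpixel rather than per pixel is what makes the scheme fast: a frame contains a few hundred superpixels instead of hundreds of thousands of pixels.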

Conference OLIVEIRA, L.; NUNES, U. Pedestrian detection based on LIDAR-driven sliding window and relational parts-based detection. In: IEEE Intelligent Vehicles Symposium, Gold Coast City, 2013. 6 p.

Abstract: Most standard image object detectors comprise one or more feature extractors or classifiers within a sliding window framework. Nevertheless, this type of approach has demonstrated very limited performance on datasets of cluttered scenes and real-life situations. To tackle these issues, the LIDAR space is exploited here in order to detect 2D objects in 3D space, avoiding the inherent problems of regular sliding window techniques. Additionally, we propose a relational parts-based pedestrian detector in a probabilistic non-iid framework. With the proposed framework, we have achieved state-of-the-art performance on a pedestrian dataset gathered in a challenging urban scenario. The proposed system demonstrated superior performance in comparison with pure sliding-window-based image detectors.
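The core of a LIDAR-driven window scheme is projecting each 3D point cluster into the image to obtain a handful of candidate windows, instead of exhaustively sliding a window over the frame. The sketch below assumes a simple pinhole camera with made-up intrinsics (`fx`, `fy`, `cx`, `cy`) and points already in the camera frame; the clustering step and the paper's parts-based classifier are not shown.

```python
def project_roi(cluster, fx=700.0, fy=700.0, cx=320.0, cy=240.0):
    """Project a 3D LIDAR point cluster (camera frame, z forward) to a
    2D bounding box, replacing an exhaustive sliding-window search."""
    us, vs = [], []
    for x, y, z in cluster:
        if z <= 0:
            continue  # point behind the camera, cannot be projected
        us.append(fx * x / z + cx)
        vs.append(fy * y / z + cy)
    if not us:
        return None
    return (min(us), min(vs), max(us), max(vs))

# A pedestrian-sized cluster ~10 m ahead yields one candidate window.
cluster = [(0.0, -0.8, 10.0), (0.3, 0.9, 10.0), (-0.3, 0.0, 10.2)]
print(project_roi(cluster))
```

Each returned box is then evaluated once by the image classifier, so the number of classifier calls scales with the number of LIDAR clusters rather than with image resolution.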

Conference ANDREWS, S.; OLIVEIRA, L.; SCHNITMAN, L.; SOUZA, F. (Best Paper) Highway Traffic Congestion Classification Using Holistic Properties. In: 15th International Conference on Signal Processing (ICSP), Pattern Recognition and Applications, Amsterdam, 2013. 8 p.

Abstract: This work proposes a holistic method for highway traffic video classification based on vehicle crowd properties. The method classifies traffic congestion into three classes: light, medium and heavy. This is done using average crowd density and crowd speed. First, crowd density is estimated by background subtraction, while crowd speed is computed with the pyramidal Kanade-Lucas-Tomasi (KLT) tracker algorithm. Classifying these features with neural networks yields 94.50% accuracy in experiments on 254 highway traffic videos from the UCSD data set.

Keywords: Pattern Recognition, Object Recognition and Motion, Neural Network Applications.
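The two holistic features can be sketched concretely: density as the fraction of pixels that deviate from a background model, and the class decision on top of (density, speed). The thresholded rules below are an illustrative stand-in; the paper instead trains a neural network on these features, and the flat pixel lists are a toy substitute for real frames.

```python
def crowd_density(frame, background, thresh=25):
    """Fraction of pixels differing from the background model,
    a proxy for vehicle crowd density (frames as flat pixel lists)."""
    changed = sum(1 for f, b in zip(frame, background) if abs(f - b) > thresh)
    return changed / len(frame)

def congestion_class(density, speed):
    """Illustrative hand-set rules on (density, normalized speed);
    the paper learns this mapping with a neural network."""
    if density > 0.5 and speed < 0.3:
        return "heavy"   # dense and slow-moving traffic
    if density > 0.2:
        return "medium"
    return "light"

background = [100] * 8
frame = [100, 100, 180, 30, 100, 200, 100, 100]
d = crowd_density(frame, background)
print(d, congestion_class(d, speed=0.8))  # 0.375 medium
```

The holistic view is the point here: no individual vehicle is detected or tracked, only aggregate density and motion statistics of the whole scene.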


Conference SOUZA, T.; SCHNITMAN, L.; OLIVEIRA, L. Eigen analysis and gray alignment for shadow detection applied to urban scene image. In: IEEE International Conference on Intelligent Robots and Systems (IROS) Workshop on Planning, Perception and Navigation for Intelligent Vehicles, Vilamoura, 2012.

Abstract: Urban scene analysis is very useful for many intelligent transportation systems (ITS), such as advanced driver assistance, lane departure control and traffic flow analysis. All these systems are prone to noise of any kind, which ultimately harms system performance. Considering shadow as a noise problem, it may represent the critical line between success and failure of an ITS framework. Therefore, shadow detection usually benefits further stages of machine vision applications in ITS, although its practical use often depends on the computational load of the detection system. To cope with these issues, a novel shadow detection method, applied to urban scenes, is proposed in this paper. The method is based on a measure of energy defined by the summation of the eigenvalues of image patches. The final decision on whether an image region contains a shadow is made according to a new metric for unsupervised classification, called here gray alignment. The characteristics of the proposed method include no supervision, very low computational cost and a unified mathematical background, which make the method very effective. Our proposed approach was evaluated on two public datasets.
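A useful observation behind an eigenvalue-sum energy is that the sum of the eigenvalues of a covariance matrix equals its trace, i.e. the sum of the per-dimension variances, so no eigendecomposition is actually needed. The sketch below illustrates that identity on toy patches; the paper's exact energy definition and the gray alignment decision rule are not reproduced here.

```python
def patch_energy(patch):
    """Energy of an image patch as the sum of the eigenvalues of its
    column covariance matrix. That sum equals the trace (the sum of
    per-column variances), so it is computed directly, without an
    eigendecomposition."""
    rows = len(patch)
    cols = len(patch[0])
    energy = 0.0
    for j in range(cols):
        col = [patch[i][j] for i in range(rows)]
        mean = sum(col) / rows
        energy += sum((v - mean) ** 2 for v in col) / rows
    return energy

shadow_patch = [[20, 22], [21, 20]]   # dark, low-variance region
lit_patch = [[30, 200], [180, 60]]    # textured, high-variance region
print(patch_energy(shadow_patch) < patch_energy(lit_patch))  # True
```

This trace identity is one reason such an energy can be computed at very low cost over many patches per frame.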

Conference SILVA, G.; SCHNITMAN, L.; OLIVEIRA, L. Multi-Scale Spectral Residual Analysis to Speed up Image Object Detection. In: XV Conference on Graphics, Patterns and Images (SIBGRAPI), Ouro Preto, 2012. 8 p.

Abstract: Accuracy in image object detection has usually been achieved at the expense of a heavy computational load. Therefore, a trade-off between detection performance and fast execution commonly represents the ultimate goal of an object detector in real-life applications. In the present work, we propose a novel method toward that goal. The proposed method is grounded on a multi-scale spectral residual (MSR) analysis for saliency detection. Compared to a regular sliding window search over the images, in our experiments MSR was able to reduce by 75% (on average) the number of windows to be evaluated by an object detector. The proposed method was thoroughly evaluated over a subset of the LabelMe dataset (person images), improving detection performance in most cases.

Keywords: multi-scale spectral residue, saliency, person detection.
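A single-scale spectral residual map, in the style of Hou and Zhang, can be computed with a few FFTs: the residual is the log amplitude spectrum minus a smoothed copy of itself, recombined with the original phase. The sketch below is that classic single-scale step under assumed parameters (box filter size `k`, a toy 32x32 image); the paper's contribution, running the analysis over multiple scales (MSR), is not reproduced.

```python
import numpy as np

def spectral_residual_saliency(img, k=3):
    """Single-scale spectral residual saliency map of a 2D grayscale
    array; the MSR method applies this analysis over multiple scales."""
    f = np.fft.fft2(img.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    # Smooth the log amplitude spectrum with a k x k box filter.
    pad = k // 2
    padded = np.pad(log_amp, pad, mode="edge")
    smooth = np.zeros_like(log_amp)
    for dy in range(k):
        for dx in range(k):
            smooth += padded[dy:dy + log_amp.shape[0], dx:dx + log_amp.shape[1]]
    smooth /= k * k
    residual = log_amp - smooth                      # spectral residual
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()                           # normalize to [0, 1]

# A bright blob on a flat background; salient regions get high values.
img = np.zeros((32, 32))
img[12:16, 12:16] = 255.0
sal = spectral_residual_saliency(img)
```

Thresholding such a map yields the small set of candidate regions that the object detector then evaluates, which is where the reported 75% reduction in windows comes from.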

Conference SILVA, C.; SCHNITMAN, L.; OLIVEIRA, L. Detection of Landmarks in Facial Images Based on Local Information. In: XIX Congresso Brasileiro de Automática (CBA), Campina Grande, 2012.

Abstract: This paper proposes a method for detecting 19 facial points of interest (landmarks). Most methods available in the literature for detecting facial points fall into two main categories: global and local. Global methods are usually able to detect various landmarks simultaneously and robustly, while local methods can often detect them more quickly. The presented method is based on local information and is composed of several processing stages for the detection of landmarks describing the eyes, eyebrows and mouth. Experimental results demonstrate that the proposed method achieves results comparable to the active shape model (ASM) technique.

Keywords: Detection of facial landmarks, face detection, detection of facial features.
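A typical local cue of the kind such methods build on is locating a small dark sub-window inside a face region, e.g. as a pupil or eye-corner candidate. The sketch below is a generic illustration of that idea, not the paper's pipeline; the function name, window size and toy intensity grid are assumptions.

```python
def darkest_window(region, w=2, h=2):
    """Locate the darkest w x h sub-window of a grayscale region
    (list of rows), a common local cue for pupil candidates."""
    best, best_pos = float("inf"), None
    for i in range(len(region) - h + 1):
        for j in range(len(region[0]) - w + 1):
            s = sum(region[i + di][j + dj] for di in range(h) for dj in range(w))
            if s < best:
                best, best_pos = s, (i, j)
    return best_pos  # top-left (row, col) of the darkest sub-window

# Bright skin region with a dark pupil spanning rows 1-2, cols 2-3.
region = [
    [200, 200, 200, 200, 200],
    [200, 210,  40,  35, 200],
    [200, 205,  30,  45, 200],
    [200, 200, 200, 200, 200],
]
print(darkest_window(region))  # (1, 2)
```

Because each landmark is sought only inside a small local region, such stages can run much faster than a global model fit, which matches the speed argument made in the abstract.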