Abstract: According to exemplary methods of training a convolutional neural network, input images are received into a computerized device having an image processor. The image processor evaluates the input images using first convolutional layers. The number of first convolutional layers is based on a first size for the input images. Each layer of the first convolutional layers receives layer input signals comprising features of the input images and generates layer output signals that include signals from the input images and ones of the layer output signals from previous layers within the first convolutional layers. Responsive to an input image being a second size larger than the first size, additional convolutional layers are added to the convolutional neural network. The number of additional convolutional layers is based on the second size in relation to the first size. The additional convolutional layers are initialized using weights from the first convolutional layers. Feature maps comprising the layer output signals are created.
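The growth step described above can be illustrated with a minimal sketch. The abstract does not state how the number of additional layers is derived from the two sizes, nor which existing weights are reused, so the rule below (one extra layer per doubling of the image side) and the copy order are assumptions made purely for illustration. The names `extra_layer_count` and `grow_network` are hypothetical.

```python
import math


def extra_layer_count(first_size, second_size):
    """Assumed rule: one additional layer per doubling of the image side."""
    if second_size <= first_size:
        return 0
    return math.ceil(math.log2(second_size / first_size))


def grow_network(layer_weights, first_size, second_size):
    """Return a new weight list with additional layers appended.

    Each additional layer is initialized by copying the weights of an
    existing layer (assumption: walk backwards from the last layer)
    rather than by random initialization.
    """
    n_extra = extra_layer_count(first_size, second_size)
    grown = [list(w) for w in layer_weights]
    for i in range(n_extra):
        source = layer_weights[-1 - (i % len(layer_weights))]
        grown.append(list(source))
    return grown


if __name__ == "__main__":
    trained = [[0.1], [0.2]]  # toy per-layer weights for 32-pixel inputs
    print(len(grow_network(trained, 32, 128)))
```

Copying trained weights gives the added layers a starting point consistent with the features already learned at the smaller size. The claimed method may use a different mapping between sizes and layer counts.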
Abstract: A method and system for domain adaptation based on multi-layer fusion in a convolutional neural network architecture for feature extraction and a two-step training and fine-tuning scheme. The architecture concatenates features extracted at different depths of the network to form a fully connected layer before the classification step. First, the network is trained as a feature extractor with a large set of images from a source domain. Second, for each new domain (including the source domain), the classification step is fine-tuned with images collected from the corresponding site. The features from different depths are concatenated and fine-tuned with weights adjusted for a specific task. The architecture is used for classifying high occupancy vehicle images.
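The multi-layer fusion idea, concatenating features taken at different depths into one vector that feeds the classifier, can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how feature maps of different spatial sizes are reduced before concatenation, so global average pooling is assumed here, and `global_average_pool` and `fuse_depths` are hypothetical helper names.

```python
def global_average_pool(feature_map):
    """Reduce a feature map (list of channels, each a 2-D list) to one value per channel.

    Assumption: pooling is used so that maps of different spatial sizes
    can be concatenated into a single flat vector.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_depths(feature_maps):
    """Concatenate pooled features from several network depths into one vector."""
    fused = []
    for fmap in feature_maps:
        fused.extend(global_average_pool(fmap))
    return fused


if __name__ == "__main__":
    shallow = [[[1.0, 3.0], [5.0, 7.0]]]  # 1 channel, 2x2
    deep = [[[2.0]], [[6.0]]]             # 2 channels, 1x1
    print(fuse_depths([shallow, deep]))
```

In the two-step scheme described above, a vector of this kind would be the input to the classification step that is fine-tuned per domain, while the feature extractor trained on the source domain stays shared.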
Abstract: High Occupancy Vehicle (HOV) and High Occupancy Tolling (HOT) lanes have been adopted in several jurisdictions to reduce traffic congestion and promote carpooling. With the prevalence of video cameras in transportation imaging applications, camera-based methods have recently been proposed for cost-efficient, safe, and effective HOV/HOT lane enforcement. An important step in automated lane enforcement systems is the classification of localized window/windshield images to distinguish passenger from no-passenger vehicles and thereby identify violators. This can be performed using deep convolutional neural networks (CNNs), which have been shown to significantly outperform hand-crafted features in several classification tasks. Training or fine-tuning a CNN requires a set of passenger/no-passenger images manually labeled by an operator, which takes substantial time and effort and can result in excessive operational cost and overhead. In this paper, we study the adaptability of three popular CNNs (i.e., AlexNet, VGG-M, and GoogLeNet) across different domains in classifying passenger/no-passenger images as part of an automated HOV/HOT lane enforcement system. Our experiments on over 40,000 side-view vehicle images yield several insights into the domain adaptability of these deep learning architectures.
High Occupancy Vehicle (HOV) lanes encourage carpooling and have been a common method used by transportation agencies to reduce congestion on highways. Image-based enforcement for HOV lanes is an emerging technology that uses one or more cameras mounted on overhead gantries and/or roadside poles to capture imagery inside vehicles and make computer-vision-based assessments of the occupancy state of the vehicle. One proposed system uses two cameras to capture images of the front seat and rear seat of vehicles traveling in HOV lanes and identifies violators by processing the captured images. In this paper, we compare two ways of combining information from the two cameras, an early fusion approach and a late fusion approach, to determine whether or not the vehicle is a carpool lane violator. The performance is compared on a set of images acquired 'in-the-wild' from public roadway testing sites.
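The difference between the two fusion strategies can be shown with a minimal sketch. It assumes each camera image has already been reduced to a small feature vector and that a simple logistic scorer stands in for the classifier. The weights, the max-based combination rule for late fusion, and the function names are all illustrative assumptions, not details taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear_score(features, weights, bias=0.0):
    """Logistic score in [0, 1]; here read as 'a passenger is visible'."""
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def early_fusion(front, rear, weights, bias=0.0):
    """Early fusion: concatenate both cameras' features, then apply one classifier."""
    return linear_score(front + rear, weights, bias)


def late_fusion(front, rear, front_weights, rear_weights):
    """Late fusion: score each camera separately, then combine the scores.

    Assumed combination rule: the vehicle has a passenger if either
    camera sees one, so the maximum of the two scores is taken.
    """
    return max(linear_score(front, front_weights),
               linear_score(rear, rear_weights))


if __name__ == "__main__":
    front_feats, rear_feats = [0.2, 0.9], [0.7, 0.1]
    print(early_fusion(front_feats, rear_feats, [0.5, -0.3, 0.8, 0.1]))
    print(late_fusion(front_feats, rear_feats, [0.5, -0.3], [0.8, 0.1]))
```

Early fusion lets a single classifier learn interactions between the front-seat and rear-seat views, while late fusion keeps the per-camera classifiers independent and merges only their decisions.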