Spring 2008

CLUSTERING CATEGORICAL DATA USING DATA SUMMARIES

Location: 

Room 4421

Speaker: 

EMAN ABDU

Abstract: 

Cluster analysis is an active area of research with applications in various fields including information retrieval, social sciences, bioinformatics, object recognition, and image segmentation (Jain et al. 1999). However, most algorithms are intended for numerical (continuous) data where proximity among data objects is naturally defined by virtue of their numerical properties. Although these algorithms can be used on categorical data, they are not designed to handle data properties typically found in this data type such as high dimensionality and lack of inherent relationships among attribute values. During the past decade, several algorithms have been designed for categorical data such as K-modes (Huang 1998), STIRR (Gibson et al, 1998), CACTUS (Ganti and Ramakrishnan, 1999), ROCK (Guha et al, 2000), COOLCAT (Barbara et al, 2002), LIMBO (Andritsos et al, 2004), CLICKS (Zaki et al, 2005, 2007), and others. These algorithms exploit inter or intra relationships among attributes through data summaries such as frequency, data values co-occurrences, information entropy and links among data items. In this proposal, we focus on using data summaries and spectral analysis to detect clustering structure in categorical data. Spectral techniques provide a relaxed solution to the discrete clustering problem which has been shown to be NP-hard (Drineas 2004). Formulating the clustering problem as a graph partitioning problem and then finding the minimum normalized cut leads to a solution based on eigenvectors of the similarity matrix (i.e. Laplacian matrix). Spectral methods have been used in various algorithms and have been shown to find non-linearly separable clusters. Equally important, spectral analysis encompasses techniques for handling high-dimensional data since input data is projected into a lower-dimensional space where all computation/comparisons can be performed. Our approach is to extend spectral techniques to data summaries which are relatively less expensive to compute than object similarity matrix for very large datasets. Our goal is to combine the benefits of spectral analysis with the relative low cost of computing data summaries. We have developed three algorithms that use spectral techniques and data summaries to cluster categorical data. Our test results on standard datasets show that our algorithms are competitive with current spectral and non-spectral algorithms for categorical data.

Committee: 

PROFESSOR BILAL KHAN, MENTOR, JOHN JAY OF CRIMINAL JUSTICE
PROFESSOR PING JI, JOHN JAY OF CRIMINAL JUSTICE
PROFESSOR SPIRIDON BAKIRAS, JOHN JAY OF CRIMINAL JUSTICE

COLUMN BY COLUMN IMAGE-SIGNAL ENHANCEMENT

Location: 

Room 3207

Speaker: 

EILAT VARDI-GONEN

Abstract: 

The general problem that we consider is the real-time enhancement of a noisy image as it arrives column by column (time is associated with the horizontal axis). The enhancement is modeled by a column of real values between 0 and 1 (with as many pixels as in the noisy image column), which is used as a mask on the noisy image column. That is, the clean image column is estimated by multiplying the noisy image column by the mask column, pixel by pixel. In this work, the real-valued column is attained by "fuzzifying" a binary column, which is estimated from the noisy image column using a prior model and a noise model. The prior model is specified by a statistical distribution that assigns to binary columns a probability of occurrence. The noise model is specified by a statistical distribution that assigns the likelihood of occurrence of the noisy image column given the binary column. These statistics should either be known or estimated from a training set of typical images from the application area. Given models of this type, our task is to design an algorithm that will produce the sought-after binary and "fuzzified" columns in "real-time," that is as the noisy image arrives column by column. In many applications signals are transformed into images by taking the squared norm of the Short-Time Fourier Transform of the signal, called the spectrogram. These spectrograms are extremely redundant; columns represent spectral information of overlapping time intervals. We, therefore, tested the possible applicability of image processing algorithms to the problem of increasing intelligibility in the hearing aid application. It was our hypothesis that for every spectrogram image of noisy speech there exist a binary image and its "fuzzified" image that (along with the noisy spectrogram image) contain the information essential for estimating the clean speech signal. The proposed processing was tested on 24 normal hearing subjects using the Modified Rhyme Test. Unfortunately, the specific choices that we made in the development of our processing methodology decreased the intelligibility of noisy speech signals.

Committee: 

DISTINGUISHED PROFESSOR GABOR T. HERMAN, MENTOR, THE GRADUATE CENTER
DISTINGUISHED PROFESSOR ROBERT M. HARALICK, THE GRADUATE CENTER
PROFESSOR ARLENE NEUMAN, THE GRADUATE CENTER

OUTSIDE MEMBER:

DR. MICHAEL T. CHAN
MIT LINCOLN LABORATORY

3D SHAPE MATCHING USING GEOMETRIC FEATURES: A SURVEY

Location: 

Room 4421

Speaker: 

HADI FADAIFARD

Abstract: 

3D shape matching refers to the process of determining the amount of similarity between two 3D shapes. A large class of matching methods use feature-based descriptors to determine shape similarity. The use of these descriptors, or feature vectors, enables the 3D matching methods to be applied to a large class of 3D problems such as registration, shape retrieval, shape recognition and classification. In this paper we provide a survey of feature-based approaches to 3D matching, particularly as they apply to global and partial shape matching. Partial shape matching is a more difficult problem than global matching, since the location, size, orientation, or overlap of the 3D models that need to be matched are not known a priori. To alleviate this problem, the majority of partial 3D matching methods use some criteria to pick salient or distinctive points on the models, referred to as feature points. We investigate the different criteria used in selecting these feature points.

Committee: 

PROFESSOR GEORGE WOLBERG, MENTOR, THE CITY COLLEGE
PROFESSOR MICHAEL GROSSBERG, THE CITY COLLEGE
PROFESSOR IOANNIS STAMOS, HUNTER COLLEGE

3D INTRA-GROUP LOCALIZATION AND MAPPING WITH MULTIPLE HETEROGENEOUS ROBOTS

Location: 

Room 4421

Speaker: 

RAVI KAUSHIK

Abstract: 

3D Mapping with multiple heterogeneous robots that work as a team is a concerted effort to learn its surrounding environment in a quick and efficient way. Intra-group localization of multiple robots in the map generated by sensing the environment is a supplement to the mapping process. Each robot receives 3D information from multimodal sensors like Laser Range Finders (LRF), Perspective Camera, Inertial Navigation Sensors (INS) and odometers. Fusion of sensor data from these sensors into a world frame is a twofold problem. Initially, the multimodal sensor data is fused into respective robot coordinate frames. Secondly, sensory data of each robot is fused into a global coordinate frame, given the relative poses of multiple robots. The former involves intra-sensor calibration. The latter involves registration of two image inputs from the camera or laser scans in 3D (taken at reference and current positions of the mobile robot), with the help of odometry and orientation from INS. We review the cooperative localization techniques and treat laser registration as a separate problem in localization. We consider obtaining relative poses of multiple robots by fusing vision and laser scan data and discuss their efficiency and robustness. Efforts to generate virtual models of the robot?s surrounding environment in 3D that contain both depth and texture using laser scan maps and vision input is reviewed.

Committee: 

PROFESSOR JIZHONG XIAO, MENTOR, THE CITY COLLEGE OF NEW YORK
PROFESSOR ZHIGANG ZHU, THE CITY COLLEGE OF NEW YORK
PROFESSOR SIMON PARSONS, BROOKLYN COLLEGE
PROFESSOR THEODORE RAPHAN, BROOKLYN COLLEGE
OUTSIDE MEMBER:
PROFESSOR YANA MENG, STEVENS INSTITUTE OF TECHNOLOGY

NEW APPROACHES IN RADAR WAVEFORM DESIGN

Location: 

Room 4421

Speaker: 

DMITRY CHEBANOV

Abstract: 

In this work we discuss two problems that appear in modern radar waveform design. First, we propose a new approach to one of the most challenging problems in this area of research - the construction of waveforms with optimal ambiguity characteristics in a chosen a priori region surrounding the main lobe. Our approach is based on the projection of the signal onto an appropriate orthonormal basis in the space of radar waveforms and approximating the signal with desired ambiguity properties by a finite number of basis waveforms. In particular, we consider well-known Hermite waveforms as the basis functions and discuss the problem of minimizing the volume under the ambiguity surface over a certain given region. The second part of the dissertation is devoted to the problem of the range sidelobe suppression in stepped-frequency LFM pulse trains. Frequency stepping is one of the known techniques employed by modern radars to achieve high range resolution. The main advantage of this approach is that the instantaneous bandwidth of a radar is quite small compared with the total processing bandwidth. This allows the transmission of waveforms with extremely wide overall bandwidth (and, as a consequence, the attainment of high range resolution) without the usage of the expensive hardware needed to support the wide instantaneous bandwidth. We present a systematic approach for designing pulse trains producing the autocorrelation function whose sidelobes are less than some predetermined level.

Committee: 

PROFESSOR IRINA GLADKOVA, MENTOR, THE CITY COLLEGE OF NEW YORK
PROFESSOR OCTAVIO BETANCOURT, THE CITY COLLEGE OF NEW YORK
PROFESSOR TRUONG-THAO NGUYEN, THE CITY COLLEGE OF NEW YORK
PROFESSOR BURTON RANDOL, THE GRADUATE CENTER
PROFESSOR TED BROWN, THE GRADUATE CENTER

OUTSIDE MEMBER:
DR. VADIM POTIEVSKY
STRATEGIC ANALYTICS, AVETA, INC.

SEARCHING MOBILE USERS IN CELLULAR NETWORKS WITH A-PRIORI STATISTICAL KNOWLEDGE OF THEIR WHEREABOUTS

Location: 

Room 4421

Speaker: 

YI FENG

Abstract: 

In cellular phone systems, when a call to a mobile user arrives, it is necessary to locate the user in order to establish the connection. In current systems, the user is paged among all possible cells. Such paging process is conducted in multiple rounds. In each round, a subset of the cells are paged. Once the cell that contains the user is paged, the paging process is completed. Given the limited bandwidth available in cellular networks, an important objective of paging process is to find the mobile user with minimum number of cells being paged. If there is no a-priori knowledge about user location, the best paging strategy is to randomly page the cells in rounds. However, in many cases, cell phone systems are with some a-priori knowledge about the probabilities of a user residing in different cells. Therefore, effective paging strategies should be able to reduce expected number of cells paged, and algorithms that efficiently generate such paging strategies are highly desired. Formally, let a mobile user roam in a zone of N cells, C_1, ..., C_N. Let p_1,..., p_N be the independent probabilities of the user residing in the cells. The user must be found within $D$ rounds. A paging strategy A is an ordered partition of the cells, A_1, ..., A_D, such that in the ith round, the cells in A_i are paged. The paging process terminates once the cell that contains the user is paged. The cost of the paging strategy is the expected number of cells being paged. Let |A_i| be the number of cells in A_i, the paging cost can be formally defined as:

cost(A)=\sum_{d=1}^{D}(|A_d|\sum_{C_j\in{\bigcup_{i=1}^{d}{A_i}}}p_j

There are two performance measurements of algorithms that generate paging strategies - efficiency and optimality. Efficiency is the time complexity for an algorithm generates a paging strategy, and optimality is the paging cost of a strategy generated by an algorithm. In this survey, the problem is introduced with formal definition. Various algorithms are reviewed and analyzed hierarchically in both their efficiency and optimality. Furthermore, the problem is generalized to the version of mutually paging multiple users. In the multiple-user paging problem, the hardness and approximation algorithms to the conference call problem and the yellow page problem are reviewed. In addition, algorithms that solve major paging problems, part of which serves as our future research direction, are reviewed and discussed.

Committee: 

PROFESSOR AMOTZ BARNOY, MENTOR, BROOKLYN COLLEGE
PROFESSOR THEODORE BROWN, GRADUATE CENTER
PROFESSOR STATHIS ZACHOS, BROOKLYN COLLEGE

ARGUMENTATION-BASED REASONING SYSTEMS & MULTI-AGENT DIALOGUES

Location: 

Room 4421

Speaker: 

YUQING TANG

Abstract: 

Building computer systems that can act independently in our best interests while interacting with other humans or computer systems is one trend in the evolution of computer technology. This is the major theme of research in autonomous agent and multiagent systems, one of the most active subfields of artificial intelligence (AI). In this field, these computer systems are referred to as intelligent autonomous agents. This survey will focus on the use of a specific kind of reasoning mechanism — argumentationbased reasoning — to enable the agents to act autonomously, and a specific kind of inter-agent interaction mechanism — argumentation-based dialogues — to enable the agents to interact with each other. Argumentation-based reasoning is a nonmonotonic reasoning mechanism which can resolve the effects of conflicting information in an agent’s information about the world. With an inter-agent dialogue protocol, agents can communicate with each other and apply argumentation-based reasoning mechanism to resolve the conflicts arising from their different views of goals, beliefs, and actions. In this survey, I will investigate different systems of argumentation-based reasoning and different approaches for inter-agent communication and dialogue.

Committee: 

PROFESSOR SIMON PARSONS, MENTOR, BROOKLYN COLLEGE
DISTINGUISHED PROFESSOR SERGEI ARTEMOV, THE GRADUATE CENTER
PROFESSOR SAMIR CHOPRA, BROOKLYN COLLEGE

PIECEWISE PLANAR RECONSTRUCTION FROM RANGE DATA

Location: 

Room 3305

Speaker: 

Gene Yu

Abstract: 

Modern range scanners can capture the geometry of large urban scenes on an unprecedented scale. While the volume of data is overwhelming, urban scenes can be approximated well by parametric surfaces such as planes. Piecewise planar representation can reduce the size of the data dramatically, and it is ideal for rendering and other high-level applications. We propose a method for extracting a piecewise planar representation from a collection of range images. Central to our approach is a novel segmentation algorithm that extracts a piecewise planar function from an individual range image. Many existing methods for large datasets utilize local surface fitting for speed, at the expense of verifiably planar output components. Our novel framework unifies local and global fitting to guarantee truly planar components. By concentrating on the segmentation problem early in the processing pipeline, our goal is to simplify subsequent 3D modeling and rendering stages by replacing raw data with high-level 2.5D primitives. To demonstrate the effectiveness of our approach, we propose a method for converting the segmentation results into a triangle mesh suitable for rendering using a standard 3D graphics engine.

Committee: 

PROFESSOR, GEORGE WOLBERG MENTOR, THE CITY COLLEGE OF NEW YORK
PROFESSOR MICHAEL D. GROSSBERG, THE CITY COLLEGE OF NEW YORK
PROFESSOR IOANNIS STAMOS, HUNTER COLLEGE

VNLAYER BASED IPV6 ROUTING PROTOCOLS FOR MOBILE AD HOC NETWORKS

Location: 

Room 4421

Speaker: 

JIANG WU

Abstract: 

The Virtual Node Layer [1] (VNLayer) is a programming abstraction for a Mobile Ad-Hoc Network (MANET), defining virtual servers at fixed locations of the geographical area covered by the network. The fixed virtual servers are emulated by a subset of the mobile nodes. The advantage of this abstraction is that simple wireline protocols can be deployed on the infrastructure to tame the difficulties inherent in the MANET setting. A disadvantage is the overhead of implementing the Virtual Node Layer. We designed an ns-2 [4] based simulation package (VNSim) for the VNLayer. The implementation of the VNLayer uses a leader-based state replication strategy to emulate the virtual nodes. A VNLayer based address allocation algorithm based on DHCP [5] is implemented and evaluated using VNSim. The simulation results show that VNSim can efficiently simulate a virtual node system with up to a few hundred mobile nodes; that the VNLayer overhead is quite small; and that the address allocation protocol performs well in a small network of 16 regions with 40 to 120 mobile nodes and a larger network of 64 regions with 160 mobile nodes. Rate of motion for the mobile nodes varies from quite slow to quite fast. Topology based or geographically based IPv6 routing protocols can be deployed on the overlay network of fixed virtual nodes, creating a hierarchical MANET routing system. In such a system, the clean separation between hosts and routers eliminates the need for extensive changes to the hosts. The use of a fixed overlay network can prevent excessive message flooding. Extending earlier ns-2 based simulation studies of such virtual node systems, we propose to investigate the performance and the failure modes of virtual node based IPv6 routing protocols in MANETs. Simple IPv6 routers will be implemented in VNSim. The message overhead and routing performance of various virtual node based IPv6 routing protocols will be evaluated and compared with standard MANET routing protocols like AODV and a simple VNLayer based routing protocol, geocast.

Committee: 

PROFESSOR NANCY GRIFFETH, MENTOR, LEHMAN COLLEGE
PROFESSOR AMOTZ BARNOY, BROOKLYN COLLEGE
PROFESSOR BILAL KHAN, JOHN JAY COLLEGE
PROFESSOR PING JI, JOHN JAY COLLEGE

ON THE PROBLEM OF PACKING STEINER TREES OF A GRAPH

Location: 

Room 4430

Speaker: 

MOHAMMAD TALAFHA

Abstract: 

In this talk, we are concerned with undirected graphs G = (V, E), with distinguished set of vertices K  V, K 2 called the terminal vertices. A K-Steiner tree T of G is a minimal tree, containing all the vertices of K, and any vertex of degree 1 in T must belong to K. The K-edge connectivity, denoted as K(G), of a connected graph G with terminal vertices , is the minimum number of edges whose removal disconnects at least two vertices of K. The degree of a vertex v of G is the number of edges incident at v in G. In this talk, we will investigate the relationship between the maximum number of edge-disjoint K-Steiner trees, tK (G), and the K-edge-connectivity of a graph G. This problem known as the “Steiner tree packing problem” (STPP for short) has attracted considerable attention from researchers in different areas because of its wide applicability. It has applications in routing problems arising in VLSI circuit design, where an effective way of sharing different signals amongst cells in a circuit can be achieved by the use of edge-disjoint Steiner trees. In this talk we are interested in finding a lower bound of the number of edge-disjoint K-Steiner trees, tK(G) with respect to the K-edge-connectivity, K(G), and in particular we present the most recent results pertaining this problem.

Committee: 

PROFESSOR LOUIS PETINGI, MENTOR, COLLEGE OF STATEN ISLAND
PROFESSOR TED BROWN, THE GRADUATE CENTER
PROFESSOR JANOS PACH, THE CITY COLLEGE
PROFESSOR CHRISTINA ZAMFIRESCU, HUNTER COLLEGE

Syndicate content