Sequential Nonparametric Clustering of Data Streams with Composite Distributions
Date: 11th Mar 2022
Time: 03:00 PM
Venue: Google Meet
PAST EVENT
Details
We study a sequential nonparametric clustering problem in which a finite set of S data streams is to be grouped into K clusters. Each data stream is a real-valued i.i.d. data sequence generated from an unknown continuous distribution. The distributions themselves are organized into clusters according to their proximity to each other under a given distance metric. We propose a class of universal sequential nonparametric clustering tests for the cases when K is known and when K is unknown. We show that the proposed tests stop in finite time almost surely and are universally exponentially consistent. We also bound the asymptotic growth rate of the expected stopping time as the probability of error goes to zero. Our results generalize earlier work on sequential nonparametric anomaly detection to the more general sequential nonparametric clustering problem, and provide a new test for the case of anomaly detection in which the anomalous data streams can follow distinct probability distributions. Simulations show that each of our proposed sequential clustering tests outperforms the corresponding fixed-sample-size test and is advantageous in anomaly detection problems with distinct anomalies.
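To make the problem setup concrete, the following is a minimal illustrative sketch of a sequential clustering loop for the known-K case. It is not the test proposed in the talk. The choice of the Kolmogorov-Smirnov distance between empirical CDFs, single-linkage merging, the fixed-threshold stopping rule, and the names `ks_distance`, `single_linkage`, `sequential_cluster`, `threshold`, `n_min`, and `step` are all assumptions made here for illustration.

```python
import bisect
import itertools


def ks_distance(x, y):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    # The supremum of |F_x - F_y| is attained at one of the sample points.
    return max(
        abs(bisect.bisect_right(xs, t) / len(xs) - bisect.bisect_right(ys, t) / len(ys))
        for t in xs + ys
    )


def single_linkage(dist, k):
    """Merge the closest clusters until k remain; returns one cluster id per stream."""
    s = len(dist)
    labels = list(range(s))
    pairs = sorted((dist[i][j], i, j) for i, j in itertools.combinations(range(s), 2))
    n_clusters = s
    for _, i, j in pairs:
        if n_clusters == k:
            break
        if labels[i] != labels[j]:
            old, new = labels[j], labels[i]
            labels = [new if lab == old else lab for lab in labels]
            n_clusters -= 1
    return labels


def sequential_cluster(streams, k, threshold=0.3, n_min=20, step=10):
    """Observe samples in batches; stop once the clusters look well separated.

    Assumed stopping rule (illustrative only): stop at the first n where the
    smallest inter-cluster distance exceeds the largest intra-cluster distance
    by more than `threshold`. Returns (labels, number of samples used).
    """
    s = len(streams)
    n_max = min(len(x) for x in streams)
    if not (2 <= k <= s) or n_max < n_min:
        raise ValueError("need 2 <= k <= S and at least n_min samples per stream")
    pairs = list(itertools.combinations(range(s), 2))
    for n in range(n_min, n_max + 1, step):
        dist = [[0.0] * s for _ in range(s)]
        for i, j in pairs:
            dist[i][j] = dist[j][i] = ks_distance(streams[i][:n], streams[j][:n])
        labels = single_linkage(dist, k)
        d_intra = max((dist[i][j] for i, j in pairs if labels[i] == labels[j]), default=0.0)
        d_inter = min(dist[i][j] for i, j in pairs if labels[i] != labels[j])
        if d_inter - d_intra > threshold:
            break  # clusters are separated by a clear margin
    return labels, n
```

Given a list of S equal-length sample lists, `sequential_cluster(streams, k=2)` returns a cluster identifier per stream together with the number of samples consumed before stopping. A fixed-sample-size test in this sketch would correspond to running the clustering step once at a preset n, whereas the sequential version stops early when the streams are easy to separate.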
Speakers
Sreeram C Sreenivasan (EE17D404)
Electrical Engineering