Dynamic Tracking and Screening in Massive Datastreams

Jingshen Wang, Lilun Du, Changliang Zou, Zhenke Wu (2020). Submitted.


Modern technological advances have expanded the scope of applications requiring analysis of large-scale datastreams that comprise multiple indefinitely long time series. There is an acute need for statistical methodologies that perform online inference and continuously revise the model to reflect the current status of the underlying process. In particular, we are interested in constructing a large-scale dynamic tracking and screening (DTS) procedure capable of rapidly identifying irregular individual streams whose behavioral patterns deviate from the majority. By fully exploiting the sequential nature of datastreams, we first develop a robust estimation approach within a varying-coefficient model framework. The procedure naturally accommodates unequally spaced design points and updates the coefficient estimates as new data arrive, without the need to store historical data. We also propose a data-driven choice of the optimal smoothing parameter. Further, we propose a new online model-specification test tailored to the streaming environment. The resulting DTS scheme is able to adapt to time-varying structures, track changes in the underlying models, and hence maintain high accuracy in detecting time periods during which individual streams exhibit irregular behavior. Moreover, we derive the asymptotic properties of the procedure and investigate its finite-sample performance through simulation studies. We demonstrate the proposed methods through a mobile health example, estimating the times at which subjects’ sleep and physical activities have an unusual influence on their mood.

Keywords: Change point; Consistency; Exponentially weighted moving average; Kernel smoothing; Varying coefficient
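
The abstract's idea of updating coefficient estimates as new data arrive, without storing historical data, can be illustrated with a generic exponentially weighted least-squares recursion. This is only a minimal sketch under stated assumptions and is not the authors' DTS estimator: it swaps in plain (non-robust) least squares, and the class name `EwmaLeastSquares`, the `decay` parameter, and the small `ridge` term are hypothetical choices made for illustration. Unequally spaced time points are handled by discounting past information by `decay` raised to the elapsed time.

```python
import numpy as np


class EwmaLeastSquares:
    """Exponentially weighted least squares kept as two running summaries.

    Illustrative only: a non-robust stand-in, not the estimator in the paper.
    """

    def __init__(self, dim, decay=0.9, ridge=1e-8):
        self.decay = decay                # hypothetical per-unit-time discount
        self.S = ridge * np.eye(dim)      # discounted sum of x x^T (ridge keeps it invertible)
        self.r = np.zeros(dim)            # discounted sum of x * y
        self.last_t = None                # time of the previous observation

    def update(self, t, x, y):
        """Absorb one observation (t, x, y) and return the current coefficients."""
        x = np.asarray(x, dtype=float)
        if self.last_t is not None:
            # Discount by the elapsed time, so unequal spacing is handled naturally.
            w = self.decay ** (t - self.last_t)
            self.S *= w
            self.r *= w
        self.S += np.outer(x, x)
        self.r += x * y
        self.last_t = t
        # Only S and r are stored; past observations are never revisited.
        return np.linalg.solve(self.S, self.r)
```

Each call to `update` costs a fixed amount of work in the coefficient dimension, regardless of how long the stream has been running. A smaller `decay` forgets the past faster, which plays a role loosely analogous to a smoothing parameter; how the paper actually selects its smoothing parameter is not reproduced here.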