Dynamic Statistical Inference in Massive Datastreams

Jingshen Wang, Lilun Du, Changliang Zou, Zhenke Wu (2023+). Submitted.

Abstract

Modern technological advances have expanded the scope of applications requiring analysis of large-scale datastreams that comprise multiple indefinitely long time series. There is an acute need for statistical methodologies that perform online inference and continuously revise the model to reflect the current status of the underlying process. In this manuscript, we propose a dynamic statistical inference framework–named dynamic tracking and screening (DTS)–that is not only able to provide accurate estimates of the underlying parameters in a dynamic statistical model, but also capable of rapidly identifying irregular individual streams whose behavioral patterns deviate from the majority. Concretely, by fully exploiting the sequential feature of datastreams, we develop a robust estimation approach under a framework of varying coefficient model. The procedure naturally accommodates unequally-spaced design points and updates the coefficient estimates as new data arrive without the need to store historical data. A data-driven choice of an optimal smoothing parameter is accordingly proposed. Furthermore, we suggest a new multiple testing procedure tailored to the streaming environment. The resulting DTS scheme is able to adapt time-varying structures appropriately, track changes in the underlying models, and hence maintain high accuracy in detecting time periods during which individual streams exhibit irregular behavior. Moreover, we derive rigorous statistical guarantees of the procedure and investigate its finite-sample performance through simulation studies. We demonstrate the proposed methods through a mobile health example to estimate the timings when subjects’ sleep and physical activities have unusual influence upon their mood.

Keywords Change point; Consistency; Kernel smoothing; Multiple testing; Varying coefficient