@Scale 2019: Streaming, flexible log parsing with real-time applications
Logs from cybersecurity appliances are numerous, come from heterogeneous sources, and frequently suffer from poor hygiene and malformed content. Relying on an already understaffed human workforce to constantly write new parsers, triage incorrectly parsed data, and keep up with ever-increasing data volumes is bound to fail. Using RAPIDS, an open source data science platform, Bartley shows how a more flexible, neural network approach to log parsing can overcome these obstacles. He presents an end-to-end workflow that begins with raw logs, applies flexible parsing, and then runs stream analytics (e.g., rolling z-score for anomaly detection) on the parsed output. By keeping the entire workflow on GPUs (either on premises or in a cloud environment), he demonstrates near real-time parsing and the ability to scale to large volumes of incoming logs.
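
To make the rolling z-score step concrete, here is a minimal sketch in plain Python. It is an illustration only, not the code from the talk: it runs on the CPU rather than on RAPIDS, and the class name `RollingZScore`, the function `flag_anomalies`, the window size, and the threshold are all hypothetical choices. Each new value (for example, a per-interval count of parsed log events) is scored against the mean and standard deviation of the preceding window before being added to it.

```python
from collections import deque
import math


class RollingZScore:
    """Rolling z-score over the last `window` observations (hypothetical helper)."""

    def __init__(self, window=5):
        if window < 2:
            raise ValueError("window must be at least 2")
        # deque with maxlen drops the oldest value automatically
        self.values = deque(maxlen=window)

    def update(self, x):
        """Score x against the current window, then add x to the window.

        Returns None while fewer than two values have been seen,
        or when the window has zero spread (z-score undefined).
        """
        score = None
        n = len(self.values)
        if n >= 2:
            mean = sum(self.values) / n
            # population variance of the values currently in the window
            var = sum((v - mean) ** 2 for v in self.values) / n
            std = math.sqrt(var)
            if std > 0:
                score = (x - mean) / std
        self.values.append(x)
        return score


def flag_anomalies(counts, window=5, threshold=3.0):
    """Return the indices whose rolling z-score exceeds the threshold in magnitude."""
    detector = RollingZScore(window)
    flagged = []
    for i, count in enumerate(counts):
        z = detector.update(count)
        if z is not None and abs(z) > threshold:
            flagged.append(i)
    return flagged
```

With these assumed defaults, `flag_anomalies([10, 12, 11, 9, 10, 11, 100, 10])` flags only index 6, the spike to 100. Scoring each value before it enters the window keeps an outlier from inflating the statistics it is judged against, although it still affects the scores of the values that follow until it ages out.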