Hello. This is one of those posts in The Morning Paper that deserves a post in The Information Age. I must say that blogging can be a frustrating task, and I haven't always done it right. But I believe in improvement, in learning, and in pushing the state of the art to achieve the best possible. With blogs such as The Morning Paper, you are assured of being on the right path. Always.
If you’re a DBA responsible for figuring out what’s going on, this presents quite a challenge. You might be awash in stats and graphs (MySQL maintains 260 statistics and variables, for example), but still sorely lacking the big picture… “as a consequence, highly-skilled and highly-paid DBAs (a scarce resource themselves) spend many hours diagnosing performance problems through different conjectures and manually inspecting various queries and log files, until the root cause is found.”
DBSherlock (available at http://dbseer.org) is a performance explanation framework that helps DBAs diagnose performance problems in a more principled manner. The core idea is to compare a region of interest (an anomaly) with normal behaviour by analyzing past statistics, in order to find the most likely causes of the anomaly. The result of this analysis is one or both of:
- A set of concise predicates describing the combination of system configurations or workload characteristics causing the performance anomaly. For example, when explaining an anomaly caused by a network slowdown:
network_send < 10KB ∧ network_recv < 10KB ∧ client_wait_times > 100ms ∧ cpu_usage < 5
- A high-level diagnosis based on existing causal models in the system. An example cause presented here might be “Rotation of the redo log file.”
DBSherlock is integrated as a module in DBSeer, an open-source suite of database administration tools for monitoring and predicting database performance. This means DBSherlock can rely on DBSeer’s API for collecting and visualizing performance statistics. At one-second intervals, the following data is collected:
- OS resource consumption statistics
- DBMS workload statistics
- Timestamped query logs
- Configuration parameters from the OS and DBMS
From the raw data, DBSeer computes aggregate statistics about the transactions executed during each time interval, and aligns them with the OS and DBMS statistics according to their timestamps. It is these resulting time series that DBSherlock uses as the starting point for diagnosis.
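The alignment step above can be sketched as a simple timestamp join. This is a minimal illustration, not DBSeer's actual code; the field names (`avg_latency_ms`, `tps`, `cpu_usage`, `network_send_kb`) are invented for the example.

```python
# Hypothetical sketch: align per-interval transaction aggregates with
# OS/DBMS statistics collected at one-second granularity, joining on
# the epoch-second timestamp. Field names are invented for illustration.

def align_by_timestamp(txn_aggregates, os_stats):
    """Join two dicts keyed by integer epoch-second timestamps,
    keeping only intervals present in both series."""
    common = sorted(txn_aggregates.keys() & os_stats.keys())
    return [{"ts": ts, **txn_aggregates[ts], **os_stats[ts]} for ts in common]

txns = {100: {"avg_latency_ms": 12.5, "tps": 340},
        101: {"avg_latency_ms": 90.1, "tps": 80}}
osst = {100: {"cpu_usage": 35.0, "network_send_kb": 512},
        101: {"cpu_usage": 4.0,  "network_send_kb": 6},
        102: {"cpu_usage": 30.0, "network_send_kb": 480}}

# Each aligned tuple carries both workload and OS attributes for one second.
aligned = align_by_timestamp(txns, osst)
```

Each element of `aligned` is one row of the combined time series over which DBSherlock's predicates are later evaluated.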
Starting from a given abnormal region and a normal region, the aim is to find predicates that separate them as cleanly as possible. The separation power of a predicate is defined as the difference between the percentage of abnormal tuples that satisfy it and the percentage of normal tuples that satisfy it.
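The definition of separation power translates directly into a few lines of code. This is a sketch of the definition only (the tuples and the threshold are made up for the example); the paper's actual predicate search is more involved.

```python
# Separation power as defined above: the fraction of abnormal tuples
# satisfying a predicate minus the fraction of normal tuples satisfying it.
# A value near 1.0 means the predicate almost perfectly isolates the anomaly.

def separation_power(predicate, abnormal, normal):
    frac = lambda tuples: sum(predicate(t) for t in tuples) / len(tuples)
    return frac(abnormal) - frac(normal)

# Toy tuples mimicking a network-slowdown anomaly (values are invented).
abnormal = [{"network_send_kb": 3}, {"network_send_kb": 8}, {"network_send_kb": 5}]
normal   = [{"network_send_kb": 400}, {"network_send_kb": 9}, {"network_send_kb": 350}]

pred = lambda t: t["network_send_kb"] < 10
power = separation_power(pred, abnormal, normal)  # 3/3 - 1/3 ≈ 0.667
```

Note that one noisy normal tuple (`network_send_kb = 9`) already drags the power below 1.0, which motivates the noise-handling discussion that follows.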
Identifying predicates with high separation power is challenging. First, one cannot find a predicate of high separation power by simply comparing the values of an attribute in the raw dataset. This is because real-world datasets and OS logs are noisy, and attribute values often fluctuate regardless of the anomaly. Second, due to human error, users may not specify the boundaries of the abnormal regions with perfect precision. The user may also overlook smaller areas of anomaly, misleading DBSherlock into treating them as normal regions. These sources of error compound the problem of noisy datasets. Third, one cannot easily conclude that predicates with high separation power are the actual cause of an anomaly. They may simply be correlated with the anomaly, or be symptoms of it themselves, and hence lead to incorrect diagnoses.
This is top-notch research on database administration, using the best-of-breed technology currently available. I want to highlight the reference to advanced concepts in computer science, and in particular the discretization of time series, an important task in database systems.
Our filtering strategy aims to separate the Normal and Abnormal partitions that are originally mixed across the partition space (e.g., due to the noise in the data or user errors). If there is a predicate on attribute Attr, that has high separation power, the Normal and Abnormal partitions are very likely to form well-separated clusters after the filtering step. This step mitigates some of the negative effects of noisy data or user’s error, which could otherwise exclude a predicate with high separation power from the output.
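The quoted filtering idea can be illustrated with a deliberately simplified sketch: partitions of an attribute's value range carry a Normal or Abnormal label, and a partition whose immediate neighbours all disagree with it is treated as noise and unlabelled. This is my own toy approximation of the intuition, not the paper's exact algorithm.

```python
# Hypothetical, simplified take on the filtering step quoted above.
# Each partition of an attribute's value range is labelled 'N' (Normal),
# 'A' (Abnormal), or '' (empty). A labelled partition surrounded only by
# opposite labels is likely noise or user error, so its label is cleared,
# leaving better-separated Normal and Abnormal clusters.

def filter_partitions(labels):
    out = list(labels)
    for i, lab in enumerate(labels):
        if lab not in ("N", "A"):
            continue
        neighbours = [labels[j] for j in (i - 1, i + 1)
                      if 0 <= j < len(labels) and labels[j] in ("N", "A")]
        if neighbours and all(n != lab for n in neighbours):
            out[i] = ""  # isolated label, treated as noise

    return out

# A stray 'A' inside a run of 'N's (and the 'N' it displaced) get cleared,
# leaving one clean Normal cluster and one clean Abnormal cluster.
filtered = filter_partitions(["N", "N", "A", "N", "A", "A", "A"])
```

After filtering, a threshold between the two surviving clusters yields a predicate with high separation power that the noise would otherwise have hidden.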
I’m pretty sure our old friends Sherlock Holmes and Dr. Watson would be amazed and inspired by the level of development and sophistication of twenty-first-century information retrieval systems. Indeed, not quite so elementary, my dear Watson.
Dig deeper: