Similarity Metrics for SQL Query Clustering

Gokhan Kul; Duc Thanh Anh Luong; Ting Xie; Varun Chandola; Oliver Kennedy; Shambhu Upadhyaya

doi:10.1109/TKDE.2018.2831214

Journal article

Open access

Peer reviewed

IEEE transactions on knowledge and data engineering, Vol.30(12), pp.2408-2420

12/01/2018

DOI: https://doi.org/10.1109/TKDE.2018.2831214

Keywords: Benchmark testing; Clustering; Indexes; Measurement; query logs; Security; similarity metric; summarization; Task analysis; Tuning

Abstract

Database access logs are the starting point for many forms of database administration, from performance tuning, to security auditing, to benchmark design, and many more. Unfortunately, query logs are also large and unwieldy, and it can be difficult for an analyst to extract broad patterns from the set of queries found therein. Clustering is a natural first step towards understanding massive query logs. However, many clustering methods rely on the notion of pairwise similarity, which is challenging to compute for SQL queries, especially when the underlying data and database schema are unavailable. We investigate the problem of computing similarity between queries, relying only on the query structure. We conduct a rigorous evaluation of three query similarity heuristics proposed in the literature, applied to query clustering on multiple query log datasets that represent different types of query workloads. To improve the accuracy of the three heuristics, we propose a generic feature engineering strategy that uses classical query rewrites to standardize query structure. The proposed strategy results in a significant improvement in the performance of all three similarity heuristics.
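To make the idea of structure-only similarity concrete, here is a minimal Python sketch. It is not one of the three heuristics evaluated in the article, and it does not reproduce the article's rewrite rules. It assumes a simple stand-in: each query is reduced to a set of lowercased tokens with constants masked, and two queries are compared by Jaccard similarity. The names `query_features` and `jaccard_similarity` are hypothetical, chosen only for this illustration.

```python
import re


def query_features(sql: str) -> frozenset:
    """Hypothetical feature extractor (an assumption, not the article's method).

    Lowercases the query, masks string and numeric constants with '?',
    and returns the set of remaining tokens.
    """
    text = sql.lower()
    text = re.sub(r"'[^']*'", "?", text)            # mask string literals
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)    # mask numeric literals
    tokens = re.findall(r"[a-z_][a-z0-9_.]*|\?|[<>=!]+|\*", text)
    return frozenset(tokens)


def jaccard_similarity(query_a: str, query_b: str) -> float:
    """Jaccard similarity between the feature sets of two queries."""
    fa, fb = query_features(query_a), query_features(query_b)
    if not fa and not fb:
        return 1.0  # two empty queries are treated as identical
    return len(fa & fb) / len(fa | fb)


q1 = "SELECT name FROM users WHERE id = 42"
q2 = "select name from users where id = 7"
q3 = "SELECT name FROM users WHERE age > 30"

same_shape = jaccard_similarity(q1, q2)   # differ only in case and constant
related = jaccard_similarity(q1, q3)      # same table, different predicate
```

Masking constants and normalizing case is only a very rough analogue of standardizing query structure before comparison. A pairwise score of this kind could then be handed to any clustering method that accepts a similarity or distance matrix.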

Files and links (1)

url

https://doi.org/10.1109/TKDE.2018.2831214

Published (Version of record) Open

Details

Title: Similarity Metrics for SQL Query Clustering
Creators: Gokhan Kul - University at Buffalo, State University of New York
Duc Thanh Anh Luong - University at Buffalo, State University of New York
Ting Xie - University at Buffalo, State University of New York
Varun Chandola - University at Buffalo, State University of New York
Oliver Kennedy - University at Buffalo, State University of New York
Shambhu Upadhyaya - University at Buffalo, State University of New York
Publication Details: IEEE transactions on knowledge and data engineering, Vol.30(12), pp.2408-2420
Publisher: IEEE
Number of pages: 13
Grant note: CNS-1409551 / US National Science Foundation
Academic Unit: Department of Computer and Information Science
Language: English
Resource Type: Journal article
DOI: https://doi.org/10.1109/TKDE.2018.2831214
Record Identifier: 9914419510501301
