Abstract

User-defined functions (UDFs) are frequent components of SQL queries and data processing workflows (DPWs). In both of these applications, UDFs are often available only as black boxes, i.e., their semantics and performance characteristics are unknown (such functions are hereafter called BBUDFs). This opacity prevents the optimization of query execution plans and of entire DPWs. Discovering the semantics of a BBUDF is often impossible due to the high complexity of its code. In contrast, discovering its performance model seems feasible with the support of machine learning. In this paper, we present a solution for classifying BBUDFs into performance classes. Knowing the performance class of a given BBUDF may make it possible to reason about some of its hidden features. Our solution is supported by an experimental evaluation, which reveals that our initial approach, in multiple cases, classifies BBUDFs into adequate performance classes.

Recommended Citation

Bodziony, M., Ciesielski, B., Lehnhardt, A. & Wrembel, R. (2024). On Reasoning About Black-Box Udfs by Classifying their Performance Characteristics. In B. Marcinkowski, A. Przybylek, A. Jarzębowicz, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Harnessing Opportunities: Reshaping ISD in the post-COVID-19 and Generative AI Era (ISD2024 Proceedings). Gdańsk, Poland: University of Gdańsk. ISBN: 978-83-972632-0-8. https://doi.org/10.62036/ISD.2024.83

Paper Type

Full Paper

DOI

10.62036/ISD.2024.83
