Paper Type
Research-in-Progress Paper
Abstract
Referential integrity ensures the consistency of data between database relations. The SQL standard proposes different semantics to deal with partial information under referential integrity. Simple semantics neglects tuples with nulls, and enjoys built-in support by commercial database systems. Partial semantics does check tuples with nulls, but does not enjoy built-in support. We investigate this mismatch between the SQL standard and real database systems. Indeed, insight is gained into the trade-off between cleaner data under partial semantics and the efficiency of checking simple semantics. The cost for referential integrity checking is evaluated for various dataset sizes, indexing structures and degrees of cleanliness. While the cost of partial semantics exceeds that of simple semantics, their performance trends follow similar patterns under growing database sizes. Applying multiple index structures and exploiting appropriate validation mechanisms increase the efficiency of checking partial semantics.
Recommended Citation
Link, Sebastian and Memari, Mozhgan, "Static Analysis of Partial Referential Integrity for Better Quality SQL Data" (2013). AMCIS 2013 Proceedings. 1.
https://aisel.aisnet.org/amcis2013/DataQuality/RoundTablePresentations/1
Static Analysis of Partial Referential Integrity for Better Quality SQL Data
Referential integrity ensures the consistency of data between database relations. The SQL standard proposes different semantics to deal with partial information under referential integrity. Simple semantics neglects tuples with nulls, and enjoys built-in support by commercial database systems. Partial semantics does check tuples with nulls, but does not enjoy built-in support. We investigate this mismatch between the SQL standard and real database systems. Indeed, insight is gained into the trade-off between cleaner data under partial semantics and the efficiency of checking simple semantics. The cost for referential integrity checking is evaluated for various dataset sizes, indexing structures and degrees of cleanliness. While the cost of partial semantics exceeds that of simple semantics, their performance trends follow similar patterns under growing database sizes. Applying multiple index structures and exploiting appropriate validation mechanisms increase the efficiency of checking partial semantics.