Abstract

Data is at the core of the Information Systems (IS) curriculum. Teaching IS courses involves an evolving set of techniques, methods, and algorithms, but datasets continue to play an essential role. Using the same dataset in multiple class modules provides continuity and helps students focus on newly introduced concepts and techniques while relying on their understanding of a familiar dataset. This practice is common within individual courses but rarely extends across multiple courses in IS, and sharing datasets among courses from different departments is exceedingly rare. We would like to find datasets that can be used across all courses in the major, and then in multiple majors within the business school. As a first step toward understanding why common datasets are so little used in IS and business school courses in general, we present an extensible framework of dataset characteristics. In this paper, we consider the two most important general characteristics of datasets, size (a combination of cardinality and width) and data types; propose a broad categorization of these characteristics; and summarize the dataset usage in our core IS courses. The framework can be extended with other important characteristics such as quality (clean, dirty) and origin (synthetic, real-world). We categorize the size of datasets as toy (can be shown on a single page or screen), small (up to a few hundred records), medium (up to a few thousand records), and large (tens of thousands of records or more). The data types of individual fields fall into four general categories: quantitative (Boolean, integer, real), categorical (string, text), calendar (timestamp, date, year), and geographical (address, GPS coordinates).
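
The size and data type categories described above could be sketched in Python roughly as follows. This is an illustration only, not the authors' implementation. The numeric cutoffs (`TOY_MAX_RECORDS`, `SMALL_MAX_RECORDS`, `MEDIUM_MAX_RECORDS`) are assumptions chosen to stand in for "a single page or screen", "a few hundred", and "a few thousand"; the abstract gives no exact boundaries, and the function names are hypothetical. The sketch also looks only at record count, whereas the framework treats size as a combination of cardinality and width.

```python
from enum import Enum


class SizeTier(Enum):
    TOY = "toy"
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


# Assumed cutoffs: the abstract only says "single page or screen",
# "a few hundred", and "a few thousand". These numbers are placeholders.
TOY_MAX_RECORDS = 30
SMALL_MAX_RECORDS = 500
MEDIUM_MAX_RECORDS = 5_000

# Field types and their categories, as listed in the abstract.
TYPE_CATEGORY = {
    "boolean": "quantitative",
    "integer": "quantitative",
    "real": "quantitative",
    "string": "categorical",
    "text": "categorical",
    "timestamp": "calendar",
    "date": "calendar",
    "year": "calendar",
    "address": "geographical",
    "gps coordinates": "geographical",
}


def size_tier(record_count: int) -> SizeTier:
    """Map a record count to a size tier using the assumed cutoffs."""
    if record_count < 0:
        raise ValueError("record_count must be non-negative")
    if record_count <= TOY_MAX_RECORDS:
        return SizeTier.TOY
    if record_count <= SMALL_MAX_RECORDS:
        return SizeTier.SMALL
    if record_count <= MEDIUM_MAX_RECORDS:
        return SizeTier.MEDIUM
    return SizeTier.LARGE


def type_category(field_type: str) -> str:
    """Look up the general category of a field type (case-insensitive)."""
    return TYPE_CATEGORY[field_type.strip().lower()]
```

For example, under these assumed cutoffs `size_tier(200)` falls in the small tier, and `type_category("Date")` returns `"calendar"`.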
