Demystifying SQL Retrieval Queries A Step-by-Step Breakdown

How Is SQL Query Retrieved, and Conceptually, What Is The Order oF Executing Each of The Six Clauses?

SQL or the Structured Query Language described by Bowman, Emerson, & Darnovsky (1997) was proposed in 1970 by Dr. E. F. Codd at IBM and developed the SQL language over the next decade at research facilities and universities. IBM announced the first commercial SQL production database in 1981, called SQL/DS.

The database management system (DBMS) is a complex application software. Before discussing conceptually how an SQL retrieval query is executed by specifying the conceptual order of executing each of the six clauses, let's understand how a DBMS processes SQL queries at a high level.

Elmasri & Navathe (2015) describe the technique used by the DBMS to process high-level SQL queries as first scanning, parsing, and then validating. The DBMS scanner identifies query tokens. Examples of tokens are SQL keywords, attribute names, and relation names in the SQL query. Tokens identified by the DBMS parser will analyze tokens and check the query syntax to decide if the query is written according to the documented syntax rules defined by the SQL query language.

A query is inspected to be sure all attributes and relation names are valid and related to names in the schema of the database being queried. Then the DBMS internal representation of the SQL query is created. This is usually achieved through a tree data structure known as a query tree. The DBMS must then generate an execution strategy referred to as an SQL query plan that will retrieve the query results from the database files. Finally, the SQL query passes through the DBMS query optimizer, and it chooses the best strategy for access to the data.

SQL query retrieval, described by Elmasri & Navathe (2015), can consist of up to six clauses.

The clauses are SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY. Not all six clauses are required in each query.

The authors discuss that the first two clauses, the SELECT and FROM, are mandatory, which is true in most database management systems. A database such as PostgreSQL, SQL Server, MySQL, H2, HSQLDB, and SQLite can do a SELECT without the FROM clause as stated by SQL Workbench (2016). For example, a query such as selecting ‘Hello World!’ would return Hello World in PostgreSQL.

The SELECT statement lists the attribute(s) or function(s) one wants to be retrieved. If the attribute is an asterisk in the attribute list, this translates to selecting all the rows in the table as part of the result set. The best way to avoid application coding issues is to define all attributes. Errors in occur when the asterisk is used in the result set and a new column is added to a table. That is if the application is not updated too. It could cause errors to occur in the application code after a creating new column. It is best to name the columns and separate them by a comma. An attribute can result in header name is renamed by using the AS clause. Using the AS clause can be handy when one wants to rename the column of the output query to something else or if the column is derived and doesn’t have a name.

The relations are identified in the FROM clause. These could be tables, views or materialized views for example. The relation(s) will be where the data is accessed from. Relations will hold data that can be queried with SQL.

SQL statements that are optional are the WHERE, GROUP BY, HAVING and ORDER BY clauses. The WHERE clause narrows the result set by defining specific conditions so that the results set of tuples from these relations will be narrowed based on the scope. The WHERE clause may also include join conditions for some queries that are written. A join connects relations in different ways.

The GROUP BY and HAVING clauses are used for aggregation or summarizing the query's result set. So these clauses may not be needed in all cases. Lastly, the ORDER BY clause allows the result set to be displayed in a particular order. The ORDER BY clause can assist in sorting the data in order of importance to the end user expects to see the data. In other words, display the data with the most import data at the top of the query for quick and easy access

References

Bowman, J. S., Emerson, S. L., & Darnovsky, M. (1997). The practical SQL handbook: using structured query language. Reading, MA: Addison-Wesley Developers Press. (https://archive.org/details/practicalsqlhand00bowm)

Elmasri, R., Navathe, S. B. (2015-06-01). Fundamentals of Database Systems, 7th Edition [VitalSource Bookshelf version]. Retrieved from (https://bookshelf.vitalsource.com/books/9781323139899)

SQL Workbench (2016) SQL Feature Comparison. Retrieved From (https://www.sql-workbench.eu/dbms_comparison.html)

Posts in this series