Day 9

Selecting Data with SQL

Today is your first lesson in Structured Query Language (SQL). SQL is a powerful manipulation language used by Visual Basic and the Microsoft Access JET database engine as the primary method for accessing the data in your databases. SQL statements fall into two broad categories: data manipulation language statements (DML) and data definition language statements (DDL). The DDL statements enable you to define data tables, indexes, and database relationships. DML statements are used to select, sort, summarize, and calculate the information stored in the data tables.

Today, you will learn about the DML statements. When you complete this chapter, you will be able to use SQL statements to construct database queries that can be retrieved, and you will be able to reorder data in any format recognized by Visual Basic. Even better, because SQL is used in almost all relational database systems (SQL Server, Oracle, Gupta, and so on), you will also be able to apply the knowledge you gain here in almost any other relational database environment you might encounter in the future.

In this lesson, you will learn how to use the SELECT...FROM statement to select data from one or more tables and present that information in a single table for update or review. You will also learn how to limit the data you select to only the records that meet your criteria using the WHERE clause. You'll learn how to easily reorder the data in tables using the ORDER BY clause. You will also learn how to create simple statements that automatically summarize and total the data using the GROUP BY...HAVING clause.

You will learn about typical SQL functions to manipulate numbers and strings. This lesson also covers advanced DML statements such as PARAMETERS, UNIONS, JOINS, and TRANSFORM...PIVOT .

Today, you will create actual SQL queries (and in some cases, store them for later use) using the Visual Basic Visdata program you learned about on Day 8.

What Is SQL?

Before jumping into specific SQL statements and their use, you should understand the definition of SQL and its uses and origins. SQL stands for Structured Query Language. It was developed in the 1970s at IBM as a way to provide computer users with a standardized method for selecting data from various database formats. The intent was to build a language that was not based on any existing programming language, but could be used within any programming language as a way to update and query information in databases.


NOTE:

The word SQL should be pronounced ess-que-ell instead of sequel. The confusion about the pronunciation of the word stems from the database language's origin. The SQL language is a predecessor of a language called Sequel developed by IBM in the late 1960s. For this reason, many (especially those familiar with IBM's Sequel language) continued to pronounce the new database language improperly.


SQL statements are just that—statements. Each statement can perform operations on one or more database objects (tables, columns, indexes, and so on). Most SQL statements return results in the form of a set of data records, commonly referred to as a view. SQL is not a particularly friendly language. Many programs that use SQL statements hide these statements behind point-and-click dialogs, query-by-example grids, and other user-friendly interfaces. Make no mistake, however, that if the data you are accessing is stored in a relational database, you are using SQL statements, whether you know it or not.

ANSI Standard SQL Versus Microsoft JET SQL

SQL syntax is determined by a committee that is part of the American National Standards Institute (ANSI). The ANSI-SQL committee is made up of information system professionals who take on the job of establishing and enforcing standards on the rapidly moving computer programming industry. Although each computer programming language and database interface has its own unique versions of SQL, nearly everyone has agreed to adhere to the basic standards defined by the ANSI-SQL committee. The most widely used SQL standard is SQL-89. This standard was first promulgated in 1989. An updated set of standards (SQL-92) was developed three years later.

Within each set of SQL standards, there are three levels of compliance. A database product must meet Level I compliance in order to call itself an SQL-compatible product. Levels II and III are optional levels of compliance that products can also attain in order to increase interoperability among database systems.

The Microsoft JET database engine that is used to process all Visual Basic SQL statements is ANSI SQL-89 Level I compliant. There are very slight differences between ANSI SQL-89 and Microsoft JET SQL at Level II and Level III. We won't dwell on these differences here. Those who are interested in learning more about ANSI SQL standards and Microsoft JET compliance can find additional documentation elsewhere. The lessons in this book focus strictly on the Microsoft JET SQL syntax. Be assured that once you master the concepts covered here, you will be able to use the same skills in almost all SQL-based programming and query tools you encounter.

SQL Basics

Now it's time to start building SQL statements. If you haven't already done so, load the Visual Basic Visdata application you learned about on Day 8. Using Visdata, load the BOOKS.MDB that is included in the \TYSDBVB\CHAP09 directory of the CD that ships with this book. You will use this database for most of today's lesson.


NOTE:

This book shows reserved SQL words in uppercase letters (for example, SELECT). This is not required by Visual Basic, but it is a good programming habit.


The SELECT...FROM Statement

In this section, you will learn about the most commonly used SQL statement, the SELECT...FROM statement. The SELECT...FROM statement lets you pick records from one or more tables in a database. The results of a SELECT...FROM statement are returned as a view. This view is a subset of the source data. In Visual Basic, the view can be returned as a recordset table, dynaset, or snapshot. Because today's lesson focuses on getting results you can display, views will be returned as Visual Basic snapshot data objects.

In its simplest form, a SELECT...FROM statement contains two parts:


NOTE:

Standard SQL syntax uses the word column to describe a field and row to describe a record. This book uses the terms column or field and row or record interchangeably.


A simple example of a valid SQL statement is





SELECT Au_ID FROM Authors

This SQL statement tells the Microsoft JET database engine to return a data object that contains the Au_ID from the Authors table. Enter this SQL statement into the Visdata SQL window and click the Execute button to see the returned result set. Your screen should look similar to the one in Figure 9.1.

Figure 9.1

The result set from the first SELECT statement.

As you can see from the result set, the SELECT...FROM statement returns all the rows in the table. Whether the table contains 10 or 10,000 records, you can get a complete result set with just one SELECT...FROM statement. This is quite handy, but it can also be quite dangerous. If the result of your SELECT...FROM statement contain too many records, you can slow down the network, possibly run out of memory on your local workstation, and eventually lock up your PC. Later in this chapter, you will learn how to use the WHERE clause to limit the size of your view to only those records you need.

To return all the columns from a table, you can list each column in the SELECT statement. This works if you have only a few columns in the table. However, if you have several columns, it can become quite tedious. There is a shortcut. To automatically list all columns in the table in your result set, instead of typing column names, you can type the asterisk (*). The asterisk tells SQL to return all columns in the requested table. The SELECT statement to display all columns of the Author table would look like this:





SELECT * FROM Authors

Enter the preceding SELECT statement into the Visdata SQL window and review the results. Your screen should look like the one in Figure 9.2.

Figure 9.2

The results of the SELECT * query.

Notice that even though you listed no fields in your SELECT statement, all fields were returned in the result set. This is very useful when you want to display a data table but do not know names of all the columns. As long as you know a valid table name, you can use the SELECT...FROM statement to display the entire table.

The order in which you list columns in the SELECT...FROM statement controls the order in which they are displayed in the result set. Figure 9.3 shows the results of the following SELECT...FROM statement:





SELECT Authors, Au_ID FROM Authors

Figure 9.3

Using the SELECT...FROM statement to change column display order.

The ORDER BY Clause

When you use the SELECT...FROM statement, the records returned in the result set are returned in the order that they were found in the underlying table. But what if you wanted to display the results of your SELECT...FROM statement in a specialized sorted order? You can use the ORDER BY clause to do just that.

Placing ASC or DESC after each field in the ORDER BY clause indicates the order in which you want to sort the column, ascending or descending. If no order is supplied, SQL assumes that you want the set sorted in ascending order.

The following SQL example shows how you can display the records in the Authors table in descending sorted order, by Author Name.





SELECT * FROM AUTHORS ORDER BY Author DESC

Enter this statement in the SQL window of Visdata and execute it. Compare your results to Figure 9.4.

Figure 9.4

The results of the descending ORDER BY clause.

You can enter more than one field in the ORDER BY clause. SQL will create a result set that reflects the aggregate sort of the ORDER BY clause. Using Visual Basic Visdata, enter and execute the following SELECT...FROM statement. Compare your results to those in Figure 9.5.





SELECT State, City FROM Publishers ORDER BY State DESC, City ASC

Figure 9.5

The multiple-column ORDER BY clause.

Notice in the example shown in Figure 9.5 that you have combined the ability to alter the row order of the data in the result set with the ability to alter the column order of the data in the result set. These are powerful tools. Now that you know how to use SQL to display complete, single-data tables, you can learn how to limit the result set to only those records you need.

The WHERE Clause

One of the most powerful aspects of the SELECT...FROM statement is its capability to control the content of the result set using the WHERE clause. There are two ways to use the WHERE clause to control the content of the result set:

Using WHERE to Limit the Result Set

The WHERE clause enables you to perform logical comparisons on data in any column in the data table. In its simplest form, the WHERE clause consists of the following:





WHERE column = value

In this line, column represents the name of the column in the requested data table, and value represents a literal value such as NY or Smith. It is important to know that the WHERE clause is always preceded by a SELECT...FROM statement. Use Visdata to enter and execute the following SQL statement, and compare your results to those in Figure 9.6.





SELECT Name, State FROM Publishers



   WHERE State = 'CA'

TIP:

This book uses the single quote marks (') around string literals within SQL statements. Visual Basic SQL accepts both single and double quote marks within SQL. Because you will often be building SQL statements in Visual Basic code, using single quotes marks within SQL statements makes it easier to construct and maintain SQL statements as Visual Basic strings.


Figure 9.6

The results of a simple WHERE clause SQL query.

The previous SQL statement returns a subset of the data in the result set. That is, the resulting view does not contain all of the rows of the Publishers table. Only those rows that have columns meeting the WHERE clause criteria will be returned in the result set.

You can link WHERE clauses using the AND and OR operators. Enter and execute the following SQL statement, and compare your results to Figure 9.7.





SELECT Name, State, City FROM Publishers



   WHERE State = 'CA' AND City <> 'Redwood City'

Figure 9.7

The results of a complex WHERE clause.

You can use several AND and OR operators to link valid logical comparisons together to form a single WHERE clause. You can also use more than just =, <>, >, <, <=, and >= logical comparisons. Visual Basic SQL supports the use of BETWEEN...AND, IN, and LIKE comparisons. The following SQL statement illustrates the use of BETWEEN...AND in a WHERE clause. Check your results against those shown in Figure 9.8





SELECT PubID, Name, State, City FROM Publishers



   WHERE PubID BETWEEN 10 AND 15

Figure 9.8

Using BETWEEN...AND in a WHERE clause.

The result set will only contain rows that have a PubID value between 10 and 15. Note that the values listed in the BETWEEN...AND clause (10 and 15) are included in the result set.

You can also use SQL to return a result set that contains rows that match a set of noncontiguous data. For example, if you wanted a list of all the publishers in the state of New York, California, and Alaska, you could use the IN keyword followed by the desired values, separated by commas, within parenthesis, as part of the WHERE clause. Enter and execute the following SQL statement, and check your result against those shown in Figure 9.9.





SELECT PubID, Name, City, State FROM Publishers



   WHERE State IN ('NY','CA','AK')

Figure 9.9

Using the IN keyword in the WHERE clause.

You can also use the LIKE function to return all rows whose column's contents are similar to the literals passed in the function. For example, to return all rows with a State column that has the letter A in any position, you would use the following SQL SELECT...FROM statement(see Figure 9.10 for results):





SELECT PubID, Name, City, State FROM Publishers



   WHERE State LIKE('*A*')

Figure 9.10

Using the LIKE function in a WHERE clause.

The LIKE function is a very powerful tool. It is covered in more depth in the next section of today's lesson, "SQL Functions."

Using WHERE to Link Two or More Tables in a Result Set

You can use the WHERE clause to compare columns from different tables. In doing so, you can set up criteria that can link two or more tables in a single result set. The syntax for this form of the WHERE clause is





SELECT table1.columnA, table2.columnA FROM table1, table2



WHERE table1.columnA = table2.columnA

table1 and table2 are different data tables in the same database. columnA represents a single column in each of the tables. Use Visdata to enter and execute the following SQL statement. Compare your result set to the one in Figure 9.11.





SELECT Titles.Title, Publishers.Name



    FROM Publishers, Titles



   WHERE Publishers.PubID =Titles.PubID

Figure 9.11

Using the WHERE clause to link two tables in a single result set.

The preceding SQL statement creates a result set that displays the book title and publisher's name. This is accomplished using the WHERE clause to tell SQL to select only those rows where the PubID values in each table match up. Keep in mind that this is done without any programming code, special indexing, or sorting commands. SQL handles all those tasks for you. Also, there are a few new items in this SQL statements that bear further review.

This is the first SQL statement you have encountered today that lists columns from two different tables. When selecting columns from more than one table, it is good programming practice to precede the column name with the table name and join the two with the period (.). As long as the column name is unique among all columns in the tables from which you are selecting, SQL does not require you to use the table.column syntax. But it is a good habit to do so, especially when you are building SQL statements in Visual Basic code.

You should also notice that the WHERE clause comparison columns (Publishers.PubID and Titles.PubID) were not included in the SELECT portion of the statement. You do not have to include the column in the SELECT portion of the statement to use it in the WHERE portion of the statement, as long as the column already exists in the underlying table.

Combining tables using the WHERE clause will always return a nonupdateable result set. You cannot update the columns in a view created in this manner. If you want to link tables together and also be able to update the underlying tables for that view, you need to use the JOIN clause, which is covered later today.

You can combine the link-type and limit-type versions of the WHERE clause in a single SQL SELECT...FROM statement. Execute the following statement and compare your results to those in Figure 9.12.





SELECT Titles.PubID,Titles.Title,Publishers.Name



   FROM Titles, Publishers



   WHERE Titles.PubID = Publishers.PubID



      AND Publishers.PubID BETWEEN 10 AND 15

Figure 9.12

Combining link-type and limit-type WHERE clauses.

The preceding SQL statement selects only those records in which the PubID columns match and the PubID values are between 10 and 15.

You can use the WHERE clause to link more than two data tables. The linking column for table1 and table2 does not have to be the same column for table2 and table3. Execute the following statement and review your results against those in Figure 9.13.





SELECT Titles.PubID,Titles.Title,Publishers.Name,Authors.Author



   FROM Titles, Publishers,Authors



   WHERE Titles.PubID = Publishers.PubID



      AND Titles.Au_ID = Authors.Au_ID

Figure 9.13

Using the WHERE clause to link three tables.

In the previous example, the Publishers table and the Titles table are linked using the PubID column. The Titles table and the Authors table are linked using the Au_ID field. When the link is done, the selected columns are displayed in the result set.

You might have noticed that SQL assigns column names to the result sets. There are times when these assigned names can be misleading or incomplete. You can use the AS keyword to rename the columns in the result set. The following SQL statement is one example of using the AS keyword in the SELECT statement to rename the column headers of the result set. This renaming does not affect the original column names in the underlying tables. Execute the following SQL statement and compare your results to those in Figure 9.14.





SELECT Titles.PubID AS PubCode,    Titles.Title AS BookTitle,



   Publishers.Name AS PubName,



   Authors.Author AS AuthorName



   FROM Titles, Publishers,Authors



   WHERE Titles.PubID = Publishers.PubID



      AND Titles.Au_ID = Authors.Au_ID

Figure 9.14

Using the AS keyword to rename columns in the result set.

Now that you know how to use the SELECT...FROM statement to select the desired rows and columns from data tables, read about how to use SQL functions to calculate and manipulate data within your selected columns and rows.

SQL Aggregate Functions

The SQL standards define a core set of functions that are present in all SQL-compliant systems. These functions are known as aggregate functions. Aggregate functions are used to quickly return computed results of numeric data stored in a column. The SQL aggregate functions available through the Microsoft Access JET database engine are as follows:

The following SQL statement illustrates all five of the SQL aggregate functions. Enter and execute this statement, and check your results against Figure 9.15.





SELECT COUNT(Units) AS UnitCount,



   AVG(Units) AS UnitAvg,



   SUM(Units) AS UnitSum,



   MIN(Units) AS UnitMin,



   MAX(Units) AS UnitMax



   FROM BookSales

Figure 9.15

Using SQL aggregate functions.

You can use the WHERE clause and aggregate functions in the same SELECT...FROM statement. The following statement shows how you can use the WHERE clause to limit the rows included in the aggregate calculation. Refer to Figure 9.16 for results. Compare these numbers to the ones in the view returned in the previous query (Figure 9.15).





SELECT COUNT(Units) AS UnitCount,



   AVG(Units) AS UnitAvg,



   SUM(Units) AS UnitSum,



   MIN(Units) AS UnitMin,



   MAX(Units) AS UnitMax



   FROM BookSales



   WHERE Qtr = 1

Figure 9.16

Using the WHERE clause to limit the scope of aggregate functions.

Using Visual Basic Functions in a SELECT Statement

When you call the Microsoft Access JET database engine from within a Visual Basic program, you can use any valid Visual Basic functions as part of the SQL statement. For example, if you want to create a result set with a column that holds only the first three characters of a field in the underlying table, you could use the Visual Basic Left$ function as part of your column list in the SELECT...FROM statement, in the following line (see Figure 9.17).





SELECT Left$(Author,3), Author



   FROM Authors

Figure 9.17

Using Visual Basic functions in an SQL statement.

You can also use Visual Basic syntax to combine several data table columns into a single column in the result set. Enter and execute the following example and compare your results to Figure 9.18.





SELECT Name, City+", "+State+"  "+Zip AS ADDRESS



   FROM Publishers

Figure 9.18

Using Visual Basic syntax to combine columns.

You can also use Visual Basic functions as part of the WHERE clause in an SQL statement. The following example (see Figure 9.19) will only return rows that have the letter a as the second character in the Name column.





SELECT Name FROM Publishers



   WHERE Mid$(Name,2,1)="a"

Figure 9.19

Using Visual Basic functions in a SQL WHERE clause.

Even though using familiar Visual Basic functions and syntax is very handy, it has its drawbacks. Chief among them is the fact that after you create a SQL statement that uses VB-specific portions, your code is no longer portable. If you ever move the SQL statements to another database engine (such as SQL Server), you must remove the VB-specific portions of the SQL statements and replace them with something else that will work with the database engine you are using. This will not be an issue if you plan to stick with the Microsoft Access JET engine for all your database access.

Another possible drawback that you'll encounter if you use VB-specific syntax in your SQL statements is that of speed. Extensive use of VB-specific code within SQL statements will result in a slight performance hit. The speed difference is minor, but it should be considered.

It is better to use as few VB-specific functions in your SQL statements as possible. You will not limit the portability of your code, and you will not suffer from unduly slow processing of the SQL statements.


NOTE:

You can't use user-defined functions within your SQL statements when you use the Microsoft Access JET database engine from within Visual Basic. You can only use the built-in SQL functions and the predefined Visual Basic functions.


More SQL DML Statements

Now that you know how to create basic SQL SELECT...FROM statements and you know how to use the built-in SQL functions, return to the basic SELECT...FROM statement and add a few more enhancements to your SQL tool kit.

The DISTINCT and DISTINCTROW Clauses

There are times when you select data from a table that has more than one occurrence of the rows you are trying to collect. For example, you want to get a list of all the customers that have at least one order on file in the Orders table. The problem is that some customers have several orders in the table. You don't want to see their name appear more than once in your result set. You can use the DISTINCT keyword to make sure that you do not get duplicates of the same customer in your result set.

Enter and execute the following statement. As a test, execute the same SQL statement without the DISTINCT clause and compare the result sets. Refer to Figure 9.20 as an example.





SELECT DISTINCT Au_ID FROM Titles



   ORDER BY Au_ID

Figure 9.20

Using the DISTINCT keyword to remove duplicates from a result set.

If you include more than one column in the SELECT list, all columns will be used to evaluate the uniqueness of the row. Execute and compare the result sets of the following two SQL statements. Refer to Figure 9.21 as a guide.





SELECT DISTINCT Title



   FROM BookSales



SELECT DISTINCT Title, Units



   FROM BookSales

Figure 9.21

Using DISTINCT on multiple columns.

Notice that the first SQL statement returns a single record for each Title in the data table. The second SQL statement returns more records for each Title because there are distinct Units values for each Title.

There are also times when you want to collect data on all rows that are distinct in any of the fields. Instead of using the DISTINCT keyword and listing all the fields in the table, you can use the DISTINCTROW keyword. The SQL following statement (see Figure 9.22) uses DISTINCTROW to return the same records as the SQL statement in the previous example.





SELECT DISTINCTROW *



   FROM BookSales



   ORDER BY Title

Figure 9.22

Using DISTINCTROW in an SQL statement.

Both the DISTINCT and DISTINCTROW keywords enable you to limit the contents of the result set based on the uniqueness of one or more columns in the data table. In the next section, you'll learn how you can limit the contents of the result set to the records with the highest numeric values in selected columns.

The TOP n and TOP n PERCENT Clauses

You can use the TOP n or TOP n PERCENT SQL keywords to limit the number of records in your result set. Suppose you want to get a list of the five top-selling books in a data table. You can use the TOP n clause to get just that. TOP n returns the first n number of records. If you have two records of the same value, SQL will return both records. For the previous example, if the fifth and sixth record were both equal, the result set would contain six records, not just five.

When you use the TOP clause, you must also use the ORDER BY clause to make sure that your result set is sorted. If you do not use the ORDER BY clause, you will receive an arbitrary set of records because SQL will first execute the ORDER BY clause and then select the TOP n records you requested. Without the ORDER BY clause, it is quite likely that you will not get the results you intended. If a WHERE clause is present, SQL will perform the WHERE clause, the ORDER BY clause, and then the TOP n clause. As you can see, failure to use the ORDER BY clause will most certainly return garbage in your result set (see Figure 9.23).





SELECT TOP 5 * FROM BookSales



   ORDER BY Sales DESC

Figure 9.23

Using TOP n to limit the result set.

Notice that the previous example uses the DESC keyword in the ORDER BY clause. Whether you use the DESC or ASC ORDER BY format, the result set will still contain the first n records in the table (based on the sort). Also note that the result set contains more than five records, because several records have the same Sales value.

The TOP n PERCENT version returns not the top 5 records, but the top 5 percent of the records in the underlying data table. The results of the following SQL statement (see Figure 9.24) contain several more records than the result set shown previously.





SELECT TOP 5 PERCENT * FROM BookSales



   ORDER BY Sales

Figure 9.24

Using TOP n PERCENT to limit the result set.

The GROUP BY...HAVING Clause

One of the more powerful SQL clauses is the GROUP BY...HAVING clause. This clause lets you use the SQL aggregate functions discussed earlier today to easily create result sets that contain a list of subtotals of the underlying data table. For example, you might want to be able to create a data set that contains a list of Titles and the total Units sold, by Title. The following SQL statement (see Figure 9.25) can do that:





SELECT Title, SUM(Units) AS UnitsSold



   FROM BookSales



   GROUP BY Title

Figure 9.25

Using GROUP BY to create subtotals.

The GROUP BY clause requires that all numeric columns in the SELECT column list be a part of a SQL aggregate function (SUM, AVG, MIN, MAX, and COUNT). Also, you cannot use the * as part of the SELECT column list when you use the GROUP BY clause.

What if you wanted to get a list of all the book titles that sold more than 100 units for the year? The first thought would be to use a WHERE clause:





SELECT Titles, SUM(Units) AS UnitsSold



   WHERE Sum(Units) > 100



   GROUP BY Units

However, if you try to run this SQL statement, you discover that SQL does not allow aggregate functions within the WHERE clause. You really want to use a WHERE clause after the aggregate function has created a resulting column. In plain English, the query needs to perform the following steps:

Luckily, you don't have to actually write all this in a series of SQL statements. You can get the same results by adding the HAVING keyword to the GROUP BY clause. The HAVING clause acts the same as the WHERE clause, except that the HAVING clause acts upon the resulting columns created by the GROUP BY clause, not the underlying columns. The SQL following statement (see Figure 9.26) will return only the Titles that have sold more than 100 units in the last year:





SELECT Title, SUM(Units) AS UnitsSold



   FROM BookSales



   GROUP BY Title HAVING SUM(Units)>100

Figure 9.26

Using the HAVING clause with GROUP BY.

The columns used in the HAVING clause do not have to be the same columns listed in the SELECT clause. The contents of the HAVING clause follow the same rules as those for the contents of the WHERE clause. You can use logical operators AND, OR, and NOT, and you can include VB-specific functions as part of the HAVING clause. The following SQL statement (see Figure 9.27) returns sales in dollars for all titles that have more than 100 units sold and whose titles have the letter a as the second letter in the title:





SELECT Title, SUM(Sales) AS SalesAmt



   FROM BookSales



   GROUP BY Title



   HAVING SUM(Units)>100 AND Mid$(Title,2,1)="a"

Figure 9.27

Using a complex HAVING clause.

SQL JOINs

The JOIN clause is a very powerful optional SQL clause. Remember when you learned how to link two tables together using the WHERE table1.column1 = table2.column1 line? The only problem with using the WHERE clause is that the result set is not updatable. What if you need to create an updateable result set that contains columns from more than one table? You use JOIN.

There are three types of JOIN clauses in Microsoft Access JET SQL:

The follow sections describe each form of JOIN and how each is used in your programs.

The INNER JOIN

The INNER JOIN can be used to create a result set that contains only those records that have an exact match in both tables. Enter and execute the following SQL statement (see Figure 9.28):





SELECT [Publisher Comments].Comments,



   Publishers.Name, Publishers.State



   FROM [Publisher Comments] INNER JOIN Publishers



   ON [Publisher Comments].PubID = Publishers.PubID

NOTE:

The preceding SQL statement introduces the use of square brackets ([]). These are used to enclose a Microsoft Access data table name that contains embedded spaces. The square brackets are not part of standard SQL and are only there to handle the Microsoft Access table name. This is a good time to point out that it is a bad idea to use embedded spaces as table names!


Figure 9.28

Using the INNER JOIN SQL clause.

The previous SQL statement returns all the records from the Publisher table that have a PubID that matches a PubID in the [Publisher Comments] table. This type of join returns all the records that reside within both tables—thus, an INNER JOIN.

This is handy if you have two tables that you know are not perfectly matched against a single column and you want to create a result set that contains only those rows that match on both sides. The INNER JOIN also works well when you have a parent table (such as a CustomerTable) and a child table (such as a ShipAddressTable) with a one-to-one relationship. Using an INNER JOIN, you can quickly create a list of all CustomerTable records that have a corresponding ShipAddressTable record on file.

INNER JOINS work best when you create a JOIN on a column that is unique in both tables. If you use a table that has more than one occurrence of the JOIN column, you'll get a row for each occurrence in the result set. This might be an undesirable result. The following example illustrates the point (see Figure 9.29):





SELECT Titles.Title,BookSales.Units



   FROM Titles INNER JOIN BookSales



   ON Titles.Title = BookSales.Title

Figure 9.29

Using an INNER JOIN on a non-unique column.

In the previous example, the table BookSales has four occurrences of Title (one for each quarter of the year), so the result of the INNER JOIN returns each Title four times.

The LEFT JOIN

The LEFT JOIN is one of the two outer joins in the SQL syntax. Although INNER JOIN returns only those rows that have corresponding values in both tables, the outer joins return all the records from one side of the join, whether or not there is a corresponding match on the other side of the join. The LEFT JOIN clause returns all the records from the first table on the list (the leftmost table) and any records on the right side of the table that have a matching column value. The following example (see Figure 9.30) shows the same SQL query that was shown in Figure 9.28.





SELECT Publishers.Name,[Publisher Comments].Comments



   FROM Publishers LEFT JOIN [Publisher Comments]



   ON Publishers.PubID = [Publisher Comments].PubID

Figure 9.30

Using the LEFT JOIN clause.

Notice that the result set has blank comments in several places. The LEFT JOIN is handy when you want a list of all the records in the master table and any records in the dependent table that are on file.

The RIGHT JOIN

The RIGHT JOIN works the same as the LEFT JOIN except that the result set is based on the second (right-hand' table in the JOIN statement. You can use the RIGHT JOIN in the same manner you would use the LEFT JOIN.

UNION Queries

Another powerful SQL clause is the UNION clause. This SQL keyword lets you create a union between two tables or SQL queries that contain similar, but unrelated, data. A UNION query is handy when you want to collate information from two queries into a single result set. Because UNION queries return non-updateable result sets, they are good for producing on-screen displays, reports, and base data for generating graphs and charts.

For example, if you have a customer table and a vendor table, you might want to get a list of all vendors and customers who live in the state of Ohio. You could write an SQL statement to select the rows from the Customers table. Then write an SQL statement to select the rows from the Vendors table. Combine the two SQL statements into a single SQL phrase using the UNION keyword. Now you can get a single result set that contains the results of both queries.

In the following SQL statement (see Figure 9.31), you are creating a result set that contains all Publishers and Buyers that are located in the state of New York.





SELECT * FROM Publishers WHERE State='NY'



   UNION



SELECT * FROM Buyers WHERE State='NY'



   ORDER BY Zip

Figure 9.31

An example of a UNION query.

Notice that in the previous example, the Publishers.PubID column is not present even though the column contents are in the result set. The contents of Publishers.PubID has been inserted into the Buyers.BuyerID column. SQL had to do a data type override to accomplish this. The UNION query will use the column names of the first SQL query in the statement and will create a result set that displays the data even if data types must be altered to do so.

Each portion of the UNION query must have the same number of columns. If the first query results in six displayable columns, the query on the other side of the UNION statement must also result in six columns. If there is not an equal number of columns on each side of a UNION query, you will receive an SQL error message.

You can also use UNION queries on the same table. The following SQL statement (see Figure 9.32) shows how you can use SQL to return the top-selling titles and the bottom-selling titles in the same result set:





SELECT SUM(Sales) AS TotSales,Title FROM BookSales



   GROUP BY Title HAVING SUM(Sales)>4000



UNION



   SELECT SUM(Sales) AS TotSales,Title FROM BookSales



   GROUP BY Title HAVING SUM(Sales)<1000



ORDER BY TotSales

Figure 9.32

Using UNION on the same data table.

You can use Visual Basic stored queries (QueryDefs) as replacements for the complete SQL statement on either side of a UNION keyword. You can also link several SQL queries together with successive UNION keywords.

Crosstab Queries with TRANSFORM...PIVOT

The last SQL statement covered today is the TRANSFORM...PIVOT statement. This is a very powerful SQL tool that enables you to create result sets that contain summarized data in a form known as a crosstab query. Instead of trying to explain a crosstab query, I'll use a sample problem.

Suppose you have a data table that contains information on book titles and sales by quarter (sound familiar?). You have been asked to produce a view set that lists each book title down the left side and each quarter across the top with the sales figures for each quarter to the right of the book title. The only problem is that your data table has a single record for each quarter for each book. For example, if book A has sales in three quarters this year, you have three rows in your data table. If book B has sales for four quarters, you have four rows, and so on. How can you produce a view that lists the quarters as columns instead of rows?

You can accomplish this with a complicated set of subsequent SQL statements that produces temporary views, merges them together, and so on. Thanks to the folks who invented the Microsoft Access JET database engine, however, you can use the TRANSFORM...PIVOT statement instead. You can produce the entire result set in one SQL statement using TRANSFORM...PIVOT . The following SQL statement shows how this can be done. See Figure 9.33 for a sample result set.





TRANSFORM SUM(BookSales.Sales)



   SELECT Title FROM BookSales



   GROUP BY Title



PIVOT BookSales.Qtr

Figure 9.33

The TRANSFORM...PIVOT example.

Notice the form of the TRANSFORM...PIVOT statement. It starts with the TRANSFORM keyword, not the SELECT keyword. Also notice that a single SQL aggregate function immediately follows the TRANSFORM keyword. This is required, even if no real totaling will be performed. After the TRANSFORM aggregate function clause, you have the standard SELECT...FROM clause. Notice that the previous example did not include the Booksales.Sales column in the SELECT statement because it will be produced by the TRANSFORM...PIVOT clause automatically. The GROUP BY clause is required in order to tell SQL how to treat the successive rows that will be handled for each BookSales.Title. Finally, add the PIVOT keyword, followed by the column that you want to use, as the set of headers that follow out to the right of the GROUP BY column.

TRANSFORM...PIVOT uses the data in the PIVOT column as column headers in the result set. You will have as many columns in your result set as you have unique values in your PIVOT column. This is important to understand. Using columns that contain a limited set of data (such as months of the year) will produce valuable result sets. However, using a column that contains unique data (such as the CustomerID column) will produce a result set with an unpredictable number of columns.

The nice thing about TRANSFORM...PIVOT is that it is easy to produce several different views of the same data by just changing the PIVOT column. For example, what if you wanted to see the book sales results by BookSales.SaleRep instead of by BookSales.Qtr? All you have to do is change the PIVOT field. See the following code example and Figure 9.34.





TRANSFORM SUM(BookSales.Sales)



   SELECT Title FROM BookSales



   GROUP BY Title



PIVOT BookSales.SalesRep

Figure 9.34

Changing the PIVOT field

Notice, in Figure 9.34, that you can see a column with the header <>. When Microsoft Access JET ran the SQL statement, it discovered some records that had no value in the BookSales.SaleRep column. SQL automatically created a new column (<>) to hold these records and make sure they were not left out of the result set.

Even though TRANSFORM...PIVOT is a powerful SQL tool, there is one drawback to its widespread use in your programs. The TRANSFORM...PIVOT clause is not an ANSI-SQL clause. Microsoft added this clause as an extension of the ANSI-SQL command set. If you use it in your programs, you will not be able to port your SQL statements to other back-end databases that do not support the TRANSFORM...PIVOT SQL clause. Despite this drawback, you will find TRANSFORM...PIVOT a very valuable SQL tool when it comes to producing result sets for summary reports, data graphs, and charts.

Summary

Today you learned how to create basic SQL statements that select data from existing tables. You learned that the most fundamental form of the SQL statement is the SELECT...FROM clause. This clause is used to select one or more columns from a table and display the results of that statement in a result set, or view.

You also learned about the optional clauses that you can add to the SELECT...FROM clause:

You also learned about additional SQL keywords that you can use to control the contents of the result set:

Quiz

  1. What does SQL stand for? How is SQL pronounced?
  2. What SQL statement enables you to select data from table fields?
  3. What wildcard character do you use in a SELECT...FROM statement to include all fields of a table in your result?
  4. What clause do you use in a SQL statement to sort the displayed data?
  5. Identify two functions that a WHERE clause performs in a SQL statement?
  6. How do you rename the column headings in a SQL statement?
  7. What are SQL aggregate functions? List the SQL aggregate functions available through the Microsoft Access JET database engine.
  8. What are the drawbacks of using Visual Basic functions in SQL statements?
  9. What is the difference between the DISTINCT and DISTINCTROW SQL clauses?
  10. What clause should you always use with the TOP n or TOP n PERCENT clause?
  11. What are the three join types available in Microsoft JET SQL? Briefly explain how each is used.
  12. When would you use a UNION query?

Exercises

As a corporate MIS staff member, you are given the task of assisting the Accounting Department in extracting data from their accounts payable and accounts receivable systems. As part of your analysis, you determine that these systems possess the following data tables and fields:

Use this information to answer the questions that follow:

  1. Write a SQL statement to list all of the customers. Include their IDs, names, addresses, phone numbers, and customer types.
  2. Display all of the information in the Open Invoice table, but display CustomerID as Account.
  3. Display the same information requested in Exercise 2, but sort the data by customer and then by invoice number within each customer.
  4. Display all suppliers that can be found within New York City. Display their IDs, names, addresses, and phone numbers.
  5. Display the Customer types, names, and address for all customers with a customer type of ABC.
  6. Select and display customer IDs and names whose names begin with AME.
  7. Display the CustomerID and Name of all customers that have an open invoice. Sort your information by CustomerID.
  8. Select and display the five largest outstanding invoices.
  9. Display a listing of names and phone numbers of all customers and vendors that resid in Ohio.