16 December 2001 Douglas Clark is a program management consultant who first started using OS/2 version 1.3. He's married, with 2 girls, and is old enough to remember when 4 color mainframe terminals were a big thing. If you have a comment about the content of this article, please feel free to vent in the OS/2 eZine discussion forums. There is also a Printer Friendly version of this page. |
|
ODBC on OS/2 Part 9
The ODBC series up to now has been oriented towards the general user, that is people that use ODBC applications. Section 1 (parts 1 & 2) described what ODBC is, how it works and how to install it on OS/2. Section 2 (parts 3 - 8) covered all the ODBC database drivers and associated database servers or clients available for OS/2. With this last section (which will probably span parts 9-12) the orientation of the articles changes more to programmers rather than general users. In this section we will discuss what ODBC is and how it works from a programmer's perspective, and issues related to writing ODBC enabled applications. What we will not be covering is how to write ODBC database drivers, which is a very different programming problem than writing applications that use ODBC to access data. This section won't be a tutorial on programming ODBC, but rather an introduction to ODBC and how it fits into OS/2.
| |||
The issues faced by programmers writing ODBC applications broadly fall into two areas:
The first issue - understanding how ODBC works (programmatically) and how to write interoperable ODBC applications - will be covered in this article, part 9. Issues specific to different languages will be covered in the following months. The plan is:
It may seem strange to be talking about ODBC programming in languages other than C, since the ODBC API is actually a C language API. But other languages have built interfaces to ODBC, interfaces that range from providing most of the power of the ODBC API, all the way down to just providing the ability to pass SQL statements and get results. The list above is not all-inclusive: for instance IBM released a rather short lived language called Visual Basic which was an attempt to create a version of BASIC with objects which had an ODBC interface. In order to better understand what ODBC is and how ODBC works it helps to place the ODBC API in the context of other APIs, so we will look at embedded SQL programming and generic CLI programming before moving on to ODBC. What is SQL?SQL is a language used to query and manipulate databases. It was designed to simplify and standardize how a programmer could get at stored data, and to provide a way for data users to query their data. SQL and relational databases grew up together and depend on each other. Both are required to solve the problems that up to then existed, namely that the users of the data could not write the programs needed to access the data. The reason they couldn't was that data was stored in all sorts of specialized structures that each had their own unique access methods and programming techniques. The system programmers that created the data storage structures/files also created the programs that retrieved the data. The users of the data had to rely on the system programmers to write new queries for accessing data each time the users' needs changed. The goal of SQL was to replace the need for custom data retrieval programs with a standardized language that was general enough to handle all ad-hoc data access requirements.Since the goal was to create a language that simplified and standardized how to access data, the language was designed to be non-procedural. A procedural language is one where the desired result is defined by a number of ordered steps - the program. A non-procedural language is where the desired result is described by a single language statement. The thought was that many users of the data (lawyers, accountants, etc.) would not be capable of writing program loops in order to retrieve data. So the non-procedural nature of SQL allowed the user to ask a single question, albeit a sometimes complex question, in the form of a single SQL statement. The non-procedural aspects of SQL pose some problems which were recognized early on by database vendors. There are some problems that are either impossible or very difficult to do with only SQL, which become possible or much easier within a procedural context. Therefore some database vendors created procedural "dialects" of SQL. For example Sybase has Transact-SQL, Oracle has PL/SQL and Informix has SPL; each of which has normal flow-control devices (IF-THEN, loops, etc.) But even with procedural constructs added to SQL the language cannot stand on its own; it has to be employed within or by another programming language. SQL has no output facilities, no cout , printf() or say statements. The data retrieved from the database by SQL has to be transferred to another language in order to be sent to the printer, viewed on the screen, or to be otherwise generally useful. How Does SQL Interface with a Programming LanguageIf SQL is the language used to talk to the database, how does a programming language send SQL statements to the database? And how does the data from the database get from SQL into the programming language? The answers to those two questions depend on the method used to interface SQL into a programming language.There are generally two methods of interfacing SQL in a programming language.
The embedded method is the oldest method and is the method that SQL was designed around. Using functions to pass SQL statements to the database is a later development. This type of interface is called the Call Level Interface (CLI) and has also been standardized in both the X/Open CAE Specification "Data Management: SQL Call-Level Interface (CLI)" and ISO/IEC 9075-3:1995 (E) Call-Level Interface (SQL/CLI). Writing programs is easier with the embedded method because it tracks closer to SQL than the function call method. However compiling programs is much easier with the function call/CLI method because the precompiler step is avoided, along with the expansion of the SQL statement(s) into multiple programming language statements and functions. Getting a programming language syntax error with the embedded method can be a nightmare to resolve because of all that extra stuff inserted into the program by the precompiler. ODBC is based on the CLI method but it adds some additional features, which we will see below. It helps to understand the added complexity of the CLI method by comparing it to embedded SQL programming. Before we can do that however we must stop for a moment and talk about the SQL SELECT statement. Although this is not a SQL tutorial I need to show you enough to be able to explain the embedded and CLI methods. The SELECT statement in SQL retrieves data from a table in a database. The form of the statement is: SELECT column_name,column _ name ,... FROM table_name WHERE condition Column_name is the name(s) of the columns from the table you want to retrieve. The WHERE clause is optional and allows for specifying a condition that limits the rows from the table that are retrieved. For example if we wanted to look at all names and addresses of all the democratic voters in the city of Littleton we could write a select statement that says select first_name,last_name,address from voters where city='LITTLETON' and party = `DEMOCRAT' This would retrieve about 5,856 rows from a table of 22,656 registered votes - those 5856 rows are called a "result set". So far so good. But how will we deal with 5856 rows of data? We really need to look at each individual row, or send each individual row to a printer. SQL has a construct called a cursor , which is a result set associated with a pointer. We can use the pointer to select single rows from the result set and get the data from the columns on that row. The way that is done is (and here SQL's non-procedural nature seems to change to something that sure seems procedural-like):
Each time the FETCH statement is executed the cursor pointer is moved to the next row and the data is copied to the host programming language variables that are specified. Traditionally the pointer can only move forward through the result set and can only move one row at a time. Newer databases add the ability to move the pointer either forwards or backwards and by some arbitrary number of rows in either direction. (ODBC has something called a cursor library that will provide the backward and arbitrary movement functions if the database doesn't support those features.) Now lets look at an example of the embedded method. Embedded SQLEmbedded SQL can be used in whatever languages the database vendor supports. Traditionally the languages used with embedded SQL are C, C++, COBOL, FORTRAN, and Java. On OS/2 both DB2 and Oracle support a Rexx interface that looks something like embedded SQL but is really a cross between embedded and CLI.The example below is a simple C program that opens a cursor requesting all the democratic voters from the city of Littleton, and prints out the results. This program would have to be run through a precompiler, supplied by the database vendor. The output from the precompiler would then be compiled.
EXEC SQL INCLUDE SQLCA;
With the embedded interface SQL integrates rather nicely with a programming language, in this example C. It is fairly straightforward to send SQL statements to the database and get the data from the database into the programming language (the host language) variables. The disadvantage is that the program will only run against the database it is precompiled for. This is because the precompiler is specific to a database; each database vendor uses their own precompiler that only works on their database. This also means that a single program cannot easily access more that one database; to access more than one vendor's database in a single program you would have to break the program into separate modules for each database, precompile and compile each module separately and then link the modules together into a single program. Call-Level Interface (Function Calls)This is an example of a CLI type program that does the same thing as the embedded program above. It retrieves and prints out the names and addresses of democratic voters registered in the city of Littleton. If this program were complete it could be compiled either as a DB2 CLI program or as an ODBC version 2.x program. As in the embedded example above all error checking has been left out for clarity.The sequence is somewhat more involved. The basic steps are:
/* handles */
Notes on the code.
The CLI program requires more code to accomplish the same thing. For example, in embedded programming we can simply specify which host variable we want to use in the FETCH statement; with CLI programming we have to call a function for each host variable which is used to define the name, type and size of the host variable - things that the precompiler figures out (type and size anyway) for us automatically. The trade-offs are:
ODBC & CLIWhen ODBC was created it was based on the emerging CLI standard, although it did not completely implement that preliminary standard. After the introduction of ODBC the CLI standard was modified and finalized into the standard that exists now. ODBC was then changed (in version 3) to match the CLI standard - it fully implements the CLI standard. If a program is written to the CLI standard it is guaranteed to work with ODBC version 3 if it is compiled with ODBC headers and is called by the ODBC Driver Manager. With DB2 the same is also true with ODBC version 2.x; with some #defines set you can compile an ODBC version 2.x program with the DB2 CLI headers and have it work with DB2.ODBC is also a superset of the Call-Level Interface, meaning that ODBC provides some additional functions and features that do not appear in the CLI standard. These additional features include: the Driver Manager, code-page translation libraries, functions that report the database driver's ANSI SQL conformance and support, escape sequences for various literal values and scalar functions, and standardized scalar functions. Each of these additional features is discussed below. Driver Manager(This section is mostly quoted from Microsoft's ODBC 3.0 Programmer's Reference and SDK Guide.)The Driver Manager is a library that manages communications between applications and drivers (meaning here CLI libraries). The Driver Manager exists mainly as a convenience to application writers and solves a number of problems common to all applications. These include determining which driver (CLI library) to load based on a data source name, loading and unloading drivers, and calling functions in drivers. To see why the latter is a problem, consider what would happen if the application called functions in the driver directly - which is what happens in normal CLI programming. Unless the application was linked directly to a particular driver, it would have to build a table of pointers to the functions in that driver and call those functions by pointer. Using the same code for more than one driver at a time would add yet another level of complexity. The application would first have to set a function pointer to point to the correct function in the correct driver, and then call the function through that pointer. The Driver Manager solves this problem by providing a single place to call each function. The application is linked to the Driver Manager, not the driver. The application identifies the target driver and builds a table of pointers to the functions in that driver. It uses the connection handle passed by the application to find the address of the function in the target driver and calls that function by address. For the most part the Driver Manager just passes function calls from the application to the correct driver. However it also implements some functions itself - mainly those that query the list of data source names and drivers installed - and performs basic error checking. For example, the Driver Manager checks that handles are not null pointers, that functions are called in the correct order, and that certain function arguments are valid. The final role of the Driver Manager is loading and unloading drivers. The application loads and unloads only the Driver Manager. When it wants to use a particular driver it calls a connection function in the Driver Manager and specifies a data source name (or a driver type). Using this name the Driver Manager searches the data source information (on OS/2 this is ODBC.INI) for the driver's file. It then loads the driver (assuming it is not already loaded), stores the address of each function in the driver, and calls the connection function in the driver, which then initializes itself and connects to the data source. When the application is done using the driver it calls a disconnect function in the Driver Manager which calls that function in the driver and then unloads the driver from memory. Code-Page Translation LibrariesODBC provides a facility where the user can specify a code-page translation library to be loaded and used for a specific driver. This library then translates characters from one code page to another - i.e. from the code page used by the database to the code page used on the client machine.Capability Reporting FunctionsDifferent databases can have dramatically different capabilities. These differences range from the data types supported by the database to the SQL capabilities of the database. Even though ODBC (or CLI) fixes the API differences between databases with a common standard, it does not force all the databases to behave the same way. Applications that are intended to run on more than one database still must take these differences into account. For example some databases that have a Call-Level Interface or ODBC database driver do not support outer joins, others that support outer joins use different syntax (e.g. Oracle). The application must be able to recognize what the capabilities are of the database it is talking to and adjust itself accordingly. ODBC provides some functions for querying a database for its capabilities to allow an application to work well with multiple database products.Escape SequencesOne of the biggest and most troubling differences between databases is how they handle dates and times. Some databases (e.g. Oracle, Sybase, etc.) include a time element with date values, some strictly separate dates and times into separate data types. Most have different syntax for specifying a date value. For example date values are specified as `MM-DD-YYYY' (`12-25-2001') in DB2 and as `DD-MMM-YYYY' in Oracle (`25-Dec-2001'). ODBC provides escape sequences for representing date and time values which are then translated by the database driver into whatever syntax the database requires. These escape sequences can be used when specifying a date (or time or timestamp) to make the date portable across different database products. (ODBC also always returns the date part of the value in the format - YYYY-MM-DD, which simplifies working with date values).Escape sequences are provided for date, time and timestamp values, scalar function calls, LIKE escape characters (a way of specifying _ or % as literal characters to be searched), outer joins and procedure calls. Date escape sequence is {d `YYYY-MM-DD'}, example {d `2001-12-25'} Time escape sequence is {t `hh:mm:ss'}, example {t `23:04:01'} Timestamp sequence is {ts `YYYY-MM-DD hh:mm:ss'}, example {ts `2001-12-25 23:04:01'} An example of a date escape sequence in a select statement is select first_name, last_name from voters where registered_date after {d `1999-01-01'} The escape sequence for ODBC defined scalar functions is {fn scalar-function }, for example {fn UCASE(NAME)}. The ODBC defined scalar functions are discussed below. The outer join escape sequence is {oj outer-join } where outer-join is: table-reference LEFT OUTER JOIN { table-reference | outer-join } ON search-condition for example select name, deptname from {oj employee left outer join dept on employee.deptid=dept.deptid} where employee.projnum = 544 Standardized Scalar FunctionsODBC defines a standard set of scalar functions that are to be provided by the database. Not all databases provide all the scalar functions, but when a database provides the capability of a scalar function the ODBC database driver will implement the function. The advantage of using the ODBC defined functions, along with the escape sequence, is that the syntax is standardized across databases. The driver is responsible for translating the ODBC function into a native database function, or implementing the function itself. Database drivers generally state which functions they support, and ODBC API functions can be used to query a driver for which scalar functions it supports.Scalar functions are functions that can be used in place of a column name in a select statement. I will just list the scalar functions here because the purpose and usage of most are so obvious explanation is not necessary.
ASCII(
string_exp
)
ODBC VersionsODBC has gone through three major versions with a number of minor releases for each major version. The last version on OS/2 is version 3. The first version that was available as a SDK (Software Developers Kit) was version 2. So in OS/2 we mostly do not have to worry about version 1.An application written to the ODBC API of a certain version requires a Driver Manager at that version or higher and a database driver at that version or higher. Meaning that an application written and compiled to version 2.x will work with a version 3 Driver Manager and database driver; but an application written and compiled to version 3 will not work with a version 2.x Driver Manager or database driver. The major differences between version 2.x and version 3.x are those needed to bring ODBC into compliance with the CLI standards, which mostly revolved around changing function names a number of #DEFINES changing from version 2 to version 3. Version 3 really doesn't add much in the way of additional capabilities. Since there are a number of ODBC version 2.x database drivers in OS/2 that have never been updated to version 3, it is probably best to write applications to the version 2.5 API. This way the application will be compatible with the widest range of database drivers. The one exception to this might be if you are trying to write code that is source compatible with both the ODBC and the CLI standard, so the same application could be used with or without ODBC (although there is really not much advantage to not using ODBC). To be source compatible with most database vendors that conform to the CLI standards, other than DB2, you would have to code your application to ODBC version 3. With DB2 you can write the application to ODBC version 2.5 and still have it compile as a DB2 CLI application. If you are using a language other than C or C++ you probably don't have a choice. The language will have its data access part written to a version of ODBC and that is what you are stuck with. But it won't matter to you anyway because the language will mask any differences between versions. Data TypesFor languages that have data types, programming around the different data types in different databases, and the mapping of a database's data types to the variable types of the programming language can be a major chore. The headaches are not only caused by the real differences in data types but also by the different names used by the different databases for the same type of data. The difficultly is slightly helped by ODBC having a finite set of datatypes, as shown below.
#define SQL_CHAR 1
ODBC functions that return the structure of a table and the characteristics of a result set, return both the database name for the column type and the ODBC type mapped to that column. This helps, but it is up to the driver how to map a particular database type to an ODBC type. For example some Oracle database drivers return SQL_DATE for an Oracle DATE column; others return the ODBC type as SQL_TIMESTAMP. So you end up having to know and program around the database vendor and driver vendor instead is blindly relying on the ODBC types to solve type mismatches. Some database data types have no equivalent in C. For example DB2 has a decimal type that doesn't really map well into any existing C type; the value loses precision when placed into C float and double types. When you map a host variable to a column in a result set, you specify the type of the host variable. The database driver is responsible for converting the data from the database type into the host variable type. Likewise if you are using input parameters the driver is responsible for converting from the host variable type to the database data type. (We haven't talked about input parameters, but the concepts are basically the same for input parameters as output parameters.) Making Applications InteroperableMaking applications interoperable means programming the application in such a way as to insure that it works with a wide variety of different databases. I will offer some general tips that are applicable no matter what language you are using, but there really is no substitute for testing the application on all the databases you intend it to run on.
Conclusion and SummarySQL needs to be used with a programming language to be useful. ODBC is one method of integrating SQL into a programming language. ODBC also provides some very valuable capabilities to help make database applications work against multiple databases, and it greatly simplifies writing applications that access different databases at the same time. Even with ODBC capabilities it takes careful attention to details to write an application that works across multiple databases. The truly excellent database application is the one that works well against databases that didn't exist when it was written.
This month we tried to show some of why ODBC is valuable from a programmer's perspective, some basic concepts behind ODBC and some features of ODBC that can and should be used regardless of the language used for writing an ODBC enabled database application. Next month we will discuss writing ODBC applications in C and C++, how to compile the application and where to get the headers and libraries needed. Our intent is not to provide a tutorial on ODBC programming, but rather to explain the mechanics of writing ODBC applications on OS/2. There are books that already exist on writing ODBC applications, and the ODBC reference manuals from Microsoft are actually pretty good themselves.
Footnotes
1 - Database Principles, Programming, and Performance page 26; by O`Neil and O`Neil ISBN: 1-55860-438-3.
|
|||||
|