Compatibility Issues Between DataFlex Databases and

Databases Accessed Through CLI Based Connectivity Kits Version 2.1

A Data Access Worldwide White Paper
by Ben Weijers

August 2002
Last Edited: September 23, 2002

Overview
This document describes compatibility issues between using the DataFlex database or a database accessed through one of the CLI based Connectivity Kits.

First of all we need to define what exactly a compatibility issue is. For the purpose of this document we define a compatibility issue as:

A situation where a certain technique used in a DataFlex program gets a different result when using the DataFlex database than when using a database accessed through CLI Connectivity Kits. Result, in this definition, is the end result and does not include the time needed to get at the end result.

Next to the definition above, we would like to introduce performance issues. For the purpose of this document we define a performance issue as:

A situation where a certain technique used in DataFlex is (much) faster when using the DataFlex database than when using a database accessed through CLI Connectivity Kits.

Note that performance is only an issue when CLI is slower than DataFlex. In the cases where it is faster, we do not speak of a performance issue. As opposed to compatibility issues, the performance issues are not bi-directional. This is because speed is rated to be a positive thing. Making things faster is always greeted as an improvement. Making things slower usually is only accepted if it also introduces some benefit that weighs more than the disadvantage of losing speed.

We will present two lists of issues, compatibility and performance issues, as defined above. Each individual issue needs to be prioritized on its own merits. It is not the case that compatibility issues always get a higher priority than performance issues. Complete compatibility on a non-performing driver is not acceptable, as is great incompatibility on a fast driver.

The issues discussed in this document may or may not apply in a given environment. Some issues only apply when connecting to existing data, others always apply. The following sections give a short overview of what may or may not apply to a given environment. The ODBC Connectivity Kit is not listed in the following sections because when using ODBC it all depends on the back end.  

Converting from DataFlex to SQL Server

When converting from DataFlex to SQL Server, the following issues can occur:

Compatibility

·           Overlap and Underlap Support

·           Case Sensitive Index Segments

·           Locking

·           File Modes

·           Zero Dates

Performance

·           Ascending/Descending Index Segments only on pre SQL server 2000 versions.

·           Structure Changes

Attaching to Existing SQL Server Data

When attaching to an existing SQL Server table the following issues can occur:

Compatibility

·           Unique Identifier

·           Unique versus Non-unique Indexes

·           Locking

·           File Modes

Performance

·           Null Sorting

·           Zerofile

·           Structure Changes

Converting from DataFlex to DB2

When converting from DataFlex to DB2, the following issues can occur:

Compatibility

·           Overlap and Underlap Support

·           Case Sensitive Index Segments

·           Locking

·           File Modes

·           Zero Dates

Performance

·           Zerofile

·           Structure Changes

Attaching to Existing DB2 Data

When attaching to an existing DB2 table the following issues can occur:

Compatibility

·           NULL Collating only when NULL values are allowed.

·           Unique Identifier

·           Unique versus Non-unique Indexes

·           Locking

·           File Modes

Performance

·           Null Sorting only when NULL values are allowed.

·           Zerofile

·           Structure Changes

^top

Compatibility Issues

This chapter discusses compatibility issues between the DataFlex database and a database accessed through a CLI based Connectivity Kit.

NULL Collating

When data is converted from DataFlex to an SQL database, Nullable columns (columns that will accept a null value) are, by default, never supported/created. We suggest that you do not change this default. If you do support Nullable columns and those columns are used in indexes you will need to be aware of collating and performance issues. We suggest that you avoid using any Nullable columns and we strongly advise you not to use Nullable columns that are part of indexes.

For a proper understanding, a few things should be clear. First of all, DataFlex does not implement a null concept! Secondly, when we say that a column has the null value we mean that we do not know what the value of the column is. Thirdly, null values greatly complicate logical expressions. The expression outcome changes from a binary value (true, false) to a three-valued value (true, false, unknown). Next to this, there are special syntactical rules for handling null values. Lastly, ordering a column that allows null values is slower than ordering the same column that does not allow null values. In other words, columns that allow nulls in indexes will make finding slower.

Database systems use collating sequences to determine how to sort the data in alphanumeric columns. In a collating sequence, one specifies what order characters have relative to each other. Again, null values are special in this regard. You cannot specify how null values should collate. This is a given behavior with following possible implementations:

·           SQL_NC_END = NULLs are sorted at the end of the result set, regardless of the ASC or DESC keywords.

·           SQL_NC_HIGH = NULLs are sorted at the high end of the result set, depending on the ASC or DESC keywords.

·           SQL_NC_LOW = NULLs are sorted at the low end of the result set, depending on the ASC or DESC keywords.

·           SQL_NC_START = NULLs are sorted at the start of the result set, regardless of the ASC or DESC keywords.

The null collation is SQL_NC_HIGH for DB2, SQL_NC_LOW for SQL Server. For ODBC, it depends on the back end.

Let us look at a practical sample for a database that uses SQL_NC_HIGH. Assume a table containing invoices. This table contains a number and a paid date. These two columns also form an index. Now (only looking at these two columns) lets assume the following DataFlex data (ordered by the index in question):

 

Number

Paid date

1

0

1

01/01/1999

2

10/10/2000

2

10/11/2000

3

0

3

05/05/2001

 

If we convert this with null values allowed, it would result in the following converted data (also ordered by the index in question).

 

Number

Paid date

1

01/01/1999

1

0

2

10/10/2000

2

10/11/2000

3

05/05/2001

3

0

 

Now lets look at the find command below and the results in DataFlex and the CLI back end, respectively:

 

Command

DataFlex result

CLI result

Clear X

Move 2 To Number

Find Lt X By Index.1

1, 01/01/1999

2, 10/11/2000

 

The difference in the above results is technically correct. From a technical point of view, there is no problem. We sort by a certain number of columns and we return the records in that order. However, since there is a difference in result, existing applications will not always work correctly when the data is converted to back ends that use SQL_NC_HIGH, SQL_NC_END or SQL_NC_START and null values are allowed. 

^top

Unique Identifier

Any API based driver will require some unique numeric identifier, the identity, for a row in a table. In CLI databases such an identifier is not present.

The solution we have chosen is to add an identity column to a table definition when the table is converted. The identity column is added at the end of the already existing columns.

The numeric identifier combined with the addition of the identifier column introduces three incompatibilities.

1.    It is not possible to connect to tables that do not have a unique numeric identifier.

2.    It is not possible to convert tables that use recnum relations. The chosen mechanism will not keep those relations intact.

3.    In a mixed environment where the same system runs on DataFlex and some CLI based back end it is not possible to have code that adds columns to an existing table definition result in same table definition. The identifier field will get in the way, thus changing field numbers. Field numbers are the base for working with columns inside a DataFlex application.

^top

Unique versus Non-unique Indexes

DataFlex only supports unique indexes. Most CLI back ends also support non-unique indexes. If a CLI table has such an index defined, it is possible that finding may give in some unexpected result. Both SQL Server and DB2 support non-unique indexes.

Note that this problem does not occur when using standard converted tables since indexes are forced to be unique during conversion. The problem could occur when converting with the use of non-unique index option, adding an index through back end utilities or by connecting to existing data.

When new indexes are defined or the definition of an existing index is changed, one can create non-unique indexes. The Connectivity Kit assumes an index should be non unique when it contains RECNUM as one of its segments. When such an index is created on the back end, the RECNUM segment is stripped from the index definition and the non-unique keyword is used in the create index statement.

In DataFlex one uses the RECNUM segment to create an index that "allows duplicates" (Name, Recnum). In a CLI based environment one would use the record identifier to achieve the same thing (Name, DFRecnum).

^top

Table Definition

Depending on the back end, the rules for legal table and column construction vary. A table whose definition is perfectly legal in DataFlex cannot be converted to the back end if the resulting table will not have a legal construction for that back end.

This issue only applies to ODBC. SQL Server and DB2 both have bigger limits than DataFlex.

A special category in table definition is reserved words and table, column name limits. There is no problem in using reserved words as table or column names. We are currently not aware of any table or column name that cannot be created in SQL Server or DB2.

In ODBC, however, it all depends on the database. If that database allows the so-called quoted identifiers, reserved words can be used as table or column names. If the database does not, there is a problem in converting DataFlex tables with a reserved word used for table or column names.

In SQL Server and DB2, there are no column name length issues. The name length may be an issue in ODBC. dBase, for example, allows column names to be of 8 positions. The CLI based Connectivity Kits will force column names to be unique within a table. So, when converting to a dBase table with column names “CREDITORADDRESS” and ”CREDITORCITY” would result in a dBase table with physical column names “CREDITOR” and “CREDITO1”. The names will be “uniqyfied” much like Windows does for short directory names. The table definition as DataFlex presents it will still contain the long names since these are stored in the intermediate file.

It is possible that connecting to existing CLI based tables results in tables that are illegal in DataFlex! One example is a Paradox table that used “File_Number” as a column name. In such cases the illegal columns can be renamed in the intermediate file.

^top

Overlap and Underlap Support

CLI back ends do not support overlaps at all. This is only a problem if the overlap does not completely span a number of fields.

Overlaps that completely span fields are supported by the CKs. You can use them in indexes. The sort order, however, is different. In DataFlex an overlap sorts based on binary comparison. In CLI we simulate the overlap and have to create an index made up of the underlying fields. This will be sorted on whatever collating logic is used for the type of the segments.

Overlaps that do not completely span fields (underlaps) are only supported if they are not used in indexes. Underlaps will be converted into overlap fields (the one that completely spans fields) at conversion time. The user will have to edit the intermediate file to get underlap fields. Every Strucuture_end operation on the table that includes a rewrite of the intermediate file will also convert underlaps to overlaps. The user again will have to edit the intermediate file to get underlaps. 

^top

Case Sensitive Index Segments

CLI back ends do not support case sensitive segments.

The index is created with the same segments as in DataFlex. The case sensitive setting for index segments is completely ignored. If a programmer wants to enforce a rule on a column to avoid duplicate column data no matter in what casing the data is in, he should look at the check constrain support of the back end. Another option would be to create a trigger that checks for the existence of a record with the same uppercased value and generates an error if it exists, thus rolling back the transaction. Triggers should be defined on update and insert in that case.

^top

Locking

Locking is a completely different animal in both environments. The complete lack of any form of lock function or command in the CLI API definition demonstrates this difference. This may sound more serious than it actually is.

The lack of a lock function does not mean there is no locking. Every multi-user environment - database, operating system or other - supports some form of locking to guarantee correct results.

Most CLI based back ends support record locking. The bigger database environment use so-called escalation scheme’s where record locks can escalate into page locks and eventually table locks. The threshold for escalating the lock granularity usually is a percentage. So, if more than X percent of the records on a page is locked, lock the page. If more than Y percent of the pages in a table is locked, lock the table. Both SQL Server and DB2 support record locking.

Some CLI back ends support so-called optimistic locking. All locking fundamentals used in DataFlex require so-called pessimistic locking. Most back ends that support optimistic locking offer the possibility to change to pessimistic locking. If the back end only supports optimistic locking, it cannot be used. Both SQL server and DB2 use pessimistic locking.

On a technical DataFlex level there are differences:

For more information, see http://www.dataaccess.com/whitepapers/TransactionsLockingandDataFlex.htm.

^top

File Modes

The DF_FILE_MODE attribute is supported in the sense that you can set and get it. Setting it will not have any effect. All this does is to store the value so when you get it at a later time it still is what you have set it to.

CLI does not support a concept that comes close to a file, so it also does not support file modes. In an SQL environment, one uses tables as the base for statements. The tables themselves cannot be set to read-only or read-write other than by setting up privileges (rights) for certain users. In order to be able to work with data in environments like DataFlex cursors are used. Cursors do have modes like read-only and such but they are conceptually different from file modes. All cursors created outside a transaction are set to read-only, all cursors created inside a transaction to “lockable”.

^top

Zero Dates

DataFlex uses a so-called zero date when a date column has not been given a value.  CLI based back ends do not support anything like a zero date. When a column’s value is unknown, CLI back ends use NULL values. DataFlex does not support anything that comes close to NULL values.  Zero dates are widely used in existing DataFlex environments.

In order to handle zero dates, a mechanism called dummy zero date was introduced. When a table does not use the dummy zero date logic the following applies:

Note that we recommend not to use NULL values. See also the compatibility issue NULL Collating and the performance issue Null Sorting.

The dummy zero date uses a given date value and interprets it as if it were the DataFlex zero date. In DB2 that value is “01/01/0001”, for SQL Server it is “01/01/1753”, for ODBC it is set to “01/01/0001” by default but it can be adjusted. When a table uses the dummy zero date logic the following applies:

When using non-DataFlex tools or Embedded SQL to query the database you should be aware of these settings of a particular table to interpret the data in a correct way. 

^top

 Performance Issues

This chapter describes performance issues.

In general you should be aware that there are a lot of things that influence performance. Performance is influenced by the hardware on which the database server runs, the number of clients accessing the database, the speed of the connection between the database client and server and much more.

Null Sorting

When data is converted from DataFlex to an SQL database, Nullable columns (columns that will accept a null value) are, by default, never supported/created. We suggest that you do not change this default. If you do support Nullable columns and those columns are used in indexes, you will need to be aware of collating and performance issues. We suggest that you avoid using any Nullable columns and we strongly advise you not to use Nullable columns that are part of indexes.

When an index contains one or more columns that allow null values and you are finding in a way that null values are part of the result set, finding is slow. The same find operation on the same data but with index segments that do not allow null values will be much faster.

^top

Finding in General

Find speed on CLI back ends greatly depends on how “optimizable” the resulting SQL statements and result sets are. We are constantly looking into even better optimization logic and improvements in this area.

You should be aware that when comparing find speed between DataFlex and CLI based databases there are no absolutes. Sometimes a certain operation will be faster when using DataFlex data, sometimes it will be faster when using CLI based data.

^top

Ascending/Descending Index Segments

Not all CLI back ends support descending segments in indexes. This is not a problem logically because we do not need indexes to sort by certain columns. In the case a certain index cannot be created in the back end, the Connectivity Kit falls back and defines it in the intermediate file only. Logically, there is no difference between the DataFlex and CLI based table.

Physically, there is a difference because one index does exist in DataFlex and not in CLI. This will make finding on that index slower.

DB2 supports descending index segments in all of its Universal Database versions. Microsoft SQL Server started to support descending index segments in SQL Server 2000, earlier versions do not support descending index segments. For ODBC it depends on the back end that is used.

^top

Zerofile

To implement zerofile, all rows from a table are deleted by issuing a “Delete from X” statement (without a where clause). This can be quite slow especially if the transaction is logged, triggers actions or otherwise involves extra database activity.

In SQL Server we use a special SQL construction “Truncate Table X” that is much faster because it will not be logged or trigger other database activity.

^top

Structure Changes

The Structure_End logic speed can vary depending on the changes made. If the changes made require a recreate of the table, we will:

If only indexes were changed, the appropriate indexes are dropped and the new indexes are created. Any non-touched indexes will not be dropped!

If only attributes were changed that are stored in the intermediate file, only the intermediate file is recreated.

The three Structure_End scenarios are in descending order for the time they take. The longest operation is first, the shortest last.

^top

Saving and Updating in General

Insert and update in CLI back ends tend to be a bit slower than in a DataFlex environment. Most environments where we tested this also implement transaction logging with roll back and roll forward support. DataFlex does not support transaction logging other than via Novell TTS.

You should be aware that when comparing update and create speed between DataFlex and CLI based databases there are no absolutes. Sometimes a certain operation will be faster when using DataFlex data, sometimes it will be faster when using CLI based data.


^top

Data Access Worldwide
14000 SW 119 Ave
Miami, FL 33186
305-238-0012
Domestic Sales: 800-451-3539
Fax: 305-238-0017
email: sales@dataaccess.com
Newsgroup Server: news.dataaccess.com
Internet: http://www.dataaccess.com

Data Access Worldwide - Asia Pacific
Suite 5, 333 Wantirna Road, Wantirna VIC 3152 Australia
Phone: +61 3 9800 4233 f: +61 3 9800 4255
Sales: asiapacific@DataAccess.com
Support: support.asiapacific@DataAccess.com
Internet: http://www.DataAccess.com/AsiaPacific

Data Access Worldwide - Brasil
Av.Paulista, 1776 - 21st.Floor
São Paulo -SP - Brazil
CEP 01310-921
Phone: 5511-3262-2000
Fax 5511-3284-1579
Sales: info@dataaccess.com.br
Support: suporte@dataaccess.com.br
Internet: http://www.dataaccess.com.br

Data Access Worldwide - Europe
Lansinkesweg 4
7553 AE Hengelo
The Netherlands
Telephone: +31 (0)74 - 255 56 09
Fax: +31 (0)74 - 250 34 66
Sales: info@dataaccess.nl
Support: support@dataaccess.nl
Internet: http://www.dataaccess.nl

Data Access Technical Support
800-451-3539 / 305-232-3142
email: support@dataaccess.com
Visit our Support Home page to see all of our Support options: http://www.dataaccess.com/support

Data Access Technical Knowledge Base    http://www.dataaccess.com/kbase
Many answers to technical problems can be found online in the Data Access Technical Knowledge Base. Here, you can access the same live data that Data Access Worldwide technical support and development staff use to enter and track technical articles. 

Copyright Notice
This document is property of Data Access Corporation. With credit to Data Access Corporation for its authorship, you are encouraged to reproduce this information in any format either on paper or electronically, in whole or in part. You may publish this paper as a stand alone document within your own publications conditional on the maintenance of the intent, context, and integrity of the material as it is presented here.

DataFlex is a registered trademark of Data Access Corporation.

NO LIABILITY FOR CONSEQUENTIAL DAMAGES
To the maximum extent permitted by applicable law, in no event shall Data Access Corporation be liable for any special, incidental, indirect, or consequential damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or any other pecuniary loss) arising out of the use of or inability to use any information provided in this document, even if Data Access Corporation has been advised of the possibility of such damages. Because some states and jurisdictions do not allow the exclusion or limitation of liability for consequential or incidental damages, the above limitation may not apply to you.