SQL Server Gems

Friday, December 29, 2006

Adaptive Query Processing

To understand adaptive query processing, let us first take a look at how a DBMS works.

In a nutshell, the SQL statement that you write first gets passed to a parser, which checks for the semantics and generates a logical query plan. This plan is then passed to the query optimizer which finds a good query execution plan (consisting of physical query operators). The query execution plan (QEP) is then passed to the query engine which then executes the queries. Most of the operators in a DBMS are iterator-styled (i.e. consists of common operations like open(), close(), and getNextTuple()). This ensures that the output of an operator can be pipelined into another operator.

This works fine in a conventional DBMS since the data is usually arriving at a fairly reasonable rate. Hence, the "good" QEP determined earlier during the query optimization phase (which relies on statistics) stays the best way to execute the query.

Comparatively, in the early days, some commercial and prototype DBMSes made use of the notion of "competitive" plans (i.e. several QEPs gets executed). After some time, QEPs that do not perform well are then killed or discarded. One of the drawbacks of this approach is that it wastes precious resources (memory, CPU, etc).

The question is: Is having an optimal QEP prior to query execution really necessary for good response time? Can we do without a QEP?

Most folks will get apalled by the statement. The reason is because that tuning the QEP has been directly linked to deriving good performance. However, let's consider the following scenario: What if the query environment condition changes? What if the data sources do not provide data at a steady rates?

For those who have taken a fundamental parallel query procesisng course, the term LINDA might sound familiar. The key idea of LINDA is the concept of a space in which processes put or take data into/out of the space. By doing do, it gives rise to the ability for multiple "operators" to be performing some tasks in a parallel manner. Can we do the same for a DBMS, and how can we maintain the state of the physical operators?

Hence, the need for adaptive query processing (i.e. let the "query plan" adapt to the query environment). Prior to the notion of adaptive query processing, there has been a lot of prior work on query scrambling (i.e. run-time modification of the QEP). Most of these work were done in the context of data integration systems.

An interesting development arises from the recently popular topic on DSMS (data stream management system). Eddies: Continuously Adaptive Query Processing was introduced as a means for adaptively changing the query plan. A lot of subsequent work resulted from that.

What I am interested in - is whether such adaptive query processing strategies can be used in SQL Server? I would certainly like to explore this when I complete my current journey...

0 Comments:

Post a Comment

<< Home