Overview of Query Retargeting¶
The use of pre-computed aggregate data is a technique to address performance challenges in Data Warehouse systems (DW). An aggregate retargeting mechanism redirects a query to an available aggregate table(s) when possible.
Performance is a critical issue in DW systems. Pre-computed aggregation is the most effective strategy to reduce the time needed to access large numbers of rows.
Aggregates usage has to be transparent and non-intrusive. The following transparency aspects are addressed by Querona:
- aggregate unawareness - data consumer is not aware that pre-aggregate was used during query execution
- execution engine unawareness - data consumer is not aware what execution engine hosts pre-aggregate used
- complexity unawareness - data consumer is not aware of pre-aggregate structure and how many pre-aggregates were used, if any
The adoption of aggregates should be completely transparent to DW users for two main reasons:
- users typically do not have the necessary technical skills to make the appropriate choices on how to conveniently use aggregates to improve performance,
- as decision requirements change constantly, aggregates may have to change accordingly (e.g. removal of obsolete aggregates).
An aggregate retargeting mechanism provides this transparency, by redirecting a query, if possible, to the available aggregate table(s) that can answer the query.
Objectives of Query Retargeting¶
Objectives are as follows:
- improve the performance of Data Warehouse and Data Marts
- improve architecture flexibility
- allow for a delay of optimization decisions until the performance requirements are fully known
- allow for ad-hoc optimizations for quick reaction to performance issues in production
How Query Retargeting works¶
An aggregate, also known as a summary table, is a materialized view that stores redundant, consolidated data. In Querona we prefer to use the term “target”, because a materialized view does not have to be an aggregate. In fact, any materialized view can be used as target: for example, a wide table can be reduced in a view (using projection) to a few columns that are needed most. Such target can then be used to substitute a wide table in queries issued by users.
The adoption of aggregates requires the resolution of the following:
- finding and defining the best set of aggregates
- efficiently maintaining the aggregates when new data is shipped to the data warehouse
- efficiently computing these aggregate views from multiple sources
- exploring the aggregates to answer queries.
The current version of Querona leaves the finding, defining and maintaining aggregates to the user, usually a skilled data warehousing specialist. In essence, it is a user responsibility to define views that will be considered by the query rewrite engine as targets. For now, a semi-automated or fully-automated pre-aggregate candidate proposition is not supported.
Query retargeting in action¶
To illustrate the problem at hand, let us consider the following example. Consider tables below:
Suppose that a critical query is to report monthly sales by product:
The query can be formulated like this:
1 2 3 4 5 6 7 8
SELECT Month , ProdName , Sum(AmountSold) as MonthlySales FROM FactSales JOIN DimProducts ON FactSales.Prod_id = DimProducts.Prod_id GROUP BY Month, ProdName
Assume that the FactSales table is very large and expected query response time is below one second.
To use query retargeting we have to:
- define a view - target view name or virtual database it belongs to do not matter
- define caching of a view - either caching mode will do
- mark view as ‘Query rewrite target’
- materialize the view by rebuilding the cache
For example, we have to define the following view:
1 2 3 4 5 6 7 8 9 10
CREATE VIEW v_summary_by_month AS SELECT Month , ProdName , Sum(AmountSold) as MonthlySales FROM FactSales JOIN DimProducts ON FactSales.Prod_id = DimProducts.Prod_id GROUP BY Month, ProdName
To make it work we also need to enable the caching of the view
v_summary_by_month and rebuild the cache (materialize the view).
When a user executes the query:
1 2 3 4 5 6
SELECT Month, ProdName, Sum(AmountSold) as Sales FROM FactSales JOIN DimProducts ON FactSales.Prod_id = DimProducts.Prod_id GROUP BY Month, ProdName
Querona will rewrite the query and retarget it to a materialized view with ready-made pre-aggregate. The resulting query would be similar to:
SELECT Month, ProdName, MonthlySales as Sales FROM v_summary_by_month
Response time will be much better thanks to utilization of the already prepared results.
More examples of supported variants of the query:
1 2 3 4 5 6 7
SELECT Month, ProdName, Sum(AmountSold) as Sales FROM FactSales JOIN DimProducts ON FactSales.Prod_id = DimProducts.Prod_id WHERE Month BETWEEN (200101 and 200102) GROUP BY Month, ProdName
1 2 3 4 5
SELECT Sum(AmountSold) FROM FactSales JOIN DimProducts ON FactSales.Prod_id = DimProducts.Prod_id WHERE Month >= 200101