![]() In Redshift, MVs are refreshed manually, using the REFRESH MATERIALIZED VIEWS statement. The performance of the MV, on the other hand, should not be affected by the size of the product reviews table, making the performance benefit of the MV even more evident as the size of the underlying table(s) grows. It’s important to note here that, as the data volume in the product reviews table grows, it’s reasonable to expect the query results returned by the standard view to take progressively longer. When running the same query against the MV, it returned results after about 100 milliseconds - a full 28x faster than the standard view. Running this in Looker against the database view took a little under three seconds, which is really impressive considering that the view joined and summarized data across 3 tables, the largest of which is almost 150 million rows. In the LookML for each of the views, I added a new measure for summing up the count of reviews in each grouping. From there, I set up a new Looker project and where I added in those two views. To do this, I did two things to ensure that I wasn’t working with cached data įirst, I disabled any session level caching for my Redshift user:Īlter user looker set enable_result_cache_for_session to offĪnd second, each time I visualized the data in Looker, I used the option to ‘clear cache and refresh.’Īfter running through the setup script supplied in the AWS blog, I created a view as well as a MV that aggregated nearly 150 million rows of data from the Amazon Product Reviews dataset. To begin this exercise, I first made sure that my timings were accurate. In this post, I’ll run the setup script supplied in the AWS blog post, use Looker to run the queries against a standard database view and a MV, and evaluate just how well a MV built on top of more than 100 million rows performs. Using that blog as inspiration, I thought it would be interesting to show how to make use of MVs with Looker using that same data set. The increased query performance that can be achieved with MVs is highlighted in this AWS, where the authors demonstrate how MVs can be used to speed up analytic queries by pre-aggregating data. The time it takes to refresh a MV, however, is typically measured in seconds or milliseconds, and it still provides support for all of the joins and complex aggregations. And because tables created using CTAS can include joins across large tables and complex aggregations, it may take several minutes for it to get recreated. MVs can be updated incrementally using the REFRESH MATERIALIZED VIEW command, whereas a table created using CTAS has to be completely recreated each time. While this still allows users to query pre-aggregated data, MVs have an important benefit over CTAS tables. ![]() Historically in Redshift, users have been able to store the results of a query to a physical table using the CREATE TABLE AS SELECT (CTAS) syntax. Amazon Redshift recently, providing a useful and valuable tool for data analysts, because they allow analysts to compute complex metrics at query time with data that has already been aggregated, which can drastically improve query performance. ![]() Materialized Views (MVs) allow data analysts to store the results of a query as though it were a physical table. The content is subject to limited support. ![]() This content, written by Bruce Sandell, was initially posted in Looker Blog on Jul 2, 2020.
0 Comments
Leave a Reply. |