•  
Results 1 to 4 of 4

Thread: Slow single indexed column max query

  1. #1
    Join Date
    Sep 2011
    Posts
    2

    Default Slow single indexed column max query

    Hi,

    I am trying to determine why the following query is taking so long to execute. I would think since it is an indexed column it would be a simple b-tree index scan ( O(log n) time). Any suggestions or hints on how to speed this up?

    Code:
    bi=# select max(time) from table_co;
        max     
    ------------
     1284136823
    (1 row)
    
    Time: 108,846.394 ms
    The query plan is:

    Code:
     Aggregate  (cost=8562899.88..8562899.89 rows=1 width=4)
       ->  Gather Motion 4:1  (slice1; segments: 4)  (cost=8562899.80..8562899.86 rows=1 width=4)
             ->  Aggregate  (cost=8562899.80..8562899.81 rows=1 width=4)
                   ->  Append-only Columnar Scan on table_co  (cost=0.00..7328002.84 rows=123489696 width=4)
    The table ddl is:
    Code:
    CREATE TABLE table_co (
      trans_id int,
      account_id int,
      type varchar(20),
      cp varchar(40),
      b varchar(20),
      q varchar(20),
      u int,
      time int,
      p decimal(30,10),
      b2 decimal(30,10),
      f1 decimal(30,10),
      f2 decimal(30,10),
      f3 decimal(30,10),
      f4 decimal(30,10),
      f5 decimal(30,10),
      f6 decimal(30,10),
      f7 decimal(30,10),
      f8 varchar(10),
      f9 int,
      f10 int,
      f11 decimal(30,10),
      f12 decimal(30,10),
      f13 int,
      f14 decimal(30,10),
      f15 decimal(30,10),
      f16 decimal(30,10),
      f17 int
    )
    WITH (appendonly=true, orientation=column)
    DISTRIBUTED BY (trans_id);
    The indexes:
    Code:
    CREATE INDEX table_trans_id_co 
    	ON table_co (trans_id);
    CREATE INDEX table_accountid_time_type_co 
    	ON table_co (account_id,time,type);
    CREATE INDEX table_time_type_co 
    	ON table_co (time,type);
    CREATE INDEX table_link_co 
    	ON table_co (transaction_link);
    CREATE INDEX table_cp_time_co 
    	ON table_co (cp,time);
    The total number of records in the table is: 493,958,798

    The server environment
    • CentOS
    • 500GB xfs filesystem
    • 4 cores
    • 3GB memory


    The db configuration
    • 4 data segments ( 1 per core )
    • Default server configuration parameters

  2. #2

    Default

    That optimisation hasn't been implemented on Greenplum. No particular hard reason to that, just not high enough up the list of things to do.

    Try SELECT time FROM table_co ORDER BY time DESC LIMIT 1;

  3. #3
    Join Date
    Sep 2011
    Posts
    2

    Default

    It appears as though the query time is actually slower by ~22s with the limit:

    Execution time:

    Code:
    bi=# SELECT time FROM table_co ORDER BY time DESC LIMIT 1;
        time    
    ------------
     1284136823
    (1 row)
    
    Time: 129,754.615 ms

    Query plan:

    Code:
    bi=# explain SELECT time FROM table_co ORDER BY time DESC LIMIT 1;                                                       QUERY PLAN                                                       
    ------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=103497868.39..103497868.42 rows=1 width=4)
       ->  Gather Motion 4:1  (slice1; segments: 4)  (cost=103497868.39..103497868.42 rows=1 width=4)
             Merge Key: "time"
             ->  Limit  (cost=103497868.39..103497868.40 rows=1 width=4)
                   ->  Sort  (cost=103497868.39..104732765.35 rows=123489696 width=4)
                         Sort Key (Limit): "time"
                         ->  Append-only Columnar Scan on table_co  (cost=0.00..7328002.84 rows=123489696 width=4)
    One of our needs is to figure out the current state of the table prior to daily processing. This type of query is typical of many of our processes. Is there a plan to add this optimization?

  4. #4
    Join Date
    Feb 2011
    Posts
    4

    Default

    It seems index on append-only row/column table doesn't handle orderby efficiently. If you can afford, plain heap table is a recommended workaround to optimize with LIMIT.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •