•  
Results 1 to 7 of 7

Thread: Time partitions and subpartitions

  1. #1
    Join Date
    Mar 2011
    Posts
    9

    Default Time partitions and subpartitions

    I have a large (fact) table spanning several years of data. It is accessed mostly by querying particular time ranges, which rarely extend more than a few days.

    I would like to split it up into a partitioning structure that allows the optimizer to ignore the irrelevant TBs that are not in the time range, and doesn't require me to create a bunch of partitions manually (since by next year, I'll have forgotten how to add a partition to this table)

    I understand that I have to create a new table and insert into newtable select * from oldtable (sigh)

    In structuring the new table, I want to partition by the year, and subpartition by the week of the year, but I'm not clear on the syntax for setting that up.

    Code:
    data
      - data2000
           - data2000_wk1
           - data2000_wk2
    ...
      - data2037
           - data2037_week51
           - data2037_week52
           - data2037_week53
    Can I subpartition by extract(WEEK from time_stamp) in my subpartition definition? Or do I have to throw that into a database column in the new table?

    Thanks,
    --Joe

  2. #2

    Default

    Hi Joe,
    yes I think it is possible to subpartition by week.

    IIRC you should do something similar to:

    Code:
    CREATE TABLE t2 ( LIKE t1 )
    DISTRIBUTED BY ( x ) 
    PARTITION BY RANGE ( date_field )  
    SUBPARTITION BY RANGE ( date_field )  
    SUBPARTITION TEMPLATE
    (       
            START (date '...') EXCLUSIVE
            END   (date '...') EXCLUSIVE
            EVERY (INTERVAL '1 week'),
            DEFAULT SUBPARTITION other_weeks
    )(      
            START (date '...') EXCLUSIVE
            END   (date '...') EXCLUSIVE
            EVERY (INTERVAL '1 year'),
            DEFAULT PARTITION other_years
    );

    I am going from memory, I hope this help.

    Waiting for feedback.
    Cheers,
    Carlo Ascani - 2ndQuadrant Italy
    PostgreSQL Training, Services and Support
    carlo.ascani@2ndQuadrant.it | www.2ndQuadrant.it

  3. #3
    Join Date
    Mar 2011
    Posts
    9

    Default

    Yes, I agree that's what it should be...

    Code:
    CREATE TABLE jm ( time_stamp timestamp without time zone, data text )
    DISTRIBUTED RANDOMLY
    PARTITION BY RANGE(time_stamp)
    	SUBPARTITION BY RANGE(time_stamp) 
    	SUBPARTITION TEMPLATE 
    	( START (date '2010-01-01') END (date '2014-01-01') EVERY (interval '1 week') )
    ( START (date '2010-01-01') END (date '2014-01-01') EVERY (interval '1 year') )
    Unfortunately, when I do that for 4 years, the subpartition template gets applied to each of the 4 yearly partitions, so each of them gets 209 (=53*4) subpartitions, for a total of 841 tables created by this one command.

    The check criteria will ensure that nearly all of those subpartition tables will always be empty, but it's still a lot of tables.

  4. #4
    Join Date
    Jan 2011
    Posts
    253

    Default

    Do you really need to have weekly subpartitions for all 4 years? Maybe you can have weekly subpartitions for the first year and then go for monthly ones?
    Usually you don't need to access older data by week or day, a deep drilldown is not required. But I don't know your queries ...

  5. #5
    Join Date
    Mar 2011
    Posts
    9

    Default

    I'm really not attached to the weekly partitioning scheme, but...

    We query "recent" data much more often than older data. Generally, we look for data in a 2-3 day time window, and rarely go back more than 3 months ago for that window.

    New data is arriving into the table all the time, so there's a rolling window of what is "recent".

    I would be glad to use coarser partitioning for old data, but as far as I can tell, there's no way to rebalance where the data is once it's been placed into a partition. So the data relevant to 5/1/2012 - 5/7/2012 will go into a weekly partition and be queried now, but in August, it's still in that same partition table.

    Or is there an easy way to move data into a "more than 90 days old" partition set? i.e. insert into jm_partition_olddata (select * from jm_partition_jan1_7); delete from jm_partition_jan1_7;

    I can see a "cron"-based solution-- a script that every week creates a new "weekly" partition, removes the data from the table's default partition and re-inserts it (which trickles the data down into the newly created last-week table. But surely there's something more straightforward.

    Thanks for the help,
    --Joe

  6. #6
    Join Date
    Jan 2011
    Posts
    253

    Default

    This cron-based solution is the usual way to handle this. It's called rolling window.
    You copy data from several small partitions into one big one, then drop the old ones.

  7. #7

    Default

    Hi Joe,
    that task is common when you are managing a data warehouse.

    Maybe you would find an ETL tool like Kettle [1] or Talend Open Studio [2] (etc.) easier to use.
    That is a matter of tastes and needs, actually.

    The key point here is that you have to handle your old data "manually".

    [1] http://kettle.pentaho.com/
    [2] http://www.talend.com/products/open-studio-di.php
    Carlo Ascani - 2ndQuadrant Italy
    PostgreSQL Training, Services and Support
    carlo.ascani@2ndQuadrant.it | www.2ndQuadrant.it

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •