
Thread: Periodic segment failures

  1. #1
    Join Date
    Jun 2011
    Posts
    8

    Periodic segment failures

    Hi,

    I am running GP 4.1 on 64-bit CentOS in a 4-node cluster: one master host and 3 segment hosts with 6 primary/6 mirror segments. Since this is experimental, all the hosts are VMs carved out of a single physical host, with SAN LUNs mounted to them. While running an especially large/complex SQL query, I noticed that performance degraded, and on further checking, some of the segments had failed. GP had promoted some mirror segments to act as primaries.
    Are there any stringent data-sync requirements that could cause this behavior in a sub-optimal/experimental setup?
    I checked the logs but didn't see anything explaining why the segments failed.
    gprecoverseg runs fine and recovers the segments, and gprecoverseg -r rebalances them as well. This has happened twice during heavy SQL activity.

    Any insight is much appreciated.

    Thanks,

    ameet
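A minimal sketch of checking for the symptoms described above (failed segments, mirrors acting as primaries), assuming you pull rows from the `gp_segment_configuration` catalog on the master, for example with psql. In that table `role` and `preferred_role` are 'p' (primary) or 'm' (mirror) and `status` is 'u' (up) or 'd' (down). The `SegmentRow`, `classify_segments` and `suggest_action` names and the sample rows are hypothetical; mapping the result to gprecoverseg or gprecoverseg -r simply follows the recovery steps in the post.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Query you might run via psql on the master to fetch the rows below
# (column set assumed from the GP 4.x catalog; content -1 is the master).
QUERY = """
SELECT dbid, content, role, preferred_role, status, hostname
FROM gp_segment_configuration
WHERE content >= 0
ORDER BY content, dbid;
"""


@dataclass
class SegmentRow:
    dbid: int
    content: int         # segment id shared by a primary/mirror pair
    role: str            # current role: 'p' = primary, 'm' = mirror
    preferred_role: str  # role the segment was configured with
    status: str          # 'u' = up, 'd' = down
    hostname: str


def classify_segments(rows: List[SegmentRow]) -> Tuple[List[SegmentRow], List[SegmentRow]]:
    """Return (down, swapped): segments marked down, and segments whose
    current role differs from their preferred role (i.e. a failover happened)."""
    down = [r for r in rows if r.status == "d"]
    swapped = [r for r in rows if r.role != r.preferred_role]
    return down, swapped


def suggest_action(rows: List[SegmentRow]) -> str:
    down, swapped = classify_segments(rows)
    if down:
        return "gprecoverseg"     # bring failed segments back first
    if swapped:
        return "gprecoverseg -r"  # everything is up, but roles are unbalanced
    return "ok"


if __name__ == "__main__":
    # Hypothetical sample: content 0's primary failed and its mirror took over.
    sample = [
        SegmentRow(2, 0, "m", "p", "d", "sdw1"),
        SegmentRow(5, 0, "p", "m", "u", "sdw2"),
        SegmentRow(3, 1, "p", "p", "u", "sdw1"),
        SegmentRow(6, 1, "m", "m", "u", "sdw2"),
    ]
    down, swapped = classify_segments(sample)
    print([r.dbid for r in down])     # [2]
    print([r.dbid for r in swapped])  # [2, 5]
    print(suggest_action(sample))     # gprecoverseg
```

Once gprecoverseg has brought the down segment back, the same rows would show every segment up but still swapped, which is the case gprecoverseg -r addresses.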

  2. #2
    Join Date
    Jan 2011
    Posts
    253


    Could it be that, because of the degraded performance (everything is virtual), the segments timed out and failed over to the mirrors?

  3. #3
    Join Date
    Jun 2011
    Posts
    8


    I noticed the issue after running a massive UPDATE statement. After that, every time I touched that table I ran into this segment failure. Trying to run a VACUUM gave me this error:
    VACUUM <tablename>;
    WARNING: Greenplum Database detected segment failure(s), system is reconnected
    WARNING: Greenplum Database detected segment failure(s), system is reconnected
    ERROR: No primary gang allocated (cdbgang.c:1635)

  4. #4
    Join Date
    Jan 2011
    Posts
    253


    Would you say that your VMware setup can handle 18 parallel I/O requests efficiently?
    Because that's what you are doing: 3 nodes with 6 segments each.
    You might simply be running into timeouts.

    Can you test the query with a smaller setup, maybe even a single node?

  6. #6
    Join Date
    Jun 2011
    Posts
    8


    Thanks. It's possible that the nodes were starved for memory; after correcting that, the errors have stopped. It's just strange that the database didn't handle this more gracefully.

  7. #7
    Join Date
    May 2011
    Location
    Knoxville, TN
    Posts
    34


    Quote (originally posted by ads):
    Would you say that your vmware can handle 18 parallel I/O requests in a performant way?
    Because that's what you are doing - 3 nodes a 6 segments.
    Might be that you just run into timeouts.

    Can you test the query with a smaller setup, maybe even single node?
    Is there a way to alter/extend or even monitor the timeouts on the segments?

    We are running into a similar issue, and it seems that we hit timeouts because the virtual machines throttle network traffic under high load.

  8. #8
    Join Date
    Jan 2011
    Posts
    253


    Can you monitor the switches and the throughput through them?
    What about the network I/O on the virtual machines?
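To put numbers on the network I/O of each VM, one option on Linux guests such as the CentOS hosts in this thread is to sample the kernel's per-interface counters in /proc/net/dev twice and take the difference. This is a minimal sketch; the function names are hypothetical and the 2-second window is an arbitrary choice. Rising error/drop counts, or throughput stuck at a ceiling during heavy queries, would be consistent with the VM throttling suspected above, though it would not prove it.

```python
import os
import time
from typing import Dict, Tuple

# Per-interface counters: (rx_bytes, tx_bytes, rx_errs + rx_drop, tx_errs + tx_drop)
Counters = Tuple[int, int, int, int]


def parse_net_dev(text: str) -> Dict[str, Counters]:
    """Parse the contents of Linux /proc/net/dev.

    After the 'iface:' prefix each line has 16 numbers: 8 receive fields
    (bytes, packets, errs, drop, ...) then 8 transmit fields in the same order.
    """
    stats: Dict[str, Counters] = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)  # some kernels print 'eth0:123' with no space
        f = [int(x) for x in data.split()]
        stats[iface.strip()] = (f[0], f[8], f[2] + f[3], f[10] + f[11])
    return stats


def rates(before: Dict[str, Counters], after: Dict[str, Counters],
          seconds: float) -> Dict[str, Tuple[float, float, int]]:
    """Return {iface: (rx bytes/s, tx bytes/s, new errors+drops)} between two samples."""
    out = {}
    for iface, (rx1, tx1, rxe1, txe1) in after.items():
        if iface not in before:
            continue  # interface appeared between samples; skip it
        rx0, tx0, rxe0, txe0 = before[iface]
        out[iface] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds,
                      (rxe1 - rxe0) + (txe1 - txe0))
    return out


def read_net_dev() -> Dict[str, Counters]:
    with open("/proc/net/dev") as fh:
        return parse_net_dev(fh.read())


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    interval = 2.0  # arbitrary sampling window in seconds
    first = read_net_dev()
    time.sleep(interval)
    for iface, (rx, tx, errs) in sorted(rates(first, read_net_dev(), interval).items()):
        print(f"{iface}: rx {rx / 1e6:.2f} MB/s, tx {tx / 1e6:.2f} MB/s, errors+drops {errs}")
```

Running this on every segment host while the heavy query executes gives a rough per-VM picture that can be compared with what the switches report.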

  9. #9
    Join Date
    May 2011
    Location
    Knoxville, TN
    Posts
    34


    Quote (originally posted by ads):
    Can you monitor the switches, the throughput thru the switches?
    What about the network I/O on the virtual machines?

    How can I monitor for interconnect (node-to-node) timeouts in Greenplum?

  10. #10
    Join Date
    Jan 2011
    Posts
    253


    You can check the log files on each segment.
    You can also look for long-standing TCP sessions and/or ICMP error messages on the network.
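A minimal sketch of scanning segment log files for errors that mention timeouts, assuming the logs are CSV-formatted (Greenplum typically writes CSV logs under each segment's pg_log directory, but check your own installation). The glob pattern, the default `timeout` keyword and the function names are hypothetical, and the sample rows are made up, not real Greenplum messages.

```python
import csv
import glob
import io
from collections import Counter
from typing import Iterable, List, TextIO

# Severities to flag. Assumption: each log row is CSV and one field holds the
# severity as a bare word, as in PostgreSQL-style csvlog output.
SEVERITIES = {"ERROR", "FATAL", "PANIC"}


def scan_log(stream: TextIO, keyword: str = "timeout") -> List[List[str]]:
    """Return rows that have a flagged severity and mention `keyword`
    (case-insensitive) in any field. The default keyword is illustrative."""
    needle = keyword.lower()
    hits = []
    for row in csv.reader(stream):
        if not SEVERITIES.intersection(row):
            continue  # skip LOG/WARNING/etc. rows
        if any(needle in field.lower() for field in row):
            hits.append(row)
    return hits


def summarize(rows: Iterable[List[str]]) -> Counter:
    """Count matching rows per severity (first severity field in each row)."""
    counts: Counter = Counter()
    for row in rows:
        for field in row:
            if field in SEVERITIES:
                counts[field] += 1
                break
    return counts


def scan_files(pattern: str, keyword: str = "timeout") -> Counter:
    """Aggregate severity counts across every log file matching `pattern`."""
    total: Counter = Counter()
    for path in sorted(glob.glob(pattern)):
        # newline="" lets the csv module handle quoted fields that span lines
        with open(path, newline="") as fh:
            total.update(summarize(scan_log(fh, keyword)))
    return total


if __name__ == "__main__":
    # Made-up rows for illustration only; not real Greenplum log messages.
    sample = io.StringIO(
        'ts1,gpadmin,db,"ERROR","interconnect timeout (sample)"\n'
        'ts2,gpadmin,db,"LOG","timeout setting read (sample)"\n'
        'ts3,gpadmin,db,"FATAL","Connection TIMEOUT (sample)"\n'
    )
    print(summarize(scan_log(sample)))
    # On a real segment host, something like (hypothetical path):
    # print(scan_files("/data/primary/gpseg*/pg_log/*.csv"))
```

Matching on whole fields rather than a fixed column index keeps the sketch independent of the exact csvlog column layout, at the cost of being a rough filter.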
