
1 Answer

answered by (4k points)  

Data is stored based on the selected field(s) used for distribution. When a table has a hash Distribution Key, the values of the Distribution Key are run through a hash formula, and a map is then used to send the row to the correct segment. The formula is deterministic, so all rows with the same key values go to the same segment.
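To make the "same values, same segment" point concrete, here is a minimal sketch in Python. It is only an illustration: MD5 stands in for the hash formula (Greenplum uses its own internal hash, not this one), and the function name `segment_for`, the sample rows, and the 0-based segment numbering are all made up for the example.

```python
import hashlib


def segment_for(key_values, num_segments):
    """Map a tuple of distribution-key values to a segment index (0-based)."""
    # Stand-in hash: MD5 over the joined key values. This is an assumption
    # for illustration only, not Greenplum's actual hash function.
    joined = "|".join(str(v) for v in key_values).encode("utf-8")
    digest = hashlib.md5(joined).digest()
    hash_value = int.from_bytes(digest[:4], "big")
    # Reduce the hash to one of the available segments.
    return hash_value % num_segments


# Hypothetical rows: (customer_id, name), distributed by customer_id.
rows = [(101, "alice"), (202, "bob"), (101, "carol")]
for customer_id, name in rows:
    print(customer_id, name, "-> segment", segment_for((customer_id,), 4))
```

Both rows with `customer_id` 101 print the same segment, because the segment depends only on the key value and not on the rest of the row.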

Data (A) => Hash Function (B) => Logical Segment List (C) => Physical Segment List (D) => Storage (E)

When data arrives at Greenplum, the hash function (B) is applied to the distribution field(s) of each row.

For example, consider a 4-node system, so the logical segment list has 4 unique entries. If 10 hashed data items come out of (B), then (C) holds 10 entries, each naming one of only 4 segments, for example [4,1,2,3,4,3,1,4,3,2]. The map from the logical list to the physical segment list (D) then sends each row to the correct segment for storage (E).
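The example list can be walked through in a few lines of Python. This is a sketch under assumptions: the host names in `physical_map` are hypothetical, and a plain dictionary stands in for whatever structure Greenplum uses internally for the logical-to-physical step.

```python
from collections import Counter

# (C) Logical segment list from the example: one entry per hashed row.
logical_segments = [4, 1, 2, 3, 4, 3, 1, 4, 3, 2]

# (D) Hypothetical physical map: logical segment number -> segment host.
physical_map = {1: "sdw1", 2: "sdw2", 3: "sdw3", 4: "sdw4"}

# Resolve every row's logical segment to the host that stores it (E).
placements = [physical_map[s] for s in logical_segments]

# Count how many of the 10 rows land on each host.
rows_per_host = Counter(placements)
for host in sorted(rows_per_host):
    print(host, rows_per_host[host])
```

With this list, logical segments 1 and 2 each receive 2 rows and segments 3 and 4 each receive 3, so the 10 rows are spread over all 4 segments.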
