Fujii Masao
masao****@gmail*****
Fri, Sep 25, 2015 19:04:02 JST
On Tue, Sep 22, 2015 at 12:55 AM, Masahiko Sawada <sawad****@gmail*****> wrote:
> On Fri, Sep 18, 2015 at 1:00 AM, Fujii Masao <masao****@gmail*****> wrote:
>> On Thu, Aug 27, 2015 at 1:44 AM, Fujii Masao <masao****@gmail*****> wrote:
>>> On Wed, Aug 26, 2015 at 11:48 PM, Masahiko Sawada <sawad****@gmail*****> wrote:
>>>> On Wed, Aug 26, 2015 at 11:06 AM, Fujii Masao <masao****@gmail*****> wrote:
>>>>> Hi,
>>>>>
>>>>> Attached patch implements the pg_gin_pending_cleanup function which cleans up
>>>>> the pending list of the specified GIN index by moving tuples in it to the main
>>>>> GIN data structure in bulk. Then this function returns the number of pages in
>>>>> the pending list cleaned up. I'd like to add this function into the master.
>>>>>
>>>>> Even without this function, we can clean up the pending list by using VACUUM.
>>>>> However, since VACUUM needs to do not only the pending list cleanup but also
>>>>> other various jobs, it usually takes a long time and its performance impact is
>>>>> likely to be big. So I think that pg_gin_pending_cleanup function is useful
>>>>> because we can clean up the list more quickly and avoid such big performance
>>>>> impact by using the function.
>>>>
>>>> +1.
>>>> It will be really useful function for maintenance GIN index.
>>>> I applied this patch to HEAD cleanly, and compiled without warning.
>>>> It looks good to me.
>>>
>>> Thanks for reviewing the patch! Applied the patch to the master.
>>
>> On second thought, current version of pg_gin_pending_cleanup might not be
>> sufficient for real scenario because it moves the tuples from pending list into
>> GIN index main structure but doesn't mark the removed pages as free in FSM.
>> So even if pg_gin_pending_cleanup function is called many times, garbage pages
>> in pending list will never be freed and reused later. This causes GIN index to
>> be kept being bloated unexpectedly :(
>>
>> For that problem, I think that we should provide not only tuple-moving but also
>> mark-as-free functionalities.
>
> +1.
>
>> One question here is; how should we provide those
>> functionalities? There are basically three options.
>>
>> #1. Provide two separate functions, (1) tuple-move and (2) mark-as-free.
>>     The demerit of this option is that a user needs to call both functions
>>     when he or she wants to move tuples from pending list and mark removed
>>     pages as free in FSM.
>>
>> #2. Provide three separate functions,
>>     (1) tuple-move, (2) mark-as-free and (3) tuple-move + mark-as-free
>>     But we might want to avoid providing three functions here...
>>
>> #3. Provide one function and enable them to specify the operation that they
>>     want to perform as an argument. For example, if a user specifies "free"
>>     as argument, the function does only mark-as-free operation. If "both" is
>>     specified, both tuple-move and mark-as-free are performed. Of course,
>>     the argument value "move" makes the function perform tuple-move.
>>     Maybe the default should be "both".
>
> I think that the function just moving tuple (i.e. (1) function) would
> be useful for testing GIN and pg_bigm on 9.4 or before.
> And (3) function will be helpful certainly in production environment.
> But I'm not sure that using the function just marking FSM as free
> (i.e. (2) function) would help for something.

Okay.

> Also #3 seems to be overkill.
>
> So IMO, we should add (1) and (3) functions.

But I'd like to avoid providing two very similar functions.
So I'm inclined to add something like

    pg_gin_pending_cleanup(index regclass, update_fsm boolean default true)

If only the index name is given as an argument (or update_fsm is true),
this function cleans up the pending list and adds the deleted pages to the FSM.
If update_fsm is false, this function just moves the tuples from
the pending list to the main GIN index structure.
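
To make the two modes concrete, here is a rough toy model in Python. It is only a sketch of the semantics described above, not the patch and not PostgreSQL code: ToyGinIndex and its fields (pending_pages, main_entries, free_pages, garbage_pages) are hypothetical names invented for illustration, and the "FSM" is reduced to a counter.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGinIndex:
    """Hypothetical stand-in for a GIN index; not PostgreSQL's real structures."""
    pending_pages: list = field(default_factory=list)  # each page is a list of tuples
    main_entries: list = field(default_factory=list)   # tuples in the main structure
    free_pages: int = 0      # emptied pages recorded as reusable (toy "FSM")
    garbage_pages: int = 0   # emptied pages that were never recorded as free


def pg_gin_pending_cleanup(index, update_fsm=True):
    """Move all pending tuples to the main structure; return pages cleaned up."""
    cleaned = len(index.pending_pages)
    for page in index.pending_pages:
        index.main_entries.extend(page)
    index.pending_pages = []
    if update_fsm:
        # Default mode: also record the emptied pages as free for reuse.
        index.free_pages += cleaned
    else:
        # Tuple-move only: the emptied pages remain as garbage in this model.
        index.garbage_pages += cleaned
    return cleaned


if __name__ == "__main__":
    idx = ToyGinIndex(pending_pages=[["a", "b"], ["c"]])
    print(pg_gin_pending_cleanup(idx), idx.free_pages, idx.garbage_pages)
```

With update_fsm left at its default, the emptied pages end up counted as free; with update_fsm set to false they are only counted as garbage, which corresponds to the bloat problem mentioned earlier in the thread.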

What about adding only the single function above?

Regards,

-- 
Fujii Masao