mysqlDBApply             package:RMySQL             R Documentation

_A_p_p_l_y _R/_S-_P_l_u_s _f_u_n_c_t_i_o_n_s _t_o _r_e_m_o_t_e _g_r_o_u_p_s _o_f _D_B_M_S _r_o_w_s (_e_x_p_e_r_i_m_e_n_t_a_l)

_D_e_s_c_r_i_p_t_i_o_n:

     Applies R/S-Plus functions to groups of remote DBMS rows without
     bringing the entire result set into R at once.  The result set is
     expected to be sorted by the grouping field.

_U_s_a_g_e:

     mysqlDBApply(res, INDEX, FUN, 
        begin, group.begin, new.record, end,
        batchSize = 100, maxBatch = 1e5, ..., 
        simplify = FALSE)

_A_r_g_u_m_e_n_t_s:

     res: a result set (see `dbSendQuery').

   INDEX: a character or integer specifying the field name or field
          number that defines the various groups.

     FUN: a function to be invoked upon identifying the last row from
          every group. This function will be passed a data frame
          holding the records of the current group, a character string
          with the group label, plus any other arguments passed to
          `dbApply' as `"..."'.

   begin: a function of no arguments to be invoked just before
          retrieving the first row from the result set.

     end: a function of no arguments to be invoked just after
          retrieving the last row from the result set.

group.begin: a function of one argument (the group label) to be
          invoked upon identifying a row from a new group.

new.record: a function to be invoked as each individual record is
          fetched.  The first argument to this function is a one-row
          data.frame holding the new record.

batchSize: the default number of rows to fetch from the remote result
          set at a time. If needed, this is automatically extended to
          hold groups bigger than `batchSize'.

maxBatch: the absolute maximum of rows per group that may be extracted
          from the result set.

     ...: any additional arguments to be passed to `FUN'.

simplify: not yet implemented.

_D_e_t_a_i_l_s:

     `dbApply' is meant to handle large amounts of data from the DBMS
     somewhat gracefully by bringing manageable chunks into R (about
     `batchSize' records at a time, but not more than `maxBatch'); the
     idea is that the data from individual groups can be handled by R,
     but not all the groups at the same time.
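
     The chunking idea above can be sketched without a database.  This
     is only an illustration, not the actual `mysqlDBApply' code: the
     data frame `rows', the helper `fetch_batch' and the function
     `apply_by_group' are hypothetical names.  Rows of a group that is
     still incomplete at the end of a batch are carried over, so a
     group larger than `batchSize' is still handed to `FUN' in one
     piece.

```r
# Hypothetical stand-in for a remote result set sorted by the grouping field.
rows <- data.frame(Agent = rep(c("a", "b", "c"), times = c(3, 5, 2)),
                   DATA  = 1:10,
                   stringsAsFactors = FALSE)

# fetch_batch() is an illustrative helper, not an RMySQL function: it
# returns the next n rows and advances a cursor, much like fetch(res, n).
cursor <- 0
fetch_batch <- function(n) {
  if (cursor >= nrow(rows)) return(rows[0, , drop = FALSE])
  idx <- seq(cursor + 1, min(cursor + n, nrow(rows)))
  cursor <<- max(idx)
  rows[idx, , drop = FALSE]
}

apply_by_group <- function(FUN, batchSize = 4) {
  out <- list()
  pending <- rows[0, , drop = FALSE]    # rows of a group not yet complete
  repeat {
    batch <- fetch_batch(batchSize)
    done <- nrow(batch) == 0            # result set exhausted
    buf <- rbind(pending, batch)
    if (nrow(buf) == 0) break
    labels <- unique(buf$Agent)
    # Every group except the last one in the buffer is known to be complete;
    # the last one is complete only once the result set is exhausted.
    complete <- if (done) labels else labels[-length(labels)]
    for (g in complete) {
      out[[g]] <- FUN(buf[buf$Agent == g, , drop = FALSE], g)
    }
    pending <- buf[!(buf$Agent %in% complete), , drop = FALSE]
    if (done) break
  }
  out
}

sums <- apply_by_group(function(x, grp) sum(x$DATA))
```

     Here group "b" has five rows while `batchSize' is four, so its
     rows span two batches before `FUN' sees them together.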

     The MySQL implementation `mysqlDBApply' allows us to register R
     functions that get invoked when certain fetching events occur.
     These include the ``begin'' event (no records have been fetched
     yet), ``group.begin'' (the record just fetched belongs to a new
     group), ``new.record'' (every fetched record generates this
     event), ``group.end'' (the record just fetched was the last row of
     the current group; this is when `FUN' is invoked), and ``end''
     (the very last record from the result set has been fetched).  Awk
     and Perl programmers will find this paradigm very familiar
     (although SAP's ABAP language is closer to what we're doing).
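
     The order in which these events fire can be illustrated on an
     in-memory data frame.  This is a minimal sketch under stated
     assumptions, not the RMySQL implementation: `dispatch_events',
     `log_event' and the data frame `rows' are hypothetical, and the
     callbacks only record which event occurred.

```r
# Hypothetical in-memory stand-in for a result set sorted by "Agent".
rows <- data.frame(Agent = c("a", "a", "b"), DATA = c(1, 2, 3),
                   stringsAsFactors = FALSE)

events <- character(0)                      # log of fired events
log_event <- function(...) events <<- c(events, paste(...))

# dispatch_events() is an illustrative helper, not part of RMySQL.
dispatch_events <- function(rows, INDEX, FUN, begin, group.begin,
                            new.record, end) {
  out <- list()
  begin()                                   # before any row is read
  current <- NULL
  first <- 0
  for (i in seq_len(nrow(rows))) {
    label <- as.character(rows[i, INDEX])
    if (is.null(current) || label != current) {
      group.begin(label)                    # this row opens a new group
      current <- label
      first <- i
    }
    new.record(rows[i, , drop = FALSE])     # fired for every row
    last <- i == nrow(rows) ||
      as.character(rows[i + 1, INDEX]) != current
    if (last) {                             # group end: call FUN
      out[[current]] <- FUN(rows[first:i, , drop = FALSE], current)
    }
  }
  end()                                     # after the last row
  out
}

out <- dispatch_events(rows, "Agent",
  FUN         = function(x, grp) { log_event("group.end", grp); mean(x$DATA) },
  begin       = function() log_event("begin"),
  group.begin = function(grp) log_event("group.begin", grp),
  new.record  = function(rec) log_event("new.record", rec$DATA),
  end         = function() log_event("end"))
```

     Inspecting `events' afterwards shows the sequence of events, and
     `out' holds one value per group.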

_V_a_l_u_e:

     A list with as many elements as there were groups in the result
     set.
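
     When `FUN' returns vectors of equal length, the resulting list can
     be stacked into a matrix.  The sketch below uses a hand-built list
     in place of a real result; that the elements are named by group
     label is an assumption, and the names `agent1' and `agent2' are
     made up.

```r
# Hypothetical result shaped like the documented return value: one element
# per group. Naming the elements by group label is an assumption.
out <- list(agent1 = quantile(c(1, 2, 3, 4, 5), names = FALSE),
            agent2 = quantile(c(10, 20, 30), names = FALSE))

# Stack the per-group vectors into a groups-by-quantiles matrix.
q <- do.call(rbind, out)
colnames(q) <- c("min", "q25", "median", "q75", "max")
```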

_N_o_t_e:

     This is an experimental version implemented only in R (there are
     plans, time permitting, to implement it in S-Plus).

     The terminology that we're using is closer to SQL than R.  In R,
     what we refer to as ``groups'' are the individual levels of a
     factor (the grouping field in our terminology).
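
     For data that already fit in memory, the same grouping can be
     expressed with a factor.  The data frame `d' below is made up for
     illustration; `split' and `lapply' are standard R functions.

```r
# Hypothetical data already held in memory.
d <- data.frame(Agent = c("a", "a", "b", "b", "b"),
                DATA  = c(1, 3, 2, 4, 6))

# Each level of the factor plays the role of one "group".
grp_levels <- levels(factor(d$Agent))

# In-memory analogue of applying a function to each group.
by_group <- lapply(split(d, d$Agent), function(x) median(x$DATA))
```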

_S_e_e _A_l_s_o:

     `MySQL', `dbSendQuery', `fetch'.

_E_x_a_m_p_l_e_s:

     ## compute quantiles for each network agent
     con <- dbConnect(MySQL(), group="vitalAnalysis")
     res <- dbSendQuery(con, 
                  "select Agent, ip_addr, DATA from pseudo_data order by Agent")
     out <- dbApply(res, INDEX = "Agent", 
             FUN = function(x, grp) quantile(x$DATA, names=FALSE))

