Other Statistical Analysis


Variance, Standard Deviation, Covariance


| Return Type | Name (Signature) | Description |
|---|---|---|
| DOUBLE | variance(col), var_pop(col) | Returns the population variance of a numeric column in the group. |
| DOUBLE | var_samp(col) | Returns the unbiased sample variance of a numeric column in the group. |
| DOUBLE | stddev_pop(col), std(col), stddev(col) | Returns the population standard deviation of a numeric column in the group. |
| DOUBLE | stddev_samp(col) | Returns the unbiased sample standard deviation of a numeric column in the group. |
| DOUBLE | covar_pop(col1, col2) | Returns the population covariance of a pair of numeric columns in the group. |
| DOUBLE | covar_samp(col1, col2) | Returns the sample covariance of a pair of numeric columns in the group. |
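
The difference between the population and sample variants is only the divisor: n for the `_pop` functions and n - 1 for the `_samp` functions. The following is a minimal Python sketch of those definitions, not Hive's implementation. The helper names mirror the Hive function names for readability, the sample data is made up, and the sketch assumes the inputs contain no NULL values.

```python
import math

def var_pop(xs):
    # Population variance: squared deviations divided by n.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def var_samp(xs):
    # Unbiased sample variance: squared deviations divided by n - 1.
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def stddev_pop(xs):
    return math.sqrt(var_pop(xs))

def stddev_samp(xs):
    return math.sqrt(var_samp(xs))

def covar_pop(xs, ys):
    # Population covariance: products of paired deviations divided by n.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def covar_samp(xs, ys):
    # Sample covariance: same sum, divided by n - 1.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative data only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

print(var_pop(values), var_samp(values))
print(stddev_pop(values), stddev_samp(values))
print(covar_pop(xs, ys), covar_samp(xs, ys))
```

For small groups the two variants differ noticeably, so it is worth choosing `_pop` or `_samp` deliberately rather than relying on the `variance` and `stddev` aliases.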

Other

| Return Type | Name (Signature) | Description |
|---|---|---|
| DOUBLE | corr(col1, col2) | Returns the Pearson coefficient of correlation of a pair of numeric columns in the group. |
| DOUBLE | regr_avgx(dependent, independent) | Equivalent to avg(independent). As of Hive 2.2.0. |
| DOUBLE | regr_avgy(dependent, independent) | Equivalent to avg(dependent). As of Hive 2.2.0. |
| DOUBLE | regr_count(dependent, independent) | Returns the number of non-null pairs used to fit the linear regression line. As of Hive 2.2.0. |
| DOUBLE | regr_intercept(dependent, independent) | Returns the y-intercept of the linear regression line, i.e. the value of b in the equation dependent = a * independent + b. As of Hive 2.2.0. |
| DOUBLE | regr_r2(dependent, independent) | Returns the coefficient of determination for the regression. As of Hive 2.2.0. |
| DOUBLE | regr_slope(dependent, independent) | Returns the slope of the linear regression line, i.e. the value of a in the equation dependent = a * independent + b. As of Hive 2.2.0. |
| DOUBLE | regr_sxx(dependent, independent) | Equivalent to regr_count(dependent, independent) * var_pop(independent). As of Hive 2.2.0. |
| DOUBLE | regr_sxy(dependent, independent) | Equivalent to regr_count(dependent, independent) * covar_pop(dependent, independent). As of Hive 2.2.0. |
| DOUBLE | regr_syy(dependent, independent) | Equivalent to regr_count(dependent, independent) * var_pop(dependent). As of Hive 2.2.0. |
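
The regression aggregates all derive from a handful of sums over the non-null pairs. The sketch below shows one way to compute them in plain Python. It is an illustration of the textbook least-squares formulas, not Hive's code. In this sketch x denotes the independent variable and y the dependent one; the function name `regr_stats`, the dictionary keys, and the sample data are hypothetical. It assumes at least two pairs and a non-constant x and y, and does not guard against division by zero.

```python
import math

def regr_stats(pairs):
    """Compute regression statistics from (y, x) pairs.

    y is the dependent variable, x the independent one.
    Pairs containing None are skipped, mimicking NULL handling.
    """
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(clean)
    avgx = sum(x for _, x in clean) / n
    avgy = sum(y for y, _ in clean) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - avgx) ** 2 for _, x in clean)   # n * var_pop(x)
    syy = sum((y - avgy) ** 2 for y, _ in clean)   # n * var_pop(y)
    sxy = sum((x - avgx) * (y - avgy) for y, x in clean)  # n * covar_pop(x, y)
    slope = sxy / sxx                   # a in y = a * x + b
    intercept = avgy - slope * avgx     # b in y = a * x + b
    return {
        "count": n,
        "avgx": avgx,
        "avgy": avgy,
        "sxx": sxx,
        "syy": syy,
        "sxy": sxy,
        "slope": slope,
        "intercept": intercept,
        "r2": (sxy * sxy) / (sxx * syy),
        "corr": sxy / math.sqrt(sxx * syy),
    }

# Illustrative data lying exactly on y = 2x + 1, plus one incomplete pair.
pairs = [(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (9.0, 4.0), (None, 5.0)]
stats = regr_stats(pairs)
print(stats)
```

Because the points lie exactly on a line, the slope and intercept recover 2 and 1, and both `r2` and `corr` equal 1. The incomplete pair is excluded from every statistic, including the count.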