Title: | Distributed Multinomial Regression |
---|---|
Description: | Fast distributed/parallel estimation for multinomial logistic regression via Poisson factorization and the 'gamlr' package. For details see: Taddy (2015, AoAS), Distributed Multinomial Regression, <arXiv:1311.6139>. |
Authors: | Matt Taddy [aut], Nelson Rayl [cre] |
Maintainer: | Nelson Rayl <[email protected]> |
License: | GPL-3 |
Version: | 1.0.1 |
Built: | 2025-01-07 04:44:22 UTC |
Source: | https://github.com/taddylab/distrom |
Collapses counts along equal levels of binned covariates.
collapse(v,counts,mu=NULL,bins=NULL)
v |
Either matrix or Matrix of covariates (matches |
counts |
Either matrix or Matrix of multinomial counts, or a factor (matches |
mu |
Possible pre-specified fixed effects for |
bins |
The number of quantile bins into which we collapse |
For each column of v, aggregates the observations into bins defined by their average value. Both v and counts are then collapsed according to levels of the interaction across implied bin-factors, and the number of observations in each bin is recorded as n. Look at the code of the dmr function to see collapse used in practice.
A list containing collapsed and formatted v, counts, and nbin, along with mu = log(rowSums(counts)), the plug-in fixed effect estimates for dmr.
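The binning and collapsing described above can be sketched in base R. This is an illustration only, not the package's collapse() code: it assumes a single covariate (so the "interaction across bin-factors" reduces to one factor), and the simulated data and all variable names are hypothetical.

```r
## Illustrative only: a base-R sketch of binning and collapsing,
## not the implementation used by collapse().
set.seed(1)
n <- 200
v <- rnorm(n)                                    # one covariate (simulated)
counts <- matrix(rpois(n * 3, lambda = 2), n, 3,
                 dimnames = list(NULL, c("a", "b", "c")))

bins <- 4
## Quantile break points, then an integer bin label for every observation.
breaks <- quantile(v, probs = seq(0, 1, length.out = bins + 1))
bin <- cut(v, breaks = breaks, include.lowest = TRUE, labels = FALSE)

## Represent each bin by the average covariate value of its members.
v_binned <- tapply(v, bin, mean)

## Sum counts within bins and record how many observations were merged.
counts_collapsed <- rowsum(counts, group = bin)
nbin <- as.vector(table(bin))

## Plug-in fixed effect computed on the collapsed counts.
mu <- log(rowSums(counts_collapsed))
```

Collapsing keeps the total counts unchanged while shrinking the number of rows passed to each regression, which is where the speed-up comes from.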
Matt Taddy [email protected]
we8there
Gamma-lasso path estimation for a multinomial logistic regression factorized into independent Poisson log regressions.
dmr(cl, covars, counts, mu=NULL, bins=NULL, verb=0, cv=FALSE, ...)

## S3 method for class 'dmr'
coef(object, ...)

## S3 method for class 'dmr'
predict(object, newdata, type=c("link","response","class"), ...)
cl |
A |
covars |
A dense |
counts |
A dense |
mu |
Pre-specified fixed effects for each observation in the Poisson regression linear equation. If |
bins |
Number of bins into which we will attempt to collapse each column of |
verb |
Whether to print some info. |
cv |
A flag for whether to use |
type |
For |
newdata |
A Matrix with the same number of columns as |
... |
Additional arguments to |
object |
A |
dmr fits multinomial logistic regression by assuming that, unconditionally on the 'size' (total count across categories), each individual category count has been generated as a Poisson
$$x_{ij} \sim \mathrm{Po}\left(\exp[\mu_i + \alpha_j + \beta_j' v_i]\right).$$
We [default] plug in the estimate $\hat\mu_i = \log(m_i)$, where $m_i = \sum_{j=1}^{p} x_{ij}$ and $p$ is the dimension of $x_i$. Then each individual category is outsourced to a Poisson regression in the gamlr package via the parLapply function of the parallel library. The output from dmr is a list of gamlr fitted models.
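To make the factorization concrete, here is a minimal sketch that fits one independent Poisson log regression per category with the plug-in offset log(rowSums(counts)). It is an assumption-laden stand-in: it uses base R's unpenalized glm() and a serial lapply(), whereas dmr uses gamlr lasso paths dispatched through parLapply. The simulated data and names are hypothetical.

```r
## Illustrative sketch with base R's glm(); dmr itself uses gamlr and parLapply.
set.seed(2)
n <- 300
v <- cbind(v1 = rnorm(n), v2 = rnorm(n))         # simulated covariates
eta <- cbind(0.5 * v[, 1], -0.5 * v[, 2], 0)     # linear predictors, 3 categories
prob <- exp(eta) / rowSums(exp(eta))
m <- rpois(n, 20) + 1                            # total count per observation
counts <- t(sapply(seq_len(n), function(i) rmultinom(1, m[i], prob[i, ])))
colnames(counts) <- c("a", "b", "c")

## Plug-in fixed effect: log of each observation's total count.
mu <- log(rowSums(counts))

## One independent Poisson log regression per category, with mu as an offset.
fits <- lapply(colnames(counts), function(j) {
  glm(counts[, j] ~ v + offset(mu), family = poisson())
})
names(fits) <- colnames(counts)

## Stack per-category coefficients into one (intercept + covariates) x category matrix.
B <- sapply(fits, coef)
```

Because each category is fit separately once mu is fixed, the loop over categories has no shared state and can be distributed across workers.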
coef.dmr builds a matrix of multinomial logistic regression coefficients from the length(object) list of gamlr fits. Default selection under cv=FALSE uses an information criterion, AICc, on Poisson deviance for each individual response dimension (see gamlr). Combined coefficients across all dimensions are then returned as a dmrcoef S4-class object.
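As a rough illustration of AICc selection on Poisson deviance, the sketch below scores a few nested Poisson fits with deviance + 2 * df * n / (n - df - 1) and keeps the minimizer. It assumes ordinary glm() fits in place of a gamlr regularization path, and the candidate models and data are hypothetical.

```r
## Illustrative sketch: AICc = deviance + 2 * df * n / (n - df - 1),
## applied to a handful of nested Poisson fits rather than a gamlr path.
set.seed(3)
n <- 100
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- rpois(n, exp(0.2 + 0.8 * x[, "x1"]))        # only x1 matters
d <- data.frame(y = y, x)

candidates <- list(y ~ 1, y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3)
fits <- lapply(candidates, glm, family = poisson(), data = d)

## Deviance differs from -2 * log-likelihood by a constant shared by all
## candidates, so the ranking is the same.
aicc <- sapply(fits, function(f) {
  df <- length(coef(f))                          # number of estimated parameters
  deviance(f) + 2 * df * n / (n - df - 1)
})
best <- which.min(aicc)
coef(fits[[best]])
```

In dmr the same kind of selection is made separately for every response category, and the selected coefficient vectors are then bound together column by column.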
predict.dmr takes either a dmr or dmrcoef object and returns predicted values for newdata on the scale defined by the type argument.
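The three prediction scales can be sketched by hand. This assumes the usual multinomial logit conventions ("link" is the linear predictor, "response" is its softmax, "class" is the most probable category) and a coefficient matrix with an intercept row; the numbers are hypothetical and the code does not call predict.dmr.

```r
## Illustrative sketch of the three prediction scales, assuming coefficients
## stored as an (intercept + covariates) x categories matrix.
B <- rbind(intercept = c(a = 0.0, b = 0.5, c = -0.4),   # hypothetical values
           v1        = c(1.0, 0.0, -1.0),
           v2        = c(0.0, 2.0, 0.0))
newdata <- cbind(v1 = c(1, -1, 0), v2 = c(0, 0, 1))

## "link": linear predictor for each observation and category.
eta <- cbind(1, newdata) %*% B

## "response": multinomial logit (softmax) probabilities; rows sum to one.
prob <- exp(eta) / rowSums(exp(eta))

## "class": the most probable category for each observation.
cls <- colnames(B)[apply(eta, 1, which.max)]
```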
dmr returns the dmr S3 object: an ncol(counts)-length list of fitted gamlr objects, with the added attributes nlambda, mu, and nobs.
Matt Taddy [email protected]
Taddy (2015 AoAS) Distributed Multinomial Regression
Taddy (2017 JCGS) One-step Estimator Paths for Concave Regularization, the Journal of Computational and Graphical Statistics
Taddy (2013 JASA) Multinomial Inverse Regression for Text Analysis
dmrcoef-class, cv.dmr, AICc, and the gamlr and textir packages.
library(distrom)
library(parallel)
library(MASS)
data(fgl)

## make your cluster
## FORK is faster but memory heavy, and doesn't work on Windows.
cl <- makeCluster(2, type=ifelse(.Platform$OS.type=="unix","FORK","PSOCK"))
print(cl)

## fit in parallel
fits <- dmr(cl, fgl[,1:9], fgl$type, verb=1)

## it's good practice to stop the cluster once you're done
stopCluster(cl)

## Individual Poisson model fits and AICc selection
par(mfrow=c(3,2))
for(j in 1:6){
  plot(fits[[j]])
  mtext(names(fits)[j], font=2, line=2)
}

## AICc model selection
B <- coef(fits)

## Fitted probability by true response
par(mfrow=c(1,1))
P <- predict(B, fgl[,1:9], type="response")
boxplot(P[cbind(1:214, fgl$type)] ~ fgl$type,
        ylab="fitted prob of true class")
"dmrcoef"
The extended dgCMatrix class for output from coef.dmr.
This is the class for a coefficient matrix from dmr regression; it inherits the dgCMatrix class as defined in the Matrix library. In particular, this is the ncol(covars) by ncol(counts) matrix of logistic regression coefficients chosen in coef.dmr from the regularization paths for each category.
Objects can be created only by a call to the coef.dmr function.
i: From dgCMatrix: the row indices.
p: From dgCMatrix: the column pointers.
Dim: From dgCMatrix: the dimensions.
Dimnames: From dgCMatrix: the list of labels.
x: From dgCMatrix: the nonzero entries.
factors: From dgCMatrix.
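For readers unfamiliar with the inherited slots, this short sketch builds a plain dgCMatrix with the Matrix package and inspects them. The matrix values and labels are hypothetical; a real dmrcoef object would come from coef.dmr rather than sparseMatrix().

```r
library(Matrix)

## A small sparse, coefficient-like matrix (hypothetical values).
M <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3), x = c(0.5, -1.2, 2.0),
                  dims = c(3, 3),
                  dimnames = list(c("intercept", "v1", "v2"), c("a", "b", "c")))

is(M, "dgCMatrix")   # TRUE
M@i                  # zero-based row indices of nonzero entries: 0 2 1
M@p                  # column pointers: 0 2 2 3
M@x                  # nonzero values, stored column by column: 0.5 -1.2 2.0
M@Dim                # 3 3
M@Dimnames           # list of row and column labels
```

Column j's nonzero entries sit at positions p[j] + 1 through p[j + 1] of i and x, which is why the empty column "b" shows up as a repeated pointer value.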
Class dgCMatrix, directly.
signature(object = "dmrcoef"): Prediction for a given dmrcoef matrix. Takes the same arguments as predict.dmr, but will be faster (since coef.dmr is called inside predict.dmr).
Matt Taddy [email protected]
dmr, coef.dmr, predict.dmr
showClass("dmrcoef")