Ordinal variables in semopy
We consider a variable to be ordinal if it is categorical rather than continuous (for instance, if it can be encoded as an integer) and its values can be meaningfully sorted (that is, a total order can be imposed on them). A simple example of an ordinal variable is size encoded as "Tiny", "Small", "Average", "Big".
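A minimal sketch of encoding the size example as integers while preserving its order. The level names come from the text; the helper `encode_ordinal` is hypothetical and not part of semopy.

```python
# Levels of the ordinal "size" variable, listed from lowest to highest.
SIZE_LEVELS = ["Tiny", "Small", "Average", "Big"]


def encode_ordinal(values, levels):
    """Map each category to its rank in `levels` (0 = lowest).

    Hypothetical helper for illustration; raises KeyError on unknown levels.
    """
    rank = {level: i for i, level in enumerate(levels)}
    return [rank[v] for v in values]


sizes = ["Big", "Tiny", "Average", "Small", "Tiny"]
codes = encode_ordinal(sizes, SIZE_LEVELS)
print(codes)

# Sorting by the codes reproduces the natural order of the categories.
assert sorted(sizes, key=SIZE_LEVELS.index) == [SIZE_LEVELS[c] for c in sorted(codes)]
```

With pandas, `pd.Categorical(sizes, categories=SIZE_LEVELS, ordered=True).codes` yields the same integer codes.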
There are 2 ways to treat ordinal variables in semopy.
Fixed effects
First, if the ordinal variables in your SEM model are also exogenous, it should be all right to treat them as fixed effects. This is done automatically if you use ModelMeans or ModelEffects. However, the estimates depend on how the ordinal variables are encoded in the data, so different encodings can give different results.
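To illustrate the encoding dependence, the sketch below fits an ordinary least-squares line to toy, made-up data under two order-preserving encodings of the same size variable. Plain OLS stands in here for a regression path on an exogenous ordinal variable; this is not semopy code, and the helper `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Least-squares fit y ~ a + b*x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Toy data: an ordinal predictor "size" and a continuous outcome y (made up).
size = ["Tiny", "Small", "Average", "Big"] * 2
y = [1.0, 2.1, 2.9, 4.2, 1.2, 1.8, 3.1, 3.9]

# Two encodings that both respect Tiny < Small < Average < Big.
encodings = {
    "equal_steps": {"Tiny": 0, "Small": 1, "Average": 2, "Big": 3},
    "wide_top": {"Tiny": 0, "Small": 1, "Average": 2, "Big": 10},
}

fits = {}
for name, enc in encodings.items():
    x = [enc[s] for s in size]
    fits[name] = ols_fit(x, y)
    _, b, r2 = fits[name]
    print(f"{name}: slope={b:.3f}, R^2={r2:.3f}")
```

Both encodings respect the same order, yet the slope and R² differ, so a fixed-effect estimate is only interpretable relative to the chosen encoding.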
Heterogeneous correlation matrix
Second, you can fit the model not to the covariance matrix but to the so-called heterogeneous correlation matrix. A heterogeneous correlation matrix is a correlation matrix in which correlations between ordinal variables are calculated as polychoric correlations, and correlations between ordinal and continuous variables are calculated as polyserial correlations. For details, see the semopy paper. To do this, just specify the ordinal variables using the DEFINE command:
DEFINE(ordinal)
Makes semopy treat variables as ordinal, i.e. their polychoric and/or polyserial correlations will be estimated.
y ~ x1 + cat1 + cat2
DEFINE(ordinal) cat1 cat2
Here, the Pearson correlation between cat1 and cat2 will be replaced with a polychoric correlation, and the correlations between cat1, cat2 and x1, y will be replaced with polyserial correlations.
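To show what a polyserial correlation targets, the sketch below simulates a continuous variable and an ordinal variable obtained by cutting a correlated normal latent variable, then compares the plain Pearson correlation with a simple two-step moment estimate. It assumes bivariate normality, under which Cov(x, codes) = ρ·σ_x·Σφ(τ_j), so ρ = r·σ_codes / Σφ(τ_j). semopy may estimate polyserial correlations differently (for example by maximum likelihood); the helpers `pearson` and `two_step_polyserial`, the thresholds, and the sample size are all hypothetical.

```python
import random
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def two_step_polyserial(x, codes):
    """Moment-based polyserial estimate between continuous x and ordinal codes.

    Assumes codes are consecutive integers 0..K-1 in category order, every
    category is observed, and (x, latent variable) are bivariate normal.
    """
    n = len(codes)
    k = max(codes) + 1
    std = NormalDist()
    # Step 1: thresholds tau_j = Phi^{-1}(share of codes below j), j = 1..K-1.
    thresholds, below = [], 0
    for j in range(k - 1):
        below += codes.count(j)
        thresholds.append(std.inv_cdf(below / n))
    # Step 2: rescale the point-polyserial (Pearson) correlation.
    mean = sum(codes) / n
    sd = (sum((c - mean) ** 2 for c in codes) / n) ** 0.5
    return pearson(x, codes) * sd / sum(std.pdf(t) for t in thresholds)


# Simulate a latent bivariate normal pair with correlation 0.6 and cut the
# second variable into 4 ordered categories at hypothetical thresholds.
rng = random.Random(0)
rho, cuts = 0.6, [-0.5, 0.3, 1.0]
x, codes = [], []
for _ in range(20000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    x.append(z1)
    codes.append(sum(z2 > c for c in cuts))

r_pearson = pearson(x, codes)
r_polyserial = two_step_polyserial(x, codes)
print(f"Pearson: {r_pearson:.3f}, polyserial: {r_polyserial:.3f}")
```

The Pearson value should come out attenuated (around 0.55 for these thresholds), while the polyserial estimate should land near the latent correlation of 0.6.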
This approach has some drawbacks:
- The heterogeneous correlation matrix takes a long time to compute, and the time increases drastically as the number of observations and the number of ordinal variables grow.
- Sometimes the heterogeneous correlation matrix is not positive-definite, in which case semopy finds the closest positive-definite matrix. This may distort some of the original information.
- It works only for Model.
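The text does not say which algorithm semopy uses to find the closest positive-definite matrix. As an illustration under that caveat, the sketch below shows how pairwise-estimated correlations can be jointly impossible and repairs them by eigenvalue clipping followed by rescaling to a unit diagonal. This simple repair always yields a valid correlation matrix but is not guaranteed to be the nearest one (Higham's alternating-projections method targets that). The helpers `jacobi_eigh`, `is_positive_definite`, and `clip_to_correlation` are hypothetical and written in pure Python to stay dependency-free; in practice `numpy.linalg.eigh` would do the eigendecomposition.

```python
import math


def jacobi_eigh(a, tol=1e-20, max_sweeps=100):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v) where column k of v is the k-th eigenvector.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # a <- a @ P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- P^T @ a (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: v <- v @ P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def is_positive_definite(m):
    """True if a Cholesky factorization of the symmetric matrix m succeeds."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def clip_to_correlation(r, eps=1e-6):
    """Raise eigenvalues below eps to eps, then rescale to a unit diagonal."""
    vals, vecs = jacobi_eigh(r)
    vals = [max(lam, eps) for lam in vals]
    n = len(r)
    m = [[sum(vecs[i][k] * vals[k] * vecs[j][k] for k in range(n))
          for j in range(n)] for i in range(n)]
    d = [math.sqrt(m[i][i]) for i in range(n)]
    return [[m[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]


# Hypothetical pairwise estimates: variable 0 correlates strongly with both
# 1 and 2, yet 1 and 2 are negatively correlated, which is jointly impossible.
R = [[1.0, 0.9, 0.9],
     [0.9, 1.0, -0.2],
     [0.9, -0.2, 1.0]]
fixed = clip_to_correlation(R)
print(is_positive_definite(R), is_positive_definite(fixed))
```

The difference between `R` and `fixed` is a concrete measure of the information distortion mentioned in the drawbacks above.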