semopy employs a generalization of the syntax popular among linear modelling tools in R and is heavily inspired by the syntax of lavaan. We separate semopy syntax directives into "relations" and "operations". A relation specifies a relationship of a certain kind between variables in a SEM model, whereas an operation imposes lower-level constraints on the model or affects semopy's internal workings.
Relations typically consist of left-hand-side variables (lvalues) and right-hand-side variables (rvalues) separated by an operator that determines the nature of the relationship between the lvalues and rvalues. In semopy, lvalues are separated by a comma "," and rvalues by a plus sign "+". The supported operators are:
    y ~ x1 + x2 + x3
It implies that y is regressed on x1, x2 and x3.
    eta =~ y1 + y2 + y3
It introduces a latent factor eta into the model, with y1, y2 and y3 as its indicators (each of them is regressed on eta). The regression coefficient between the first indicator, y1, and eta is fixed to 1.0.
    a ~~ b + c
    c ~~ c
It adds covariance parameters between a and b and between a and c, and parametrises the variance of c.
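The three relations above can be read as the following equations. This is only a sketch: the symbols β, λ, ε and ψ are notation assumed here for illustration (they are not names used by semopy), and intercepts are omitted.

```latex
% y ~ x1 + x2 + x3
y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon

% eta =~ y1 + y2 + y3   (coefficient of the first indicator fixed to 1.0)
y_1 = 1.0 \cdot \eta + \varepsilon_1, \qquad
y_2 = \lambda_2 \eta + \varepsilon_2, \qquad
y_3 = \lambda_3 \eta + \varepsilon_3

% a ~~ b + c   and   c ~~ c
\mathrm{Cov}(a, b) = \psi_{ab}, \qquad
\mathrm{Cov}(a, c) = \psi_{ac}, \qquad
\mathrm{Var}(c) = \psi_{cc}
```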
When a comma "," is used in the lvalue part of a relation, the relation is effectively duplicated for each lvalue, i.e.
    y1, y2 ~ x1 + x2 + x3
is equivalent to
    y1 ~ x1 + x2 + x3
    y2 ~ x1 + x2 + x3
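The duplication rule can be illustrated with a few lines of plain Python. This is a minimal sketch, not semopy's own parser: `expand_lvalues` and `OPERATORS` are hypothetical names introduced here, and the function only handles a single relation per string.

```python
# Operators are tried longest-first so that "=~" and "~~" are not
# mistaken for the plain regression operator "~".
OPERATORS = ("=~", "~~", "~")


def expand_lvalues(relation):
    """Return one relation string per comma-separated lvalue."""
    for op in OPERATORS:
        if op in relation:
            left, right = relation.split(op, 1)
            break
    else:
        raise ValueError(f"no operator found in {relation!r}")
    # Each lvalue gets its own copy of the right-hand side.
    lvalues = [name.strip() for name in left.split(",") if name.strip()]
    return [f"{lv} {op} {right.strip()}" for lv in lvalues]


for line in expand_lvalues("y1, y2 ~ x1 + x2 + x3"):
    print(line)
```

A relation with a single lvalue is returned unchanged, as a one-element list.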
Variables in the rvalue part of a relation can be preceded by either a string or a float, separated from the variable by a * character. If it is a string, it is assigned to the corresponding parameter as its name. This can be used either to reuse the same parameter in different parts of the SEM model or to refer to it in constraints (see Operations and constraints). The parameter name is shown when the inspect method of the Model is called with appropriate arguments, for example:
    from semopy import Model
    from semopy.examples import multivariate_regression

    # The same name on x2 and x3 makes them reuse one parameter.
    desc = '''y ~ x1 + MyParam*x2 + MyParam*x3'''
    data = multivariate_regression.get_data()
    mod = Model(desc)
    mod.fit(data)
    # Show parameter names instead of estimates.
    print(mod.inspect('mx', what='names')['Lambda'])
         x1       x2       x3
    y   _b1  MyParam  MyParam
    x1    1        0        0
    x2    0        1        0
    x3    0        0        1
Parameters can be fixed to a constant value by specifying a float multiplier:
    from semopy import Model
    from semopy.examples import multivariate_regression

    # The float multiplier fixes the coefficient of x2 to 6.
    desc = '''y ~ x1 + 6*x2 + x3'''
    data = multivariate_regression.get_data()
    mod = Model(desc)
    mod.fit(data)
    print(mod.inspect())
      lval  op rval  Estimate   Std. Err  z-value      p-value
    0    y   ~   x1  1.936024  0.0584811  33.1051            0
    1    y   ~   x2  6.000000          -        -            -
    2    y   ~   x3 -9.778350   0.107633 -90.8491            0
    3    y  ~~    y  1.075682   0.152124  7.07107  1.53744e-12
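The two kinds of multiplier (a string names the parameter, a float fixes it) can be told apart with a short plain-Python sketch. It is not semopy's parser: `parse_rvalue` and `parse_rvalues` are hypothetical helpers, and the sketch assumes terms are separated by "+" only, so a float written like 1e+3 is not supported.

```python
def parse_rvalue(term):
    """Split one rvalue term into (variable, name, fixed_value)."""
    if "*" not in term:
        # No multiplier: a free parameter with an automatic name.
        return term.strip(), None, None
    multiplier, variable = (part.strip() for part in term.split("*", 1))
    try:
        # A float multiplier fixes the parameter to that value.
        return variable, None, float(multiplier)
    except ValueError:
        # Anything else is treated as the parameter's name.
        return variable, multiplier, None


def parse_rvalues(rhs):
    """Parse the right-hand side of a relation, term by term."""
    return [parse_rvalue(term) for term in rhs.split("+")]


print(parse_rvalues("x1 + MyParam*x2 + MyParam*x3"))
print(parse_rvalues("x1 + 6*x2 + x3"))
```

In the first call x2 and x3 carry the same name, which is what makes them share one parameter; in the second call x2 carries the fixed value 6.0.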
"START"
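operation (described below) sets a starting value for a parameter without fixing it, unlike a float multiplier.
Operations have the following syntax:
    OperationName(OperationParameters) entity_1, entity_2, ..., entity_n
Model supports the following operations:
    y1 ~ 1.0 * eta1
    y2, y3 ~ eta1 + eta2
    y4 ~ 1.5 * eta2
    DEFINE(latent) eta1 eta2
It sets variables eta1, eta2 as latents. Notice that this is the same as: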
    eta1 =~ y1 + y2 + y3
    eta2 =~ y2 + y3 + 1.5*y4
    y ~ x1 + cat1 + cat2
    DEFINE(ordinal) cat1 cat2
Here, the Pearson correlation between the cat1 and cat2 variables is replaced with a polychoric correlation, and the correlations between cat1, cat2 and the continuous variables x1, y are replaced with polyserial correlations.
    y ~ a*x1 + b*x2 + c*x3
    START(1.5) a b
    START(-5) c
The starting values for a and b are set to 1.5, and for c to -5.
    y ~ x1 + x2 + x3
    y ~~ a * y
    BOUND(4, 100) a
This constrains the variance parameter a to lie in the interval between 4 and 100.
    y ~ a * x1 + b * x2 + c * x3
    y ~~ v * y
    START(6) b
    CONSTRAINT(exp(a) + log(b) = 10)
    CONSTRAINT(v > cos(a)^2 + sin(b)^2)
Any sympy-compatible formula can be supplied as the CONSTRAINT argument (constr).
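To make the operation syntax concrete, the sketch below splits an operation line into its name, its parameter string and its entities. It is an illustration in plain Python rather than semopy's actual parser: `parse_operation` and the `OPERATION` pattern are hypothetical. The syntax line lists entities separated by commas while the examples separate them with spaces, so the sketch accepts both.

```python
import re

# The greedy ".*" lets the parameter part contain nested parentheses,
# as in CONSTRAINT(exp(a) + log(b) = 10). It assumes the entity list
# itself contains no closing parenthesis.
OPERATION = re.compile(r"^\s*([A-Za-z_]+)\((.*)\)\s*(.*)$")


def parse_operation(line):
    """Return (name, parameters, entities) for one operation line."""
    match = OPERATION.match(line)
    if match is None:
        raise ValueError(f"not an operation: {line!r}")
    name, params, entities = match.groups()
    # Entities may be separated by commas and/or whitespace.
    names = [e for e in re.split(r"[,\s]+", entities.strip()) if e]
    return name, params.strip(), names


for line in ("DEFINE(latent) eta1 eta2",
             "START(1.5) a b",
             "BOUND(4, 100) a",
             "CONSTRAINT(exp(a) + log(b) = 10)"):
    print(parse_operation(line))
```

The parameter string is returned unparsed, so each operation is free to interpret it in its own way (a number for START, an interval for BOUND, a formula for CONSTRAINT).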