semopy employs a generalization of the syntax popular among linear modelling tools in R and is heavily inspired by the syntax of lavaan. We separate semopy syntax directives into "relations" and "operations". A relation establishes a relationship of a certain kind between variables in a SEM model, whereas an operation can impose lower-level constraints on the model or affect semopy's internal workings.

Relations typically consist of left-hand-side variables (lvalues) and right-hand-side values (rvalues) separated by an operator that determines the nature of the relationship between them. In semopy, lvalues are separated by a comma "," and rvalues are separated by a plus sign "+". The supported operators are:

The regression operator is denoted by a tilde ~ and implies that the rvalues are predictors of the lvalues. For instance, consider a multivariate regression model:

y ~ x1 + x2 + x3

It implies that x1, x2 and x3 are predictors of y.

The measurement operator is denoted by =~. It is effectively syntactic sugar that translates into a regression operator with lvalues and rvalues swapped. However, it also declares the lvalues to be latent variables and, unless specified otherwise, fixes the first loading between a latent factor and an observed variable to 1.0. Example:

eta =~ y1 + y2 + y3

It introduces a latent factor eta into the model that is a predictor of y1, y2 and y3. The first regression coefficient, between y1 and eta, is fixed to 1.0.
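The rewriting of a measurement relation into regressions can be sketched in plain Python. The helper `measurement_to_regressions` below is hypothetical and is not part of semopy. It assumes a single latent lvalue, and it reads "unless specified otherwise" as "no indicator already carries a numeric multiplier", which is an assumption rather than documented behaviour.

```python
def _is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def measurement_to_regressions(relation):
    """Rewrite 'eta =~ y1 + y2' as regression relations (illustration only)."""
    latent, indicators = (part.strip() for part in relation.split("=~"))
    terms = [term.strip() for term in indicators.split("+")]
    # A term such as '1.5*y4' already fixes one loading to a constant.
    has_fixed = any(
        "*" in term and _is_number(term.split("*")[0].strip()) for term in terms
    )
    result = []
    for index, term in enumerate(terms):
        multiplier, _, name = term.rpartition("*")
        multiplier, name = multiplier.strip(), name.strip()
        if index == 0 and not has_fixed and not multiplier:
            multiplier = "1.0"  # fix the first loading to 1.0
        prefix = f"{multiplier}*" if multiplier else ""
        # Swap sides: the observed variable becomes the lvalue.
        result.append(f"{name} ~ {prefix}{latent}")
    return result


print(measurement_to_regressions("eta =~ y1 + y2 + y3"))
```

The sketch only manipulates strings; marking eta as latent is a separate step that this helper does not model.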

Variances and covariances between variables are defined by the ~~ operator. Example:

a ~~ b + c
c ~~ c

It adds covariance parameters between a and b and between a and c, and parametrises the variance of c.
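As a rough illustration of how a ~~ relation maps to parameters, the following sketch lists one parameter per rvalue and labels it a variance when both sides name the same variable. The function `covariance_parameters` is a hypothetical name used only for this example and does not reflect semopy's internals.

```python
def covariance_parameters(relation):
    """List the (kind, lvalue, rvalue) parameters implied by 'a ~~ b + c'."""
    lvalue, rvalues = (part.strip() for part in relation.split("~~"))
    params = []
    for rvalue in (term.strip() for term in rvalues.split("+")):
        # A variable paired with itself denotes its variance.
        kind = "variance" if rvalue == lvalue else "covariance"
        params.append((kind, lvalue, rvalue))
    return params


for line in ("a ~~ b + c", "c ~~ c"):
    print(covariance_parameters(line))
```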

When a comma "," is used in the lvalue part of a relation, the relation is effectively duplicated for each lvalue, i.e.

y1, y2 ~ x1 + x2 + x3

translates into

y1 ~ x1 + x2 + x3
y2 ~ x1 + x2 + x3
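The duplication rule can be sketched with a few lines of Python. This is not semopy's parser: `expand_lvalues` is a hypothetical helper that splits a relation at the first operator it finds and repeats the right-hand side for each comma-separated lvalue.

```python
import re

# Two-character operators come first so '~~' and '=~' are not read as '~'.
_OPERATOR = re.compile(r"=~|~~|~")


def expand_lvalues(relation):
    """Duplicate a relation once per comma-separated lvalue."""
    match = _OPERATOR.search(relation)
    if match is None:
        raise ValueError(f"no relation operator in {relation!r}")
    lvalues = relation[: match.start()].split(",")
    rvalues = relation[match.end():].strip()
    return [f"{lv.strip()} {match.group()} {rvalues}" for lv in lvalues]


for line in expand_lvalues("y1, y2 ~ x1 + x2 + x3"):
    print(line)
```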

Variables in the rvalue part of a relation can be preceded by either a string or a float, separated from the variable by a * character. If it is a string, it is assigned to the corresponding parameter as its new name. This can be used either to reuse the same parameter in different parts of a SEM model or to refer to it in constraints (see Operations and constraints). The parameter name is visible when the inspect method of the Model is called with appropriate arguments, for example:

Python script:

from semopy import Model
from semopy.examples import multivariate_regression

# x2 and x3 share a single parameter named MyParam
desc = '''y ~ x1 + MyParam*x2 + MyParam*x3'''
data = multivariate_regression.get_data()
mod = Model(desc)
mod.fit(data)
# Print parameter names as they are placed in the Lambda matrix
print(mod.inspect('mx', what='names')['Lambda'])

Output:

     x1       x2       x3
y   _b1  MyParam  MyParam
x1    1        0        0
x2    0        1        0
x3    0        0        1

Parameters can be fixed to a constant value by specifying a float multiplier:

Python script:

from semopy import Model
from semopy.examples import multivariate_regression

# The coefficient of x2 is fixed to 6 and is not estimated
desc = '''y ~ x1 + 6*x2 + x3'''
data = multivariate_regression.get_data()
mod = Model(desc)
mod.fit(data)
print(mod.inspect())

Output:

  lval  op  rval   Estimate   Std. Err   z-value      p-value
0    y   ~    x1   1.936024  0.0584811   33.1051            0
1    y   ~    x2   6.000000          -         -            -
2    y   ~    x3  -9.778350   0.107633  -90.8491            0
3    y  ~~     y   1.075682   0.152124   7.07107  1.53744e-12
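The two kinds of multiplier described above (a string names the parameter, a float fixes it) can be told apart with a small classifier. The sketch below is an assumption-laden illustration in plain Python: `parse_term` and `parse_rvalues` are hypothetical helpers, not semopy functions, and they treat anything that `float()` accepts as a fixed value.

```python
def parse_term(term):
    """Classify an rvalue term such as 'x1', 'MyParam*x2' or '6*x2'."""
    multiplier, _, variable = term.rpartition("*")
    multiplier, variable = multiplier.strip(), variable.strip()
    if not multiplier:
        # No multiplier: a free parameter with an automatic name.
        return {"variable": variable, "name": None, "fixed": None}
    try:
        # A float multiplier fixes the parameter to that constant.
        return {"variable": variable, "name": None, "fixed": float(multiplier)}
    except ValueError:
        # Any other multiplier is taken as the parameter's name.
        return {"variable": variable, "name": multiplier, "fixed": None}


def parse_rvalues(rvalues):
    """Parse the right-hand side of a relation into classified terms."""
    return [parse_term(term) for term in rvalues.split("+")]


for term in parse_rvalues("x1 + MyParam*x2 + 6*x3"):
    print(term)
```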

`"START"` multiplier.

Operations follow the syntax `OperationName(OperationParameters) entity_1, entity_2, ..., entity_n`. Model supports the following operations:

DEFINE(latent) makes semopy treat variables as latent factors. Example:

y1 ~ 1.0 * eta1
y2, y3 ~ eta1 + eta2
y4 ~ 1.5 * eta2
DEFINE(latent) eta1 eta2

It sets the variables eta1 and eta2 as latent. Notice that this is the same as:

eta1 =~ y1 + y2 + y3
eta2 =~ y2 + y3 + 1.5*y4

DEFINE(ordinal) makes semopy treat variables as ordinal, i.e. their polychoric and/or polyserial correlations will be estimated. Valid only for Model. Example:

y ~ x1 + cat1 + cat2
DEFINE(ordinal) cat1 cat2

Here, Pearson correlations between the cat1 and cat2 variables are replaced with polychoric correlations, and correlations between cat1, cat2 and x1, y are replaced with polyserial correlations.

START(x) sets the starting value x for the listed parameters. Example:

y ~ a*x1 + b*x2 + c*x3
START(1.5) a b
START(-5) c

Starting values for a and b are set to 1.5, and the starting value for c is set to -5.

BOUND(l, r) constrains the listed parameters to lie in the interval (l, r). Example:

y ~ x1 + x2 + x3
y ~~ a * y
BOUND(4, 100) a

This bounds the variance parameter a to lie in the interval between 4 and 100.

CONSTRAINT(constr) adds an arbitrary inequality or equality constraint constr to the optimization procedure. Example:

y ~ a * x1 + b * x2 + c * x3
y ~~ v * y
START(6) b
CONSTRAINT(exp(a) + log(b) = 10)
CONSTRAINT(v > cos(a)^2 + sin(b)^2)

Any sympy-compatible formula can be supplied as constr.
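To make the general operation pattern `OperationName(OperationParameters) entity_1, entity_2, ..., entity_n` concrete, here is a minimal sketch that splits an operation line into its three parts. It is not semopy's parser: `parse_operation` is a hypothetical helper, and it assumes that entities may be separated by commas or spaces and never contain parentheses.

```python
import re

# name '(' parameters ')' optional entities; the greedy group runs up to
# the last ')' so nested calls such as exp(a) stay inside the parameters.
_OPERATION = re.compile(r"^\s*([A-Za-z_]+)\((.*)\)\s*(.*)$")


def parse_operation(line):
    """Split an operation line into (name, parameters, entities)."""
    match = _OPERATION.match(line)
    if match is None:
        raise ValueError(f"not an operation: {line!r}")
    name, params, entities = match.groups()
    return name, params.strip(), entities.replace(",", " ").split()


for line in (
    "DEFINE(latent) eta1 eta2",
    "START(1.5) a b",
    "BOUND(4, 100) a",
    "CONSTRAINT(exp(a) + log(b) = 10)",
):
    print(parse_operation(line))
```

The parameter string is returned unparsed on purpose: interpreting it (a number for START, an interval for BOUND, a formula for CONSTRAINT) depends on the operation.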