semopy employs a generalization of the syntax popular among linear modelling tools in R and is heavily inspired by the syntax of lavaan. We separate semopy syntax directives into "relations" and "operations". A relation establishes a relationship of a certain kind between variables in an SEM model, whereas an operation can impose lower-level constraints on the model or affect semopy internals.


Relations typically consist of left-side variables (lvalues) and right-side variables (rvalues) separated by an operator that determines the nature of the relationship between lvalues and rvalues. In semopy, lvalues are separated by a comma "," and rvalues are separated by a plus sign "+". The supported operators are:

Regression operator
Denoted by a tilde symbol ~, it implies that rvalues regress onto lvalues. For instance, consider a multivariate regression model:
y ~ x1 + x2 + x3
It implies that x1, x2, x3 regress onto y.
Measurement operator
Defined by the =~ symbols, it is effectively syntactic sugar that translates into a regression operator with lvalues and rvalues swapped. However, it also postulates that lvalues are latent variables and, unless specified otherwise, fixes the first loading in the relationship between a latent factor and an observed variable to 1.0. Example:
eta =~ y1 + y2 + y3
It introduces a latent factor eta into the model that regresses onto y1, y2, y3. The regression coefficient between y1 and eta (the first loading) is fixed to 1.0.
Covariance operator
Variances and covariances between variables are defined by ~~ symbols. Example:
a ~~ b + c
c ~~ c
It adds covariance parameters between a and b and between a and c, and parametrises the variance of c.
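The equivalence between the measurement operator and the regression operator can be sketched in plain Python. This is an illustration only, not semopy's parser: the function name measurement_to_regressions is hypothetical, and the sketch assumes a single latent lvalue and plain indicator names without multipliers.

```python
def measurement_to_regressions(relation):
    """Rewrite 'eta =~ y1 + y2' as regression relations (illustrative only)."""
    lhs, rhs = relation.split("=~")
    latent = lhs.strip()
    indicators = [term.strip() for term in rhs.split("+")]
    relations = []
    for position, indicator in enumerate(indicators):
        if position == 0:
            # The first loading is fixed to 1.0, as described for =~.
            relations.append(f"{indicator} ~ 1.0*{latent}")
        else:
            # Remaining loadings stay free parameters.
            relations.append(f"{indicator} ~ {latent}")
    return relations


for line in measurement_to_regressions("eta =~ y1 + y2 + y3"):
    print(line)
```

The printed relations swap lvalues and rvalues relative to the input; declaring eta as latent would still be a separate step in a real model description.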

When a comma "," is used in the lvalue part of a relation, the relation is effectively duplicated for each lvalue, i.e.

y1, y2 ~ x1 + x2 + x3
translates into
y1 ~ x1 + x2 + x3
y2 ~ x1 + x2 + x3
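The duplication rule can be mimicked with a few lines of Python. The helper expand_lvalues below is a hypothetical name used for illustration; it is a minimal sketch of the rule as stated here and does not reproduce how semopy parses descriptions internally.

```python
import re


def expand_lvalues(relation):
    """Duplicate a relation once per comma-separated lvalue (illustrative only)."""
    # Split on the first operator; '=~' and '~~' are tried before the bare '~'.
    lhs, operator, rhs = re.split(r"\s*(=~|~~|~)\s*", relation, maxsplit=1)
    lvalues = [name.strip() for name in lhs.split(",")]
    return [f"{lvalue} {operator} {rhs.strip()}" for lvalue in lvalues]


for line in expand_lvalues("y1, y2 ~ x1 + x2 + x3"):
    print(line)
```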

Naming and fixing parameters

Variables in the rvalue part of relations can be preceded by either a string or a float, separated from the variable by a * character. If it is a string, it is assigned to the corresponding parameter as its new name. A name can be used either to reuse the same parameter in different parts of the SEM model or to refer to it in constraints (see Operations and constraints). The name of a parameter is shown when the inspect method of the Model is called with appropriate arguments, for example:

Python script:
from semopy import Model
from semopy.examples import multivariate_regression

desc = '''y ~ x1 + MyParam*x2 + MyParam*x3'''
data = multivariate_regression.get_data()
mod = Model(desc)
print(mod.inspect('mx', what='names')['Lambda'])
     x1       x2       x3
y   _b1  MyParam  MyParam
x1    1        0        0
x2    0        1        0
x3    0        0        1

Parameters can be fixed to a constant value by specifying a float multiplier:

Python script:
from semopy import Model
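from semopy.examples import multivariate_regression

desc = '''y ~ x1 + 6*x2 + x3'''
data = multivariate_regression.get_data()
mod = Model(desc)
mod.fit(data)  # estimate the free parameters
print(mod.inspect())  # table of parameter estimates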
  lval  op rval  Estimate   Std. Err  z-value      p-value
0    y   ~   x1  1.936024  0.0584811  33.1051            0
1    y   ~   x2  6.000000          -        -            -
2    y   ~   x3 -9.778350   0.107633 -90.8491            0
3    y  ~~    y  1.075682   0.152124  7.07107  1.53744e-12
To fix a parameter to its starting value, it is enough to use "START" as the multiplier.
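The three kinds of multiplier described in this section (a name, a float, and START) can be told apart with a short helper. The function parse_term and the labels it returns are hypothetical and chosen for illustration; this is a sketch of the rules as described above, not semopy's actual implementation.

```python
def parse_term(term):
    """Classify one rvalue term as (variable, kind, value). Illustrative only."""
    if "*" not in term:
        # No multiplier: an ordinary free parameter with an automatic name.
        return term.strip(), "free", None
    multiplier, variable = (part.strip() for part in term.split("*", 1))
    if multiplier == "START":
        # Parameter is fixed to its starting value.
        return variable, "fixed_to_start", None
    try:
        # A float multiplier fixes the parameter to that constant.
        return variable, "fixed", float(multiplier)
    except ValueError:
        # Anything else is treated as a parameter name.
        return variable, "named", multiplier


for term in "x1 + 6*x2 + MyParam*x3 + START*x4".split("+"):
    print(parse_term(term))
```

Note that strings Python itself accepts as floats, such as "nan" or "inf", would be classified as constants by this sketch.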

Operations and constraints

Operations are actions performed on certain semopy entities, such as variables or parameters. In semopy, operations have the structure OperationName(OperationParameters) entity_1, entity_2, ..., entity_n. Model supports the following operations:

DEFINE(latent)
Makes semopy treat variables as latent factors. Example:
y1 ~ 1.0 * eta1
y2, y3 ~ eta1 + eta2
y4 ~ 1.5 * eta2
DEFINE(latent) eta1 eta2
It marks the variables eta1 and eta2 as latent. Notice that this is the same as:
eta1 =~ y1 + y2 + y3
eta2 =~ y2 + y3 + 1.5*y4
DEFINE(ordinal)
Makes semopy treat variables as ordinal, i.e. their polychoric and/or polyserial correlations will be estimated. Valid only for Model. Example:
y ~ x1 + cat1 + cat2
DEFINE(ordinal) cat1 cat2
Here, the Pearson correlation between cat1 and cat2 is replaced with a polychoric correlation, and the correlations between cat1, cat2 and x1, y are replaced with polyserial correlations.
START(x)
Sets the starting value x for parameters. Example:
y ~ a*x1 + b*x2 + c*x3
START(1.5) a b
START(-5) c
The starting values for a and b are set to 1.5, and the starting value for c is set to -5.
BOUND(l, r)
Sets bound constraints that restrict parameters to the interval (l, r). Example:
y ~ x1 + x2 + x3
y ~~ a * y
BOUND(4, 100) a
This bounds the variance parameter a to lie in the interval between 4 and 100.
CONSTRAINT(constr)
Adds an arbitrary inequality or equality constraint constr to the optimization procedure. Example:
y ~ a * x1 + b * x2 + c * x3
y ~~ v * y
START(6) b
CONSTRAINT(exp(a) + log(b) = 10)
CONSTRAINT(v > cos(a)^2 + sin(b)^2)
Any sympy-compatible formula can be supplied as constr.
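The general shape of an operation line, OperationName(OperationParameters) followed by entities, can be illustrated with a small regular-expression sketch. The names OPERATION and parse_operation are hypothetical, and accepting both commas and whitespace between entities is an assumption of this sketch rather than a statement about semopy's own parser.

```python
import re

# Name in capitals, parameters in the outermost parentheses, then the entities.
OPERATION = re.compile(r"^\s*([A-Z_]+)\((.*)\)\s*(.*)$")


def parse_operation(line):
    """Split an operation line into (name, parameters, entities). Illustrative only."""
    match = OPERATION.match(line)
    if match is None:
        raise ValueError(f"not an operation line: {line!r}")
    name, params, entities = match.groups()
    # The greedy '(.*)' keeps nested parentheses such as 'exp(a)' inside params.
    names = [entity for entity in re.split(r"[,\s]+", entities.strip()) if entity]
    return name, params.strip(), names


for line in [
    "DEFINE(latent) eta1 eta2",
    "BOUND(4, 100) a",
    "CONSTRAINT(exp(a) + log(b) = 10)",
]:
    print(parse_operation(line))
```

The parameter string is returned verbatim, so interpreting it (for example, splitting the bounds or handing a constraint formula to sympy) is left to the caller.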