semopy is freely available on the PyPI repository. The most straightforward and universal way to install it is to run the command:
pip install semopy
Alternatively, you can download the package directly from its git repository:
git clone https://gitlab.com/georgy.m/semopy
Let's take a look at a quick example of a typical semopy working session using a built-in example.
First, let's get the text description of the SEM model:
import semopy
import pandas as pd

desc = semopy.examples.political_democracy.get_model()
print(desc)
Output:
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
Let's get the associated dataset:
data = semopy.examples.political_democracy.get_data()
print(data.head())
Output:
      y1        y2        y3        y4  ...        y8        x1        x2        x3
1   2.50  0.000000  3.333333  0.000000  ...  3.333333  4.442651  3.637586  2.557615
2   1.25  0.000000  3.333333  0.000000  ...  0.736999  5.384495  5.062595  3.568079
3   7.50  8.800000  9.999998  9.199991  ...  8.211809  5.961005  6.255750  5.224433
4   8.90  8.800000  9.999998  9.199991  ...  4.615086  6.285998  7.567863  6.267495
5  10.00  3.333333  9.999998  6.666666  ...  6.666666  5.863631  6.818924  4.573679

[5 rows x 11 columns]
Now, we fit the model to the data and examine optimization results:
mod = semopy.Model(desc)
res = mod.fit(data)
print(res)
Output:
Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.508
Number of iterations: 52
Params: 2.180 1.819 1.257 1.058 1.265 1.186 1.280 1.266 1.482 0.572 0.838 0.624 1.893 1.320 2.156 7.385 0.793 5.067 0.347 3.148 1.357 4.954 0.082 0.172 0.120 3.256 0.467 3.951 3.430 2.352 0.448
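For context, the MLW objective reported here is the classical Wishart/maximum-likelihood fit function of SEM, which measures the discrepancy between the sample covariance matrix S and the model-implied covariance matrix Σ(θ) over m observed variables (this is the standard textbook formula, not taken from semopy's source; implementations may differ by a constant scaling):

```latex
F_{\mathrm{MLW}}(\theta) = \operatorname{tr}\!\left( S\, \Sigma(\theta)^{-1} \right) + \ln\left|\Sigma(\theta)\right| - \ln\left|S\right| - m
```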
Finally, let's inspect the parameter estimates:
ins = mod.inspect()
print(ins)
Output:
     lval  op   rval  Estimate   Std. Err   z-value      p-value
0   dem60   ~  ind60  1.482381   0.399018   3.71508  0.000203142
1   dem65   ~  ind60  0.571913    0.22138    2.5834   0.00978329
2   dem65   ~  dem60  0.837576  0.0984455   8.50802            0
3      x1   ~  ind60  1.000000          -         -            -
4      x2   ~  ind60  2.180490   0.138538   15.7392            0
5      x3   ~  ind60  1.818548   0.151979   11.9658            0
6      y1   ~  dem60  1.000000          -         -            -
7      y2   ~  dem60  1.256818   0.182686   6.87966  5.99965e-12
8      y3   ~  dem60  1.058173    0.15152   6.98371  2.87481e-12
9      y4   ~  dem60  1.265187   0.145151   8.71636            0
10     y5   ~  dem65  1.000000          -         -            -
11     y6   ~  dem65  1.185744   0.168908   7.02007  2.21756e-12
12     y7   ~  dem65  1.279717   0.159996   7.99845  1.33227e-15
13     y8   ~  dem65  1.266083   0.158237   8.00118  1.33227e-15
14  dem65  ~~  dem65  0.172209   0.214861  0.801488     0.422849
15  dem60  ~~  dem60  3.950850    0.92045    4.2923  1.76829e-05
16  ind60  ~~  ind60  0.448325  0.0866717   5.17268  2.30759e-07
17     y1  ~~     y5  0.624422   0.358434   1.74208     0.081494
18     y1  ~~     y1  1.892742   0.444559   4.25757  2.06663e-05
19     y2  ~~     y4  1.319584   0.702679   1.87793    0.0603906
20     y2  ~~     y6  2.156162   0.734155   2.93693   0.00331478
21     y2  ~~     y2  7.385292    1.37567    5.3685  7.93927e-08
22     y3  ~~     y7  0.793330   0.607642   1.30559     0.191693
23     y3  ~~     y3  5.066628   0.951721   5.32365  1.01706e-07
24     y4  ~~     y8  0.347221   0.442234  0.785153     0.432364
25     y4  ~~     y4  3.147914   0.738841   4.26061  2.03871e-05
26     y6  ~~     y8  1.357036     0.5685   2.38705    0.0169844
27     y6  ~~     y6  4.954365   0.914285   5.41884  5.99863e-08
28     x1  ~~     x1  0.081536  0.0194887   4.18376  2.86733e-05
29     x2  ~~     x2  0.119879  0.0697343   1.71909    0.0855983
30     y8  ~~     y8  3.256387   0.695039   4.68518  2.79708e-06
31     x3  ~~     x3  0.466730  0.0901626   5.17654  2.26045e-07
32     y7  ~~     y7  3.430032   0.712732   4.81251  1.49045e-06
33     y5  ~~     y5  2.351909   0.480369   4.89604  9.77848e-07
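The z-value and p-value columns are derived from the first two: z is the estimate divided by its standard error, and the p-value is the two-sided tail probability under the standard normal distribution. A minimal stdlib sketch, checked against the dem60 ~ ind60 row above:

```python
import math

def wald_test(estimate, std_err):
    """Wald z-statistic and its two-sided normal p-value."""
    z = estimate / std_err
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

z, p = wald_test(1.482381, 0.399018)
print(z, p)  # z ≈ 3.7151, p ≈ 0.000203 — matching the first row of the table
```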
Assume that we have a SEM model description in semopy syntax in the string variable desc and data in the pandas DataFrame variable data. Then, an ordinary semopy session looks like this:
from semopy import Model

model = Model(desc)
opt_res = model.fit(data)
estimates = model.inspect()

The fit method has 3 arguments of interest:

data: a pandas DataFrame with the dataset;
obj: a name of the objective function to minimize:
    "MLW" (the default): Wishart loglikelihood;
    "ULS": Unweighted Least Squares;
    "GLS": Generalized Least Squares;
    "WLS": Weighted Least Squares (also known as the Asymptotic Distribution-Free estimator);
    "DWLS": Diagonally Weighted Least Squares (also known as robust WLS);
    "FIML": Full Information Maximum Likelihood (when the data has no missing values, FIML is effectively a multivariate normal maximum likelihood);
solver: a name of the optimization method to use (the default is "SLSQP").

The fit method returns a special structure that contains useful information on the optimization process.

The inspect method likewise has arguments of interest:

mode: determines the form of the returned result:
    "list" (the default): a DataFrame with estimates is returned;
    "mx": a dictionary with model internal structures (matrices) is returned;
what: effective only when mode is "mx"; determines what values are displayed in place of estimated parameters in the matrices. Can take 3 values:
    "est" (the default): matrices are returned as-is, with current parameter estimates;
    "start": matrices are returned filled with the starting values of parameters;
    "names": instead of values, parameter names/identifiers are displayed in place of their respective parameters;
std_est: if True or "lv", standardized coefficients are also returned (non-output variables are not standardized in the case of "lv"). The default is False.
Simple univariate linear regression model.
Import model description and data:
from semopy import Model
from semopy.examples import univariate_regression

desc = univariate_regression.get_model()
data = univariate_regression.get_data()
print(desc)
Output:
y ~ x
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting optimization information by print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.000
Number of iterations: 11
Params: -1.221 0.670

Printing parameter estimates by print(estimates):

  lval  op  rval   Estimate  Std. Err     z-value       p-value
0    y   ~     x  -1.221069  0.083165  -14.682538  0.000000e+00
1    y  ~~     y   0.670367  0.094804    7.071068  1.537437e-12
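With a single predictor, the ML estimate of the y ~ x coefficient coincides with the familiar least-squares slope cov(x, y) / var(x). A stdlib illustration on toy data (hypothetical values, not the semopy example dataset):

```python
# Toy data generated from y = 2*x + 1 with no noise:
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Least-squares slope: cov(x, y) / var(x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
print(slope)  # 2.0
```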
Same as the univariate linear regression model, but with multiple independent variables.
Import model description and data:
from semopy import Model
from semopy.examples import univariate_regression_many

desc = univariate_regression_many.get_model()
data = univariate_regression_many.get_data()
print(desc)
Output:
y ~ x1 + x2 + x3
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting optimization information by print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.000
Number of iterations: 13
Params: 1.400 0.451 1.190 0.878

Printing parameter estimates by print(estimates):

  lval  op  rval  Estimate  Std. Err    z-value       p-value
0    y   ~    x1  1.399551  0.091138  15.356385  0.000000e+00
1    y   ~    x2  0.450561  0.097883   4.603051  4.163465e-06
2    y   ~    x3  1.190470  0.086499  13.762839  0.000000e+00
3    y  ~~     y  0.878486  0.124237   7.071068  1.537437e-12
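With several predictors, the coefficients jointly solve the normal equations; when the predictors happen to be mutually orthogonal (as in this deliberately constructed toy example, not the semopy dataset), each coefficient reduces to its own simple slope cov(x_i, y) / var(x_i):

```python
# Orthogonal, zero-mean toy predictors and y = 2*x1 + 3*x2 exactly:
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

# Per-predictor slopes; valid here only because x1 and x2 are orthogonal.
b1 = sum(a * c for a, c in zip(x1, y)) / sum(a * a for a in x1)
b2 = sum(b * c for b, c in zip(x2, y)) / sum(b * b for b in x2)
print(b1, b2)  # 2.0 3.0
```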
Multivariate linear regression model.
Import model description and data:
from semopy import Model
from semopy.examples import multivariate_regression

desc = multivariate_regression.get_model()
data = multivariate_regression.get_data()
print(desc)
Output:
y1, y2, y3 ~ x1 + x2 + x3
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting optimization information by print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.068
Number of iterations: 17
Params: -1.390 -1.138 -0.318 -0.746 1.074 -1.131 0.703 1.235 -0.920 1.136 0.489 0.638

Printing parameter estimates by print(estimates):

   lval  op  rval  Estimate  Std. Err     z-value       p-value
0    y1   ~    x1 -1.389754  0.073417  -18.929470  0.000000e+00
1    y1   ~    x2 -1.138405  0.087966  -12.941462  0.000000e+00
2    y1   ~    x3 -0.317893  0.072576   -4.380132  1.186073e-05
3    y2   ~    x1 -0.745837  0.097974   -7.612623  2.686740e-14
4    y2   ~    x2  1.074436  0.117388    9.152855  0.000000e+00
5    y2   ~    x3 -1.130890  0.096851  -11.676597  0.000000e+00
6    y3   ~    x1  0.702778  0.064270   10.934755  0.000000e+00
7    y3   ~    x2  1.235044  0.077006   16.038334  0.000000e+00
8    y3   ~    x3 -0.920469  0.063534  -14.487925  0.000000e+00
9    y2  ~~    y2  1.135729  0.160616    7.071068  1.537437e-12
10   y3  ~~    y3  0.488735  0.069118    7.071068  1.537437e-12
11   y1  ~~    y1  0.637755  0.090192    7.071068  1.537437e-12
The classic Holzinger-Swineford dataset and a CFA model.
Import model description and data:
from semopy import Model
from semopy.examples import holzinger39

desc = holzinger39.get_model()
data = holzinger39.get_data()
print(desc)
Output:
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting optimization information by print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.283
Number of iterations: 28
Params: 0.554 0.731 1.113 0.926 1.180 1.083 0.550 1.133 0.356 0.488 0.844 0.371 0.800 0.566 0.446 0.980 0.408 0.174 0.808 0.262 0.383

Printing parameter estimates by print(estimates):

       lval  op     rval  Estimate  Std. Err    z-value   p-value
0        x1   ~   visual  1.000000         -          -         -
1        x2   ~   visual  0.554421  0.099727   5.559413       0.0
2        x3   ~   visual  0.730526   0.10918   6.691009       0.0
3        x4   ~  textual  1.000000         -          -         -
4        x5   ~  textual  1.113076  0.065392  17.021522       0.0
5        x6   ~  textual  0.926120  0.055425  16.709493       0.0
6        x7   ~    speed  1.000000         -          -         -
7        x8   ~    speed  1.179980  0.165045   7.149459       0.0
8        x9   ~    speed  1.082517  0.151354   7.152197       0.0
9   textual  ~~  textual  0.980034  0.112145   8.739002       0.0
10  textual  ~~   visual  0.408277  0.073527    5.55273       0.0
11  textual  ~~    speed  0.173603  0.049316   3.520223  0.000431
12   visual  ~~   visual  0.808310  0.145287   5.563548       0.0
13   visual  ~~    speed  0.262135  0.056252   4.659977  0.000003
14    speed  ~~    speed  0.383377  0.086171   4.449045  0.000009
15       x1  ~~       x1  0.550161  0.113439    4.84983  0.000001
16       x2  ~~       x2  1.133391  0.101711  11.143202       0.0
17       x6  ~~       x6  0.356171   0.04303   8.277334       0.0
18       x8  ~~       x8  0.487934  0.074167   6.578856       0.0
19       x3  ~~       x3  0.843731  0.090625    9.31016       0.0
20       x4  ~~       x4  0.371117  0.047712   7.778264       0.0
21       x7  ~~       x7  0.799708  0.081387   9.825966       0.0
22       x9  ~~       x9  0.565804  0.070757   7.996483       0.0
23       x5  ~~       x5  0.446208  0.058387   7.642264       0.0
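The loadings and variances above determine the model-implied covariances: for indicators of a single factor, Cov(x_i, x_j) = λ_i λ_j ψ for i ≠ j and Var(x_i) = λ_i² ψ + θ_i, where ψ is the factor variance and θ_i the residual variance. A stdlib sketch plugging in the visual-factor estimates copied by hand from the table:

```python
# Estimates for the one-factor block "visual =~ x1 + x2 + x3" (from the table):
lam = {"x1": 1.0, "x2": 0.554421, "x3": 0.730526}         # factor loadings
psi = 0.808310                                            # Var(visual)
theta = {"x1": 0.550161, "x2": 1.133391, "x3": 0.843731}  # residual variances

# Cov(x_i, x_j) = lam_i * lam_j * psi;  Var(x_i) = lam_i**2 * psi + theta_i
cov_x1_x2 = lam["x1"] * lam["x2"] * psi
var_x1 = lam["x1"] ** 2 * psi + theta["x1"]
print(round(cov_x1_x2, 3), round(var_x1, 3))  # ≈ 0.448 1.358
```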
Bollen's Data on Industrialization and Political Democracy is a common benchmark amongst SEM tools.
Import model description and data:
from semopy import Model
from semopy.examples import political_democracy

desc = political_democracy.get_model()
data = political_democracy.get_data()
print(desc)
Output:
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting optimization information by print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.508
Number of iterations: 52
Params: 2.180 1.819 1.257 1.058 1.265 1.186 1.280 1.266 1.482 0.572 0.838 0.624 1.893 1.320 2.156 7.385 0.793 5.067 0.347 3.148 1.357 4.954 0.082 3.256 0.172 3.430 0.120 3.951 0.467 2.352 0.448

Printing parameter estimates by print(estimates):

     lval  op   rval  Estimate  Std. Err    z-value   p-value
0   dem60   ~  ind60  1.482379  0.399024   3.715017  0.000203
1   dem65   ~  ind60  0.571912  0.221383   2.583364  0.009784
2   dem65   ~  dem60  0.837574  0.098446   8.507992       0.0
3      x1   ~  ind60  1.000000         -          -         -
4      x2   ~  ind60  2.180494  0.138565  15.736254       0.0
5      x3   ~  ind60  1.818546  0.151993   11.96465       0.0
6      y1   ~  dem60  1.000000         -          -         -
7      y2   ~  dem60  1.256819  0.182687   6.879647       0.0
8      y3   ~  dem60  1.058174  0.151521   6.983699       0.0
9      y4   ~  dem60  1.265186  0.145151   8.716344       0.0
10     y5   ~  dem65  1.000000         -          -         -
11     y6   ~  dem65  1.185743  0.168908   7.020032       0.0
12     y7   ~  dem65  1.279717  0.159996    7.99841       0.0
13     y8   ~  dem65  1.266084  0.158238   8.001141       0.0
14  dem65  ~~  dem65  0.172210  0.214861   0.801494  0.422846
15  dem60  ~~  dem60  3.950849  0.920451   4.292296  0.000018
16  ind60  ~~  ind60  0.448321  0.086677   5.172345       0.0
17     y1  ~~     y5  0.624423  0.358435   1.742083  0.081494
18     y1  ~~     y1  1.892743   0.44456   4.257565  0.000021
19     y2  ~~     y4  1.319589   0.70268   1.877937   0.06039
20     y2  ~~     y6  2.156164  0.734155   2.936934  0.003315
21     y2  ~~     y2  7.385292  1.375671   5.368501       0.0
22     y3  ~~     y7  0.793329  0.607642   1.305585  0.191694
23     y3  ~~     y3  5.066628  0.951722   5.323646       0.0
24     y4  ~~     y8  0.347222  0.442234   0.785154  0.432363
25     y4  ~~     y4  3.147911  0.738841   4.260605   0.00002
26     y6  ~~     y8  1.357037    0.5685   2.387047  0.016984
27     y6  ~~     y6  4.954364  0.914284   5.418843       0.0
28     x1  ~~     x1  0.081573  0.019495   4.184317  0.000029
29     y8  ~~     y8  3.256389   0.69504   4.685182  0.000003
30     y7  ~~     y7  3.430032  0.712732   4.812512  0.000001
31     x2  ~~     x2  0.119894  0.069747   1.718973  0.085619
32     x3  ~~     x3  0.466732  0.090168   5.176276       0.0
33     y5  ~~     y5  2.351910  0.480369   4.896044  0.000001
A complex synthetic model that is used in an upcoming publication.
Import model description and data:
from semopy import Model
from semopy.examples import example_model

desc = example_model.get_model()
data = example_model.get_data()
print(desc)
Output:
# Measurement part
eta1 =~ y1 + y2 + y3
eta2 =~ y3 + y2
eta3 =~ y4 + y5
eta4 =~ y4 + y6
# Structural part
eta3 ~ x2 + x1
eta4 ~ x3
x3 ~ eta1 + eta2 + x1
x4 ~ eta4 + x6
y7 ~ x4 + x6
# Additional covariances
y6 ~~ y5
x2 ~~ eta2
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting optimization information by print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.091
Number of iterations: 58
Params: -0.488 -0.782 -0.183 1.225 1.444 -1.147 -1.344 1.223 1.071 -0.348 1.291 1.454 0.840 -0.388 -0.625 -0.106 1.252 -0.084 1.010 1.097 0.654 0.844 0.804 0.870 1.114 0.871 0.824 0.696 1.182 -0.499 1.264

Printing parameter estimates by print(estimates):

    lval  op  rval  Estimate        Std. Err     z-value   p-value
0   eta3   ~    x2 -1.146663        0.065317   -17.55527       0.0
1   eta3   ~    x1 -1.344422        0.076917  -17.478884       0.0
2   eta4   ~    x3  1.222542        0.038071   32.112318       0.0
3     x3   ~  eta1  1.070822  1261846.903958    0.000001  0.999999
4     x3   ~  eta2 -0.347555        0.146593   -2.370895  0.017745
5     x3   ~    x1  1.291230        0.075725   17.051592       0.0
6     x4   ~  eta4  1.454421        0.041067    35.41557       0.0
7     x4   ~    x6  0.839923         0.06817   12.320923       0.0
8     y1   ~  eta1  1.000000               -           -         -
9     y2   ~  eta1 -0.488414  2839143.268003        -0.0       1.0
10    y2   ~  eta2 -0.781996        0.912859   -0.856646  0.391641
11    y3   ~  eta1 -0.182725  3630634.464816        -0.0       1.0
12    y3   ~  eta2  1.000000               -           -         -
13    y4   ~  eta3  1.000000               -           -         -
14    y4   ~  eta4  1.000000               -           -         -
15    y5   ~  eta3  1.224550        0.048392   25.304791       0.0
16    y6   ~  eta4  1.443567        0.040942   35.258544       0.0
17    y7   ~    x4 -0.387558         0.01444    -26.8399       0.0
18    y7   ~    x6 -0.624882           0.058  -10.773807       0.0
19    x2  ~~  eta2 -0.084431        0.087237   -0.967832  0.333128
20    x4  ~~    x4  1.009523        0.136551    7.393021       0.0
21  eta4  ~~  eta4  0.803514        0.090644    8.864495       0.0
22  eta3  ~~  eta3  0.869520        0.110941    7.837675       0.0
23    x3  ~~    x3  1.114065        0.566346    1.967111   0.04917
24  eta2  ~~  eta2  1.181504  3623126.890326         0.0       1.0
25  eta2  ~~  eta1 -0.498966  4587467.500901        -0.0       1.0
26  eta1  ~~  eta1  1.263544        0.456489    2.767959  0.005641
27    y6  ~~    y5 -0.105931        0.101857   -1.039999   0.29834
28    y6  ~~    y6  1.251659        0.151825    8.244097       0.0
29    y7  ~~    y7  1.096623        0.089539   12.247449       0.0
30    y4  ~~    y4  0.654485         0.11071    5.911725       0.0
31    y3  ~~    y3  0.844282        0.961208    0.878355  0.379751
32    y2  ~~    y2  0.871375        0.751912    1.158879  0.246505
33    y5  ~~    y5  0.823609        0.143472    5.740541       0.0
34    y1  ~~    y1  0.695780        0.435022    1.599413  0.109729