semopy is freely available from the PyPI repository. The most straightforward and universal way to install it is to run the command:
pip install semopy
Alternatively, you can download the package directly from its Git repository:
git clone https://gitlab.com/georgy.m/semopy
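After either installation route, it can be handy to confirm that the package is visible to the interpreter. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of semopy:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("semopy")
print(f"semopy {found}" if found else "semopy is not installed")
```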
Let's take a look at a quick example of a typical semopy working session using a built-in example.
First, let's get the text description of the SEM model:
from semopy import Model
from semopy.examples import political_democracy
desc = political_democracy.get_model()
print(desc)
Output:
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
Let's get the associated dataset:
data = political_democracy.get_data()
print(data.head())
Output:
      y1        y2        y3        y4  ...        y8        x1        x2        x3
1   2.50  0.000000  3.333333  0.000000  ...  3.333333  4.442651  3.637586  2.557615
2   1.25  0.000000  3.333333  0.000000  ...  0.736999  5.384495  5.062595  3.568079
3   7.50  8.800000  9.999998  9.199991  ...  8.211809  5.961005  6.255750  5.224433
4   8.90  8.800000  9.999998  9.199991  ...  4.615086  6.285998  7.567863  6.267495
5  10.00  3.333333  9.999998  6.666666  ...  6.666666  5.863631  6.818924  4.573679

[5 rows x 11 columns]
Now, we fit the model to the data and examine optimization results:
mod = Model(desc)
res = mod.fit(data)
print(res)
Output:
Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.508
Number of iterations: 52
Params: 2.180 1.819 1.257 1.058 1.265 1.186 1.280 1.266 1.482 0.572 0.838 0.624 1.893 1.320 2.156 7.385 0.793 5.067 0.347 3.148 1.357 4.954 0.082 0.172 0.120 3.256 0.467 3.951 3.430 2.352 0.448
Finally, let's inspect the parameter estimates:
ins = mod.inspect()
print(ins)
Output:
     lval  op   rval  Estimate   Std. Err   z-value      p-value
 0  dem60   ~  ind60  1.482381   0.399018   3.71508  0.000203142
 1  dem65   ~  ind60  0.571913    0.22138    2.5834   0.00978329
 2  dem65   ~  dem60  0.837576  0.0984455   8.50802            0
 3     x1   ~  ind60  1.000000          -         -            -
 4     x2   ~  ind60  2.180490   0.138538   15.7392            0
 5     x3   ~  ind60  1.818548   0.151979   11.9658            0
 6     y1   ~  dem60  1.000000          -         -            -
 7     y2   ~  dem60  1.256818   0.182686   6.87966  5.99965e-12
 8     y3   ~  dem60  1.058173    0.15152   6.98371  2.87481e-12
 9     y4   ~  dem60  1.265187   0.145151   8.71636            0
10     y5   ~  dem65  1.000000          -         -            -
11     y6   ~  dem65  1.185744   0.168908   7.02007  2.21756e-12
12     y7   ~  dem65  1.279717   0.159996   7.99845  1.33227e-15
13     y8   ~  dem65  1.266083   0.158237   8.00118  1.33227e-15
14  dem65  ~~  dem65  0.172209   0.214861  0.801488     0.422849
15  dem60  ~~  dem60  3.950850    0.92045    4.2923  1.76829e-05
16  ind60  ~~  ind60  0.448325  0.0866717   5.17268  2.30759e-07
17     y1  ~~     y5  0.624422   0.358434   1.74208     0.081494
18     y1  ~~     y1  1.892742   0.444559   4.25757  2.06663e-05
19     y2  ~~     y4  1.319584   0.702679   1.87793    0.0603906
20     y2  ~~     y6  2.156162   0.734155   2.93693   0.00331478
21     y2  ~~     y2  7.385292    1.37567    5.3685  7.93927e-08
22     y3  ~~     y7  0.793330   0.607642   1.30559     0.191693
23     y3  ~~     y3  5.066628   0.951721   5.32365  1.01706e-07
24     y4  ~~     y8  0.347221   0.442234  0.785153     0.432364
25     y4  ~~     y4  3.147914   0.738841   4.26061  2.03871e-05
26     y6  ~~     y8  1.357036     0.5685   2.38705    0.0169844
27     y6  ~~     y6  4.954365   0.914285   5.41884  5.99863e-08
28     x1  ~~     x1  0.081536  0.0194887   4.18376  2.86733e-05
29     x2  ~~     x2  0.119879  0.0697343   1.71909    0.0855983
30     y8  ~~     y8  3.256387   0.695039   4.68518  2.79708e-06
31     x3  ~~     x3  0.466730  0.0901626   5.17654  2.26045e-07
32     y7  ~~     y7  3.430032   0.712732   4.81251  1.49045e-06
33     y5  ~~     y5  2.351909   0.480369   4.89604  9.77848e-07
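The z-value and p-value columns can be reproduced from the Estimate and Std. Err columns. The sketch below assumes the usual Wald test (z is the estimate divided by its standard error, and the p-value is the two-sided tail of a standard normal distribution). This is consistent with the printed numbers but is not spelled out in the text; wald_test is a hypothetical helper, not a semopy function.

```python
import math


def wald_test(estimate: float, std_err: float):
    """Return (z, p) for a two-sided Wald test of the estimate against zero."""
    z = estimate / std_err
    # Two-sided standard normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Estimate and Std. Err copied from two rows of the table:
# dem60 ~ ind60 and y1 ~~ y5.
for est, se in [(1.482381, 0.399018), (0.624422, 0.358434)]:
    z, p = wald_test(est, se)
    print(f"z = {z:.4f}, p = {p:.6f}")
```

Because the inputs are rounded as printed, the results agree with the table only in the leading digits. Rows with a p-value printed as 0 have z-values so large that the tail probability falls below what the table resolves.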
Assume that we have an SEM model description in semopy syntax in the string variable desc and the data in the pandas DataFrame variable data. Then, an ordinary semopy session looks like this:
from semopy import Model
model = Model(desc)
opt_res = model.fit(data)
estimates = model.inspect()
The fit method has 3 arguments of interest. Among them, the objective function can take the following values:
"MLW" (the default): Wishart loglikelihood;
"ULS": Unweighted Least Squares;
"GLS": Generalized Least Squares;
"FIML": Full Information Maximum Likelihood (when the data has no missing values, FIML is effectively a Multivariate Normal Maximum Likelihood).
The optimization method defaults to "SLSQP".
The fit method returns a special structure that contains useful information on the optimization process.
The inspect method can return its results in one of two forms:
"list" (the default): a DataFrame with estimates is returned;
"mx": a dictionary with the model's internal structures (matrices) is returned.
For "mx", a further setting determines what values are displayed in place of estimated parameters in the matrices. It can take 3 values:
"est" (the default): matrices are returned as-is with the current parameter estimates;
"start": matrices are returned filled with the starting values of parameters;
"names": instead of values, parameter names/identifiers are displayed in place of their respective parameters.
With the standardization setting given as True or "lv", standardized coefficients are also returned (non-output variables are not standardized in the case of "lv"). The default is False.
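As a rough illustration of how these options combine, the sketch below wraps a fit in a small helper. The keyword names obj, solver, mode and std_est are not spelled out in the text above and are assumed here from semopy's API; fit_and_inspect, check_objective and ALLOWED_OBJECTIVES are hypothetical names introduced for the example. semopy is imported lazily, so the validation part can be tried without it installed.

```python
# Objective names as listed in the text above.
ALLOWED_OBJECTIVES = ("MLW", "ULS", "GLS", "FIML")


def check_objective(obj: str) -> str:
    """Return `obj` unchanged if it is one of the documented objectives."""
    if obj not in ALLOWED_OBJECTIVES:
        raise ValueError(
            f"unknown objective {obj!r}; expected one of {ALLOWED_OBJECTIVES}"
        )
    return obj


def fit_and_inspect(desc, data, obj="MLW", solver="SLSQP", std_est=False):
    """Fit a model and return (optimization result, estimates table).

    Assumption: fit accepts `obj` and `solver`, and inspect accepts
    `mode` and `std_est`, as keyword arguments.
    """
    from semopy import Model  # lazy import; requires semopy to be installed

    model = Model(desc)
    opt_res = model.fit(data, obj=check_objective(obj), solver=solver)
    estimates = model.inspect(mode="list", std_est=std_est)
    return opt_res, estimates
```

For example, fit_and_inspect(desc, data, obj="GLS", std_est=True) would request Generalized Least Squares and add standardized coefficients to the returned table.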
Simple univariate linear regression model.
Import model description and data:
from semopy import Model
from semopy.examples import univariate_regression
desc = univariate_regression.get_model()
data = univariate_regression.get_data()
print(desc)
Output:
y ~ x
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()
Inspecting optimization information by print(res_opt):
Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.000
Number of iterations: 34
Params: 4.983 0.214
Printing parameter estimates by print(estimates):
lval  op  rval  Estimate  Std. Err     z-value       p-value
   y   ~     x  4.982855  0.014062  354.358233  0.000000e+00
   y  ~~     y  0.213859  0.030244    7.071068  1.537437e-12
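For a single-predictor model such as y ~ x, the maximum-likelihood slope has a familiar closed form: the sample covariance of x and y divided by the sample variance of x, with the residual variance following from it. The sketch below checks these formulas on synthetic data generated on the spot (not the bundled dataset). The true slope of 5.0 and the noise level are arbitrary choices, and the expectation that semopy's estimates agree with these formulas up to optimizer tolerance is an assumption rather than something stated in the text.

```python
import random


def fit_simple_regression(x, y):
    """Closed-form slope and residual variance for y ~ x on centered data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Maximum-likelihood (divide-by-n) second moments.
    s_xx = sum((a - mean_x) ** 2 for a in x) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    s_yy = sum((b - mean_y) ** 2 for b in y) / n
    slope = s_xy / s_xx
    resid_var = s_yy - slope * s_xy
    return slope, resid_var


# Synthetic data: y = 5 * x + noise, with a fixed seed for repeatability.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [5.0 * a + rng.gauss(0.0, 0.5) for a in x]
slope, resid_var = fit_simple_regression(x, y)
print(f"slope = {slope:.3f}, residual variance = {resid_var:.3f}")
```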
Simple multivariate linear regression model.
Import model description and data:
from semopy import Model
from semopy.examples import multivariate_regression
desc = multivariate_regression.get_model()
data = multivariate_regression.get_data()
print(desc)
Output:
y ~ x1 + x2 + x3
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()
Inspecting optimization information by print(res_opt):
Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.000
Number of iterations: 57
Params: 1.943 6.037 -9.770 1.022
Printing parameter estimates by print(estimates):
lval  op  rval   Estimate  Std. Err     z-value       p-value
   y   ~    x1   1.942883  0.057075   34.041036  0.000000e+00
   y   ~    x2   6.037293  0.016431  367.425350  0.000000e+00
   y   ~    x3  -9.769897  0.104971  -93.072035  0.000000e+00
   y  ~~     y   1.021754  0.144498    7.071068  1.537437e-12
A classic Holzinger-Swineford dataset and CFA model.
Import model description and data:
from semopy import Model
from semopy.examples import holzinger39
desc = holzinger39.get_model()
data = holzinger39.get_data()
print(desc)
Output:
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()
Inspecting optimization information by print(res_opt):
Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.283
Number of iterations: 28
Params: 0.554 0.731 1.113 0.926 1.180 1.083 1.133 0.800 0.566 0.446 0.356 0.844 0.371 0.488 0.550 0.808 0.262 0.408 0.383 0.174 0.980
Printing parameter estimates by print(estimates):
   lval  op     rval  Estimate   Std. Err  z-value      p-value
     x1   ~   visual  1.000000          -        -            -
     x2   ~   visual  0.554421  0.0997266  5.55941  2.70684e-08
     x3   ~   visual  0.730526    0.10918  6.69101  2.21636e-11
     x4   ~  textual  1.000000          -        -            -
     x5   ~  textual  1.113076  0.0653923  17.0215            0
     x6   ~  textual  0.926120  0.0554248  16.7095            0
     x7   ~    speed  1.000000          -        -            -
     x8   ~    speed  1.179980   0.165045  7.14946  8.71303e-13
     x9   ~    speed  1.082517   0.151354   7.1522  8.53984e-13
 visual  ~~   visual  0.808310   0.145287  5.56355  2.64345e-08
 visual  ~~    speed  0.262135  0.0562525  4.65998  3.16245e-06
 visual  ~~  textual  0.408277  0.0735273  5.55273  2.81243e-08
  speed  ~~    speed  0.383377  0.0861705  4.44905  8.62528e-06
  speed  ~~  textual  0.173603  0.0493159  3.52022  0.000431185
textual  ~~  textual  0.980034   0.112145    8.739            0
     x2  ~~       x2  1.133391   0.101711  11.1432            0
     x7  ~~       x7  0.799708  0.0813872  9.82597            0
     x9  ~~       x9  0.565804  0.0707567  7.99648  1.33227e-15
     x5  ~~       x5  0.446208  0.0583869  7.64226  2.13163e-14
     x6  ~~       x6  0.356171  0.0430297  8.27733  2.22045e-16
     x3  ~~       x3  0.843731  0.0906247  9.31016            0
     x4  ~~       x4  0.371117  0.0477121  7.77826  7.32747e-15
     x8  ~~       x8  0.487934  0.0741671  6.57886  4.74081e-11
     x1  ~~       x1  0.550161   0.113439  4.84983  1.23567e-06
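The loading rows of this table follow quite directly from the model description: each indicator of a latent variable yields one row with the latent variable on the right-hand side, and the first indicator of each latent variable has its loading fixed to 1 (the rows showing "-"). The sketch below is a simplified, hypothetical parser written for illustration. It is not semopy's own parser and handles only the =~, ~ and ~~ operators with +-separated terms.

```python
def expand_description(desc: str):
    """Expand a model description into (lval, op, rval, fixed) rows."""
    rows = []
    for raw in desc.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Check the two-character operators before the bare "~".
        for op in ("=~", "~~", "~"):
            if op in line:
                left, right = (part.strip() for part in line.split(op, 1))
                break
        else:
            raise ValueError(f"no operator found in {raw!r}")
        terms = [t.strip() for t in right.split("+")]
        for i, term in enumerate(terms):
            if op == "=~":
                # Measurement: reported as "indicator ~ latent";
                # the first loading is marked as fixed.
                rows.append((term, "~", left, i == 0))
            else:
                rows.append((left, op, term, False))
    return rows


desc = """
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
"""
for row in expand_description(desc):
    print(row)
```

For this model the sketch yields 9 loading rows, 3 of them fixed. The remaining 15 rows of the printed table are variances and covariances that do not appear in the description, and 24 rows minus 3 fixed loadings leaves 21 free parameters, which matches the length of the Params line.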
Bollen's Data on Industrialization and Political Democracy is a common benchmark amongst SEM tools.
Import model description and data:
from semopy import Model
from semopy.examples import political_democracy
desc = political_democracy.get_model()
data = political_democracy.get_data()
print(desc)
Output:
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()
Inspecting optimization information by print(res_opt):
Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.508
Number of iterations: 52
Params: 2.180 1.819 1.257 1.058 1.265 1.186 1.280 1.266 1.482 0.572 0.838 0.624 1.893 1.320 2.156 7.385 0.793 5.067 0.347 3.148 1.357 4.954 0.120 3.430 0.467 3.951 2.352 3.256 0.172 0.082 0.448
Printing parameter estimates by print(estimates):
 lval  op   rval  Estimate   Std. Err   z-value      p-value
dem60   ~  ind60  1.482379   0.399024   3.71502   0.00020319
dem65   ~  ind60  0.571912   0.221383   2.58336   0.00978421
dem65   ~  dem60  0.837574  0.0984456   8.50799            0
   x1   ~  ind60  1.000000          -         -            -
   x2   ~  ind60  2.180494   0.138565   15.7363            0
   x3   ~  ind60  1.818546   0.151993   11.9646            0
   y1   ~  dem60  1.000000          -         -            -
   y2   ~  dem60  1.256819   0.182687   6.87965  6.00009e-12
   y3   ~  dem60  1.058174   0.151521    6.9837  2.87503e-12
   y4   ~  dem60  1.265186   0.145151   8.71634            0
   y5   ~  dem65  1.000000          -         -            -
   y6   ~  dem65  1.185743   0.168908   7.02003  2.21823e-12
   y7   ~  dem65  1.279717   0.159996   7.99841  1.33227e-15
   y8   ~  dem65  1.266084   0.158238   8.00114  1.33227e-15
dem60  ~~  dem60  3.950849   0.920451    4.2923  1.76835e-05
dem65  ~~  dem65  0.172210   0.214861  0.801494     0.422846
ind60  ~~  ind60  0.448321  0.0866766   5.17234  2.31175e-07
   y1  ~~     y5  0.624423   0.358435   1.74208    0.0814939
   y1  ~~     y1  1.892743    0.44456   4.25756  2.06666e-05
   y2  ~~     y4  1.319589    0.70268   1.87794    0.0603898
   y2  ~~     y6  2.156164   0.734155   2.93693   0.00331475
   y2  ~~     y2  7.385292    1.37567    5.3685  7.93938e-08
   y3  ~~     y7  0.793329   0.607642   1.30558     0.191694
   y3  ~~     y3  5.066628   0.951722   5.32365  1.01708e-07
   y4  ~~     y8  0.347222   0.442234  0.785154     0.432363
   y4  ~~     y4  3.147911   0.738841    4.2606  2.03874e-05
   y6  ~~     y8  1.357037     0.5685   2.38705    0.0169843
   y6  ~~     y6  4.954364   0.914284   5.41884   5.9986e-08
   x2  ~~     x2  0.119894  0.0697474   1.71897    0.0856192
   y7  ~~     y7  3.430032   0.712732   4.81251  1.49045e-06
   x3  ~~     x3  0.466732  0.0901676   5.17628  2.26359e-07
   y5  ~~     y5  2.351910   0.480369   4.89604  9.77851e-07
   y8  ~~     y8  3.256389    0.69504   4.68518  2.79711e-06
   x1  ~~     x1  0.081573  0.0194949   4.18432  2.86025e-05
Complex synthetic model similar to the one from the publication.
Import model description and data:
from semopy import Model
from semopy.examples import example_model
desc = example_model.get_model()
data = example_model.get_data()
print(desc)
Output:
# structural part
eta3 ~ x1 + x2
eta4 ~ x3
x3 ~ eta1 + eta2 + x1 + x4
x4 ~ eta4
x5 ~ x4
# measurement part
eta1 =~ y1 + y2 + y3
eta2 =~ y3
eta3 =~ y4 + y5
eta4 =~ y4 + y6
# additional covariances
eta2 ~~ x2
y5 ~~ y6
Fit model to data:
mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()
Inspecting optimization information by print(res_opt):
Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.095
Number of iterations: 337
Params: 1.909 1.756 -2.278 -24.734 -49.237 -0.548 0.095 1.281 -0.796 -2.268 2.562 2.139 -1.572 -0.003 0.253 -0.615 -0.914 10.579 1.630 24.496 4.906 8.431 4.510 1.126 6.466 5.415 15.192 4.773 1.471
Printing parameter estimates by print(estimates):
lval  op  rval    Estimate    Std. Err       z-value      p-value
eta3   ~    x1    1.908955    0.122055       15.6402            0
eta3   ~    x2    1.755519    0.117822       14.8998            0
eta4   ~    x3   -2.278310   0.0522005      -43.6453            0
  x3   ~  eta1  -24.734394      895670  -2.76155e-05     0.999978
  x3   ~  eta2  -49.237383     2815.81     -0.017486     0.986049
  x3   ~    x1   -0.547537    0.173337       -3.1588   0.00158419
  x3   ~    x4    0.094926   0.0645817       1.46986       0.1416
  x4   ~  eta4    1.280893    0.015663        81.778            0
  x5   ~    x4   -0.796110  0.00377853      -210.693            0
  y1   ~  eta1    1.000000           -             -            -
  y2   ~  eta1   -2.268204    0.136494      -16.6176            0
  y3   ~  eta1    2.561685     18190.8   0.000140823     0.999888
  y3   ~  eta2    1.000000           -             -            -
  y4   ~  eta3    1.000000           -             -            -
  y4   ~  eta4    1.000000           -             -            -
  y5   ~  eta3    2.138635   0.0828205       25.8225            0
  y6   ~  eta4   -1.571563    0.018802       -83.585            0
eta2  ~~    x2   -0.002732    0.157544    -0.0173428     0.986163
eta2  ~~  eta2    0.253475     22391.5   1.13202e-05     0.999991
eta2  ~~  eta1   -0.615460       26755  -2.30036e-05     0.999982
  x3  ~~    x3   24.495979     565.492      0.043318     0.965448
  x4  ~~    x4    4.905596    0.582112       8.42724            0
eta3  ~~  eta3    8.431479    0.759362       11.1034            0
eta4  ~~  eta4   15.191866     1.07907       14.0786            0
eta1  ~~  eta1    1.470793    0.160684       9.15333            0
  y5  ~~    y6   -0.913621    0.966326     -0.945458     0.344425
  y5  ~~    y5   10.578541     2.57705       4.10491  4.04476e-05
  x5  ~~    x5    1.630430    0.103117       15.8114            0
  y2  ~~    y2    4.509726    0.430146       10.4842            0
  y1  ~~    y1    1.126047   0.0948097       11.8769            0
  y6  ~~    y6    6.465540    0.866161        7.4646  8.34888e-14
  y3  ~~    y3    5.414868    0.552187       9.80623            0
  y4  ~~    y4    4.773085    0.676275       7.05791   1.6902e-12