semopy


Installation

semopy is freely available in the PyPI repository. The most straightforward and universal way to install it is to run the command:

pip install semopy

Alternatively, you can clone the package directly from its git repository:

git clone https://gitlab.com/georgy.m/semopy
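After cloning, the package can be installed from the local source tree (a minimal sketch, assuming a standard Python package layout in the cloned directory):

cd semopy
pip install .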

Quickstart

Let's take a look at a quick example of a typical semopy working session using a built-in example.

First, let's get the text description of the SEM model:

from semopy import Model
from semopy.examples import political_democracy

desc = political_democracy.get_model()
print(desc)

Output:

# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8

Let's get the associated dataset:

data = political_democracy.get_data()
print(data.head())

Output:

      y1        y2        y3        y4  ...        y8        x1        x2        x3
1   2.50  0.000000  3.333333  0.000000  ...  3.333333  4.442651  3.637586  2.557615
2   1.25  0.000000  3.333333  0.000000  ...  0.736999  5.384495  5.062595  3.568079
3   7.50  8.800000  9.999998  9.199991  ...  8.211809  5.961005  6.255750  5.224433
4   8.90  8.800000  9.999998  9.199991  ...  4.615086  6.285998  7.567863  6.267495
5  10.00  3.333333  9.999998  6.666666  ...  6.666666  5.863631  6.818924  4.573679

[5 rows x 11 columns]

Now, we fit the model to the data and examine optimization results:

mod = Model(desc)
res = mod.fit(data)
print(res) 

Output:

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.508
Number of iterations: 52
Params: 2.180 1.819 1.257 1.058 1.265 1.186 1.280 
1.266 1.482 0.572 0.838 0.624 1.893 1.320 2.156
7.385 0.793 5.067 0.347 3.148 1.357 4.954 0.082
0.172 0.120 3.256 0.467 3.951 3.430 2.352 0.448

Finally, let's inspect the parameter estimates:

ins = mod.inspect()
print(ins)

Output:

     lval  op   rval  Estimate   Std. Err   z-value      p-value
0   dem60   ~  ind60  1.482381   0.399018   3.71508  0.000203142
1   dem65   ~  ind60  0.571913    0.22138    2.5834   0.00978329
2   dem65   ~  dem60  0.837576  0.0984455   8.50802            0
3      x1   ~  ind60  1.000000          -         -            -
4      x2   ~  ind60  2.180490   0.138538   15.7392            0
5      x3   ~  ind60  1.818548   0.151979   11.9658            0
6      y1   ~  dem60  1.000000          -         -            -
7      y2   ~  dem60  1.256818   0.182686   6.87966  5.99965e-12
8      y3   ~  dem60  1.058173    0.15152   6.98371  2.87481e-12
9      y4   ~  dem60  1.265187   0.145151   8.71636            0
10     y5   ~  dem65  1.000000          -         -            -
11     y6   ~  dem65  1.185744   0.168908   7.02007  2.21756e-12
12     y7   ~  dem65  1.279717   0.159996   7.99845  1.33227e-15
13     y8   ~  dem65  1.266083   0.158237   8.00118  1.33227e-15
14  dem65  ~~  dem65  0.172209   0.214861  0.801488     0.422849
15  dem60  ~~  dem60  3.950850    0.92045    4.2923  1.76829e-05
16  ind60  ~~  ind60  0.448325  0.0866717   5.17268  2.30759e-07
17     y1  ~~     y5  0.624422   0.358434   1.74208     0.081494
18     y1  ~~     y1  1.892742   0.444559   4.25757  2.06663e-05
19     y2  ~~     y4  1.319584   0.702679   1.87793    0.0603906
20     y2  ~~     y6  2.156162   0.734155   2.93693   0.00331478
21     y2  ~~     y2  7.385292    1.37567    5.3685  7.93927e-08
22     y3  ~~     y7  0.793330   0.607642   1.30559     0.191693
23     y3  ~~     y3  5.066628   0.951721   5.32365  1.01706e-07
24     y4  ~~     y8  0.347221   0.442234  0.785153     0.432364
25     y4  ~~     y4  3.147914   0.738841   4.26061  2.03871e-05
26     y6  ~~     y8  1.357036     0.5685   2.38705    0.0169844
27     y6  ~~     y6  4.954365   0.914285   5.41884  5.99863e-08
28     x1  ~~     x1  0.081536  0.0194887   4.18376  2.86733e-05
29     x2  ~~     x2  0.119879  0.0697343   1.71909    0.0855983
30     y8  ~~     y8  3.256387   0.695039   4.68518  2.79708e-06
31     x3  ~~     x3  0.466730  0.0901626   5.17654  2.26045e-07
32     y7  ~~     y7  3.430032   0.712732   4.81251  1.49045e-06
33     y5  ~~     y5  2.351909   0.480369   4.89604  9.77848e-07

Model class

The cornerstone of semopy is the Model class and its children. Model instances can be thought of as sklearn models, as the working pipeline looks similar:
  1. Instantiate Model with a SEM model description in semopy syntax;
  2. Invoke the fit method of the Model instance on the given data;
  3. Invoke the inspect method of the Model to analyze parameter estimates.

Assume that we have a SEM model description in semopy syntax in the string variable desc and the data in the pandas DataFrame variable data. Then an ordinary semopy session looks like this:

from semopy import Model
model = Model(desc)
opt_res = model.fit(data)
estimates = model.inspect()

The fit method has 3 arguments of interest:
  1. data — dataset in the form of pandas DataFrame;
  2. obj — name of objective function to minimize:
    1. "MLW" (the default): Wishart loglikelihood;
    2. "ULS": Unweighted Least Squares;
    3. "GLS": Generalized Least Squares;
    4. "FIML": Full Information Maximum Likelihood (when data has no missing values FIML is effectively a Multivariate Normal Maximum Likelihood).
  3. solver — name of optimization method (at the moment only scipy-minimize methods are available, the default is "SLSQP").

The fit method returns a special structure that contains useful information on the optimization process.
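
For example, the same session could use FIML instead of the default Wishart loglikelihood by passing the obj and solver arguments explicitly (a minimal sketch using the argument names listed above):

from semopy import Model
model = Model(desc)
# Fit with Full Information Maximum Likelihood and the default SLSQP solver
opt_res = model.fit(data, obj='FIML', solver='SLSQP')
print(opt_res)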

Displaying results

The inspect method returns a pandas DataFrame with parameter estimates: it contains estimates, standard errors, z-scores and p-values (see Built-in examples). The method, however, is more versatile than it may look at first glance; it has 3 arguments of interest (see the sketch after this list):
  1. mode — dictates the behaviour of the method. Can take 2 values:
    1. "list" (the default): DataFrame with estimates is returned;
    2. "mx": Dictionary with model iternal structures (matrices) is returned.
  2. what — has an effect only if mode is "mx"; determines what values are displayed in place of estimated parameters in the matrices. Can take 3 values:
    1. "est" (the default): matrices are returned as-is with current parameter estimates;
    2. "start": matrices are returned filled with the starting values of parameters.
    3. "names": instead of values, parameter names/identifiers are displayed in place of their respective parameters.
  3. std_est — if True or "lv", standardized coefficients are also returned (non-output variables are not standardized in the case of "lv"). The default is False.
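
For example, the internal model matrices can be inspected with parameter names shown in place of estimates, and standardized coefficients can be added to the usual table (a minimal sketch using the arguments described above):

mxs = model.inspect(mode='mx', what='names')  # dictionary of model matrices with parameter names
ins_std = model.inspect(std_est=True)         # parameter table with standardized coefficients added
print(ins_std)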

Built-in examples

semopy has numerous built-in model examples for testing purposes.

Univariate regression

Simple univariate linear regression model.

Import model description and data:

from semopy import Model
from semopy.examples import univariate_regression

desc = univariate_regression.get_model()
data = univariate_regression.get_data()
print(desc)

Output:

y ~ x

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.000
Number of iterations: 34
Params: 4.983 0.214

Printing the parameter estimates with print(estimates):

lval  op rval  Estimate  Std. Err     z-value       p-value
   y   ~    x  4.982855  0.014062  354.358233  0.000000e+00
   y  ~~    y  0.213859  0.030244    7.071068  1.537437e-12

Multivariate regression

Simple multivariate linear regression model.

Import model description and data:

from semopy import Model
from semopy.examples import multivariate_regression

desc = multivariate_regression.get_model()
data = multivariate_regression.get_data()
print(desc)

Output:

y ~ x1 + x2 + x3

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.000
Number of iterations: 57
Params: 1.943 6.037 -9.770 1.022

Printing the parameter estimates with print(estimates):

lval  op rval  Estimate  Std. Err     z-value       p-value
   y   ~   x1  1.942883  0.057075   34.041036  0.000000e+00
   y   ~   x2  6.037293  0.016431  367.425350  0.000000e+00
   y   ~   x3 -9.769897  0.104971  -93.072035  0.000000e+00
   y  ~~    y  1.021754  0.144498    7.071068  1.537437e-12

Holzinger-Swineford 1939

The classic Holzinger-Swineford dataset and CFA model.

Import model description and data:

from semopy import Model
from semopy.examples import holzinger39

desc = holzinger39.get_model()
data = holzinger39.get_data()
print(desc)

Output:

visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.283
Number of iterations: 28
Params: 0.554 0.731 1.113 0.926 1.180 1.083 1.133 0.800 0.566 0.446 0.356 0.844 0.371 0.488 0.550 0.808 0.262 0.408 0.383 0.174 0.980

Printing the parameter estimates with print(estimates):

    lval  op     rval  Estimate   Std. Err  z-value      p-value
      x1   ~   visual  1.000000          -        -            -
      x2   ~   visual  0.554421  0.0997266  5.55941  2.70684e-08
      x3   ~   visual  0.730526    0.10918  6.69101  2.21636e-11
      x4   ~  textual  1.000000          -        -            -
      x5   ~  textual  1.113076  0.0653923  17.0215            0
      x6   ~  textual  0.926120  0.0554248  16.7095            0
      x7   ~    speed  1.000000          -        -            -
      x8   ~    speed  1.179980   0.165045  7.14946  8.71303e-13
      x9   ~    speed  1.082517   0.151354   7.1522  8.53984e-13
  visual  ~~   visual  0.808310   0.145287  5.56355  2.64345e-08
  visual  ~~    speed  0.262135  0.0562525  4.65998  3.16245e-06
  visual  ~~  textual  0.408277  0.0735273  5.55273  2.81243e-08
   speed  ~~    speed  0.383377  0.0861705  4.44905  8.62528e-06
   speed  ~~  textual  0.173603  0.0493159  3.52022  0.000431185
 textual  ~~  textual  0.980034   0.112145    8.739            0
      x2  ~~       x2  1.133391   0.101711  11.1432            0
      x7  ~~       x7  0.799708  0.0813872  9.82597            0
      x9  ~~       x9  0.565804  0.0707567  7.99648  1.33227e-15
      x5  ~~       x5  0.446208  0.0583869  7.64226  2.13163e-14
      x6  ~~       x6  0.356171  0.0430297  8.27733  2.22045e-16
      x3  ~~       x3  0.843731  0.0906247  9.31016            0
      x4  ~~       x4  0.371117  0.0477121  7.77826  7.32747e-15
      x8  ~~       x8  0.487934  0.0741671  6.57886  4.74081e-11
      x1  ~~       x1  0.550161   0.113439  4.84983  1.23567e-06

Political Democracy

Bollen's Data on Industrialization and Political Democracy is a common benchmark amongst SEM tools.

Import model description and data:

from semopy import Model
from semopy.examples import political_democracy

desc = political_democracy.get_model()
data = political_democracy.get_data()
print(desc)

Output:

# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.508
Number of iterations: 52
Params: 2.180 1.819 1.257 1.058 1.265 1.186 1.280 1.266 1.482 0.572 0.838 0.624 1.893 1.320 2.156 7.385 0.793 5.067 0.347 3.148 1.357 4.954 0.120 3.430 0.467 3.951 2.352 3.256 0.172 0.082 0.448

Printing the parameter estimates with print(estimates):

  lval  op   rval  Estimate   Std. Err   z-value      p-value
 dem60   ~  ind60  1.482379   0.399024   3.71502   0.00020319
 dem65   ~  ind60  0.571912   0.221383   2.58336   0.00978421
 dem65   ~  dem60  0.837574  0.0984456   8.50799            0
    x1   ~  ind60  1.000000          -         -            -
    x2   ~  ind60  2.180494   0.138565   15.7363            0
    x3   ~  ind60  1.818546   0.151993   11.9646            0
    y1   ~  dem60  1.000000          -         -            -
    y2   ~  dem60  1.256819   0.182687   6.87965  6.00009e-12
    y3   ~  dem60  1.058174   0.151521    6.9837  2.87503e-12
    y4   ~  dem60  1.265186   0.145151   8.71634            0
    y5   ~  dem65  1.000000          -         -            -
    y6   ~  dem65  1.185743   0.168908   7.02003  2.21823e-12
    y7   ~  dem65  1.279717   0.159996   7.99841  1.33227e-15
    y8   ~  dem65  1.266084   0.158238   8.00114  1.33227e-15
 dem60  ~~  dem60  3.950849   0.920451    4.2923  1.76835e-05
 dem65  ~~  dem65  0.172210   0.214861  0.801494     0.422846
 ind60  ~~  ind60  0.448321  0.0866766   5.17234  2.31175e-07
    y1  ~~     y5  0.624423   0.358435   1.74208    0.0814939
    y1  ~~     y1  1.892743    0.44456   4.25756  2.06666e-05
    y2  ~~     y4  1.319589    0.70268   1.87794    0.0603898
    y2  ~~     y6  2.156164   0.734155   2.93693   0.00331475
    y2  ~~     y2  7.385292    1.37567    5.3685  7.93938e-08
    y3  ~~     y7  0.793329   0.607642   1.30558     0.191694
    y3  ~~     y3  5.066628   0.951722   5.32365  1.01708e-07
    y4  ~~     y8  0.347222   0.442234  0.785154     0.432363
    y4  ~~     y4  3.147911   0.738841    4.2606  2.03874e-05
    y6  ~~     y8  1.357037     0.5685   2.38705    0.0169843
    y6  ~~     y6  4.954364   0.914284   5.41884   5.9986e-08
    x2  ~~     x2  0.119894  0.0697474   1.71897    0.0856192
    y7  ~~     y7  3.430032   0.712732   4.81251  1.49045e-06
    x3  ~~     x3  0.466732  0.0901676   5.17628  2.26359e-07
    y5  ~~     y5  2.351910   0.480369   4.89604  9.77851e-07
    y8  ~~     y8  3.256389    0.69504   4.68518  2.79711e-06
    x1  ~~     x1  0.081573  0.0194949   4.18432  2.86025e-05

SEM example model

A complex synthetic model similar to the one from the semopy publication.

Import model description and data:

from semopy import Model
from semopy.examples import example_model

desc = example_model.get_model()
data = example_model.get_data()
print(desc)

Output:

 # structural part
eta3 ~ x1 + x2
eta4 ~ x3 
x3 ~ eta1 + eta2 + x1 + x4 
x4 ~ eta4 
x5 ~ x4 
# measurement part 
eta1 =~ y1 + y2 + y3 
eta2 =~ y3 
eta3 =~ y4 + y5 
eta4 =~ y4 + y6 
# additional covariances
eta2 ~~   x2 
y5 ~~   y6 

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspecting the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.095
Number of iterations: 337
Params: 1.909 1.756 -2.278 -24.734 -49.237 -0.548 0.095 1.281 -0.796 -2.268 2.562 2.139 -1.572 -0.003 0.253 -0.615 -0.914 10.579 1.630 24.496 4.906 8.431 4.510 1.126 6.466 5.415 15.192 4.773 1.471

Printing the parameter estimates with print(estimates):

 lval  op  rval   Estimate    Std. Err      z-value      p-value
 eta3   ~    x1   1.908955    0.122055      15.6402            0
 eta3   ~    x2   1.755519    0.117822      14.8998            0
 eta4   ~    x3  -2.278310   0.0522005     -43.6453            0
   x3   ~  eta1 -24.734394      895670 -2.76155e-05     0.999978
   x3   ~  eta2 -49.237383     2815.81    -0.017486     0.986049
   x3   ~    x1  -0.547537    0.173337      -3.1588   0.00158419
   x3   ~    x4   0.094926   0.0645817      1.46986       0.1416
   x4   ~  eta4   1.280893    0.015663       81.778            0
   x5   ~    x4  -0.796110  0.00377853     -210.693            0
   y1   ~  eta1   1.000000           -            -            -
   y2   ~  eta1  -2.268204    0.136494     -16.6176            0
   y3   ~  eta1   2.561685     18190.8  0.000140823     0.999888
   y3   ~  eta2   1.000000           -            -            -
   y4   ~  eta3   1.000000           -            -            -
   y4   ~  eta4   1.000000           -            -            -
   y5   ~  eta3   2.138635   0.0828205      25.8225            0
   y6   ~  eta4  -1.571563    0.018802      -83.585            0
 eta2  ~~    x2  -0.002732    0.157544   -0.0173428     0.986163
 eta2  ~~  eta2   0.253475     22391.5  1.13202e-05     0.999991
 eta2  ~~  eta1  -0.615460       26755 -2.30036e-05     0.999982
   x3  ~~    x3  24.495979     565.492     0.043318     0.965448
   x4  ~~    x4   4.905596    0.582112      8.42724            0
 eta3  ~~  eta3   8.431479    0.759362      11.1034            0
 eta4  ~~  eta4  15.191866     1.07907      14.0786            0
 eta1  ~~  eta1   1.470793    0.160684      9.15333            0
   y5  ~~    y6  -0.913621    0.966326    -0.945458     0.344425
   y5  ~~    y5  10.578541     2.57705      4.10491  4.04476e-05
   x5  ~~    x5   1.630430    0.103117      15.8114            0
   y2  ~~    y2   4.509726    0.430146      10.4842            0
   y1  ~~    y1   1.126047   0.0948097      11.8769            0
   y6  ~~    y6   6.465540    0.866161       7.4646  8.34888e-14
   y3  ~~    y3   5.414868    0.552187      9.80623            0
   y4  ~~    y4   4.773085    0.676275      7.05791   1.6902e-12