semopy


Installation

semopy is freely available at the PyPI repository. The most straightforward and universal way to install it is to run the command:

pip install semopy

Alternatively, you can clone the package directly from its Git repository:

git clone https://gitlab.com/georgy.m/semopy

Quickstart

Let's take a look at a quick example of a typical semopy working session, using a built-in example.

First, let's get the text description of the SEM model:

import semopy
import pandas as pd
desc = semopy.examples.political_democracy.get_model()
print(desc)

Output:

# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
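
The description is just plain text in semopy syntax, so larger models can be assembled programmatically. A minimal sketch (the build_measurement helper below is illustrative, not part of semopy):

```python
# Build the measurement part of a semopy model description from a
# mapping of latent factors to their observed indicators.
# Illustrative helper; semopy itself only needs the final string.
def build_measurement(factors):
    lines = ["# measurement model"]
    for latent, indicators in factors.items():
        lines.append(f"{latent} =~ " + " + ".join(indicators))
    return "\n".join(lines)

desc_part = build_measurement({
    "ind60": ["x1", "x2", "x3"],
    "dem60": ["y1", "y2", "y3", "y4"],
})
print(desc_part)
```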

Let's get the associated dataset:

data = semopy.examples.political_democracy.get_data()
print(data.head())

Output:

      y1        y2        y3        y4  ...        y8        x1        x2        x3
1   2.50  0.000000  3.333333  0.000000  ...  3.333333  4.442651  3.637586  2.557615
2   1.25  0.000000  3.333333  0.000000  ...  0.736999  5.384495  5.062595  3.568079
3   7.50  8.800000  9.999998  9.199991  ...  8.211809  5.961005  6.255750  5.224433
4   8.90  8.800000  9.999998  9.199991  ...  4.615086  6.285998  7.567863  6.267495
5  10.00  3.333333  9.999998  6.666666  ...  6.666666  5.863631  6.818924  4.573679

[5 rows x 11 columns]

Now, we fit the model to the data and examine optimization results:

mod = semopy.Model(desc)
res = mod.fit(data)
print(res)

Output:

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.508
Number of iterations: 52
Params: 2.180 1.819 1.257 1.058 1.265 1.186 1.280 
1.266 1.482 0.572 0.838 0.624 1.893 1.320 2.156
7.385 0.793 5.067 0.347 3.148 1.357 4.954 0.082
0.172 0.120 3.256 0.467 3.951 3.430 2.352 0.448

Finally, let's inspect the parameter estimates:

ins = mod.inspect()
print(ins)

Output:

     lval  op   rval  Estimate   Std. Err   z-value      p-value
0   dem60   ~  ind60  1.482381   0.399018   3.71508  0.000203142
1   dem65   ~  ind60  0.571913    0.22138    2.5834   0.00978329
2   dem65   ~  dem60  0.837576  0.0984455   8.50802            0
3      x1   ~  ind60  1.000000          -         -            -
4      x2   ~  ind60  2.180490   0.138538   15.7392            0
5      x3   ~  ind60  1.818548   0.151979   11.9658            0
6      y1   ~  dem60  1.000000          -         -            -
7      y2   ~  dem60  1.256818   0.182686   6.87966  5.99965e-12
8      y3   ~  dem60  1.058173    0.15152   6.98371  2.87481e-12
9      y4   ~  dem60  1.265187   0.145151   8.71636            0
10     y5   ~  dem65  1.000000          -         -            -
11     y6   ~  dem65  1.185744   0.168908   7.02007  2.21756e-12
12     y7   ~  dem65  1.279717   0.159996   7.99845  1.33227e-15
13     y8   ~  dem65  1.266083   0.158237   8.00118  1.33227e-15
14  dem65  ~~  dem65  0.172209   0.214861  0.801488     0.422849
15  dem60  ~~  dem60  3.950850    0.92045    4.2923  1.76829e-05
16  ind60  ~~  ind60  0.448325  0.0866717   5.17268  2.30759e-07
17     y1  ~~     y5  0.624422   0.358434   1.74208     0.081494
18     y1  ~~     y1  1.892742   0.444559   4.25757  2.06663e-05
19     y2  ~~     y4  1.319584   0.702679   1.87793    0.0603906
20     y2  ~~     y6  2.156162   0.734155   2.93693   0.00331478
21     y2  ~~     y2  7.385292    1.37567    5.3685  7.93927e-08
22     y3  ~~     y7  0.793330   0.607642   1.30559     0.191693
23     y3  ~~     y3  5.066628   0.951721   5.32365  1.01706e-07
24     y4  ~~     y8  0.347221   0.442234  0.785153     0.432364
25     y4  ~~     y4  3.147914   0.738841   4.26061  2.03871e-05
26     y6  ~~     y8  1.357036     0.5685   2.38705    0.0169844
27     y6  ~~     y6  4.954365   0.914285   5.41884  5.99863e-08
28     x1  ~~     x1  0.081536  0.0194887   4.18376  2.86733e-05
29     x2  ~~     x2  0.119879  0.0697343   1.71909    0.0855983
30     y8  ~~     y8  3.256387   0.695039   4.68518  2.79708e-06
31     x3  ~~     x3  0.466730  0.0901626   5.17654  2.26045e-07
32     y7  ~~     y7  3.430032   0.712732   4.81251  1.49045e-06
33     y5  ~~     y5  2.351909   0.480369   4.89604  9.77848e-07

Model class

The cornerstone of semopy is the Model class and its subclasses. Model instances can be thought of as analogous to sklearn models, as the working pipeline looks similar:
  1. Instantiate Model with a SEM model description in semopy syntax;
  2. Invoke the fit method of the Model instance on the given data;
  3. Invoke the inspect method of the Model to analyze parameter estimates.

Assume that we have a SEM model description in semopy syntax in the string variable desc and data in a pandas DataFrame variable data. Then an ordinary semopy session looks like this:

from semopy import Model
model = Model(desc)
opt_res = model.fit(data)
estimates = model.inspect()

The fit method has three arguments of interest:
  1. data — dataset in the form of a pandas DataFrame;
  2. obj — name of the objective function to minimize:
    1. "MLW" (the default): Wishart loglikelihood;
    2. "ULS": Unweighted Least Squares;
    3. "GLS": Generalized Least Squares;
    4. "WLS": Weighted Least Squares (also known as an Asymptotic Distribution-Free Estimator);
    5. "DWLS": Diagonally Weighted Least Squares (also known as robust WLS);
    6. "FIML": Full Information Maximum Likelihood (when the data has no missing values, FIML is effectively multivariate normal maximum likelihood).
  3. solver — name of the optimization method (at the moment, only scipy minimize methods are available; the default is "SLSQP").
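
Since the complete-data objectives assume no missing values, it can help to check the data before choosing between MLW and FIML. A short sketch with pandas (the DataFrame here is a toy stand-in for a real dataset):

```python
import numpy as np
import pandas as pd

# Toy data with one missing cell; a real session would load an actual dataset.
data = pd.DataFrame({"y": [1.0, 2.0, np.nan], "x": [0.5, 1.5, 2.5]})

# FIML can handle missing values; the default MLW objective cannot.
obj = "FIML" if data.isnull().values.any() else "MLW"
print(obj)
```

With the objective chosen, the call is model.fit(data, obj=obj).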

The fit method returns a special structure that contains useful information on the optimization process.

Displaying results

The inspect method returns a pandas DataFrame with parameter estimates, standard errors, z-scores and p-values (see Built-in examples). This method, however, is more versatile than it may look at first glance; it has three arguments of interest:
  1. mode — dictates the behaviour of the method. Can take 2 values:
    1. "list" (the default): a DataFrame with estimates is returned;
    2. "mx": a dictionary with model internal structures (matrices) is returned.
  2. what — has effect only if mode is "mx"; determines what values are displayed in place of estimated parameters in the matrices. Can take 3 values:
    1. "est" (the default): matrices are returned as-is, with the current parameter estimates;
    2. "start": matrices are returned filled with the starting values of parameters;
    3. "names": parameter names/identifiers are displayed in place of their respective parameters.
  3. std_est — if True or "lv", standardized coefficients are also returned (non-output variables are not standardized in the case of "lv"). The default is False.
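
In "list" mode the result is an ordinary pandas DataFrame, so it can be post-processed with the usual pandas operations. A sketch on a toy frame with the same columns (the values are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by inspect(); the column
# names match the output shown in the Built-in examples section.
estimates = pd.DataFrame({
    "lval": ["dem60", "dem65", "y1"],
    "op": ["~", "~", "~~"],
    "rval": ["ind60", "dem60", "y5"],
    "Estimate": [1.482, 0.838, 0.624],
    "p-value": [0.0002, 0.0, 0.0815],
})

# Keep only regression paths significant at the 5% level.
significant = estimates[(estimates["op"] == "~") & (estimates["p-value"] < 0.05)]
print(significant[["lval", "rval", "Estimate"]])
```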

Built-in examples

semopy has numerous built-in model examples for testing purposes.

Univariate regression

A simple univariate linear regression model.

Import model description and data:

from semopy import Model
from semopy.examples import univariate_regression

desc = univariate_regression.get_model()
data = univariate_regression.get_data()
print(desc)

Output:

y ~ x

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspect the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.000
Number of iterations: 11
Params: -1.221 0.670

Print the parameter estimates with print(estimates):

  lval  op rval  Estimate  Std. Err    z-value       p-value
0    y   ~    x -1.221069  0.083165 -14.682538  0.000000e+00
1    y  ~~    y  0.670367  0.094804   7.071068  1.537437e-12
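
For this model the maximum-likelihood estimate of the slope coincides with ordinary least squares, which makes an easy sanity check. A sketch on synthetic data generated with numpy (not the built-in dataset; the true slope of -1.2 is chosen to resemble the estimate above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
# Generate y from a known slope plus Gaussian noise.
y = -1.2 * x + rng.normal(scale=0.8, size=n)

# OLS slope: cov(x, y) / var(x); should recover roughly -1.2.
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(slope)
```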

Univariate regression with multiple regressors

The same as the univariate linear regression model, but with multiple independent variables.

Import model description and data:

from semopy import Model
from semopy.examples import univariate_regression_many

desc = univariate_regression_many.get_model()
data = univariate_regression_many.get_data()
print(desc)

Output:

y ~ x1 + x2 + x3

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspect the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.000
Number of iterations: 13
Params: 1.400 0.451 1.190 0.878

Print the parameter estimates with print(estimates):

  lval  op rval  Estimate  Std. Err    z-value       p-value
0    y   ~   x1  1.399551  0.091138  15.356385  0.000000e+00
1    y   ~   x2  0.450561  0.097883   4.603051  4.163465e-06
2    y   ~   x3  1.190470  0.086499  13.762839  0.000000e+00
3    y  ~~    y  0.878486  0.124237   7.071068  1.537437e-12

Multivariate regression

A multivariate linear regression model.

Import model description and data:

from semopy import Model
from semopy.examples import multivariate_regression

desc = multivariate_regression.get_model()
data = multivariate_regression.get_data()
print(desc)

Output:

y1, y2, y3 ~ x1 + x2 + x3

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspect the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.068
Number of iterations: 17
Params: -1.390 -1.138 -0.318 -0.746 1.074 -1.131 0.703 1.235 -0.920 1.136 0.489 0.638

Print the parameter estimates with print(estimates):

   lval  op rval  Estimate  Std. Err    z-value       p-value
0    y1   ~   x1 -1.389754  0.073417 -18.929470  0.000000e+00
1    y1   ~   x2 -1.138405  0.087966 -12.941462  0.000000e+00
2    y1   ~   x3 -0.317893  0.072576  -4.380132  1.186073e-05
3    y2   ~   x1 -0.745837  0.097974  -7.612623  2.686740e-14
4    y2   ~   x2  1.074436  0.117388   9.152855  0.000000e+00
5    y2   ~   x3 -1.130890  0.096851 -11.676597  0.000000e+00
6    y3   ~   x1  0.702778  0.064270  10.934755  0.000000e+00
7    y3   ~   x2  1.235044  0.077006  16.038334  0.000000e+00
8    y3   ~   x3 -0.920469  0.063534 -14.487925  0.000000e+00
9    y2  ~~   y2  1.135729  0.160616   7.071068  1.537437e-12
10   y3  ~~   y3  0.488735  0.069118   7.071068  1.537437e-12
11   y1  ~~   y1  0.637755  0.090192   7.071068  1.537437e-12
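
The comma on the left-hand side of the description is shorthand for one regression per outcome. The expansion can be written out explicitly; the expand helper below is illustrative, not a semopy function:

```python
# Expand semopy's comma shorthand into one regression line per outcome,
# e.g. "y1, y2 ~ x1" -> ["y1 ~ x1", "y2 ~ x1"].
def expand(line):
    lhs, rhs = line.split("~")
    return [f"{out.strip()} ~ {rhs.strip()}" for out in lhs.split(",")]

print(expand("y1, y2, y3 ~ x1 + x2 + x3"))
```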

Holzinger-Swineford 1939

The classic Holzinger-Swineford dataset and CFA model.

Import model description and data:

from semopy import Model
from semopy.examples import holzinger39

desc = holzinger39.get_model()
data = holzinger39.get_data()
print(desc)

Output:

visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspect the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.283
Number of iterations: 28
Params: 0.554 0.731 1.113 0.926 1.180 1.083 0.550 1.133 0.356 0.488 0.844 0.371 0.800 0.566 0.446 0.980 0.408 0.174 0.808 0.262 0.383

Print the parameter estimates with print(estimates):

       lval  op     rval  Estimate  Std. Err    z-value   p-value
0        x1   ~   visual  1.000000         -          -         -
1        x2   ~   visual  0.554421  0.099727   5.559413       0.0
2        x3   ~   visual  0.730526   0.10918   6.691009       0.0
3        x4   ~  textual  1.000000         -          -         -
4        x5   ~  textual  1.113076  0.065392  17.021522       0.0
5        x6   ~  textual  0.926120  0.055425  16.709493       0.0
6        x7   ~    speed  1.000000         -          -         -
7        x8   ~    speed  1.179980  0.165045   7.149459       0.0
8        x9   ~    speed  1.082517  0.151354   7.152197       0.0
9   textual  ~~  textual  0.980034  0.112145   8.739002       0.0
10  textual  ~~   visual  0.408277  0.073527    5.55273       0.0
11  textual  ~~    speed  0.173603  0.049316   3.520223  0.000431
12   visual  ~~   visual  0.808310  0.145287   5.563548       0.0
13   visual  ~~    speed  0.262135  0.056252   4.659977  0.000003
14    speed  ~~    speed  0.383377  0.086171   4.449045  0.000009
15       x1  ~~       x1  0.550161  0.113439    4.84983  0.000001
16       x2  ~~       x2  1.133391  0.101711  11.143202       0.0
17       x6  ~~       x6  0.356171   0.04303   8.277334       0.0
18       x8  ~~       x8  0.487934  0.074167   6.578856       0.0
19       x3  ~~       x3  0.843731  0.090625    9.31016       0.0
20       x4  ~~       x4  0.371117  0.047712   7.778264       0.0
21       x7  ~~       x7  0.799708  0.081387   9.825966       0.0
22       x9  ~~       x9  0.565804  0.070757   7.996483       0.0
23       x5  ~~       x5  0.446208  0.058387   7.642264       0.0
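
For a simple CFA indicator, the standardized loadings that inspect(std_est=True) reports can be recomputed by hand: scale the loading by the latent standard deviation and divide by the model-implied standard deviation of the indicator. A sketch for the x2 loading, with values copied from the table above:

```python
import math

loading = 0.554421     # x2 ~ visual
var_visual = 0.808310  # visual ~~ visual
theta_x2 = 1.133391    # x2 ~~ x2 (residual variance)

# Model-implied variance of x2: loading^2 * var(visual) + residual variance.
var_x2 = loading ** 2 * var_visual + theta_x2
std_loading = loading * math.sqrt(var_visual) / math.sqrt(var_x2)
print(round(std_loading, 3))
```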

Political Democracy

Bollen's Data on Industrialization and Political Democracy is a common benchmark amongst SEM tools.

Import model description and data:

from semopy import Model
from semopy.examples import political_democracy

desc = political_democracy.get_model()
data = political_democracy.get_data()
print(desc)

Output:

# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspect the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.508
Number of iterations: 52
Params: 2.180 1.819 1.257 1.058 1.265 1.186 1.280 1.266 1.482 0.572 0.838 0.624 1.893 1.320 2.156 7.385 0.793 5.067 0.347 3.148 1.357 4.954 0.082 3.256 0.172 3.430 0.120 3.951 0.467 2.352 0.448

Print the parameter estimates with print(estimates):

     lval  op   rval  Estimate  Std. Err    z-value   p-value
0   dem60   ~  ind60  1.482379  0.399024   3.715017  0.000203
1   dem65   ~  ind60  0.571912  0.221383   2.583364  0.009784
2   dem65   ~  dem60  0.837574  0.098446   8.507992       0.0
3      x1   ~  ind60  1.000000         -          -         -
4      x2   ~  ind60  2.180494  0.138565  15.736254       0.0
5      x3   ~  ind60  1.818546  0.151993   11.96465       0.0
6      y1   ~  dem60  1.000000         -          -         -
7      y2   ~  dem60  1.256819  0.182687   6.879647       0.0
8      y3   ~  dem60  1.058174  0.151521   6.983699       0.0
9      y4   ~  dem60  1.265186  0.145151   8.716344       0.0
10     y5   ~  dem65  1.000000         -          -         -
11     y6   ~  dem65  1.185743  0.168908   7.020032       0.0
12     y7   ~  dem65  1.279717  0.159996    7.99841       0.0
13     y8   ~  dem65  1.266084  0.158238   8.001141       0.0
14  dem65  ~~  dem65  0.172210  0.214861   0.801494  0.422846
15  dem60  ~~  dem60  3.950849  0.920451   4.292296  0.000018
16  ind60  ~~  ind60  0.448321  0.086677   5.172345       0.0
17     y1  ~~     y5  0.624423  0.358435   1.742083  0.081494
18     y1  ~~     y1  1.892743   0.44456   4.257565  0.000021
19     y2  ~~     y4  1.319589   0.70268   1.877937   0.06039
20     y2  ~~     y6  2.156164  0.734155   2.936934  0.003315
21     y2  ~~     y2  7.385292  1.375671   5.368501       0.0
22     y3  ~~     y7  0.793329  0.607642   1.305585  0.191694
23     y3  ~~     y3  5.066628  0.951722   5.323646       0.0
24     y4  ~~     y8  0.347222  0.442234   0.785154  0.432363
25     y4  ~~     y4  3.147911  0.738841   4.260605   0.00002
26     y6  ~~     y8  1.357037    0.5685   2.387047  0.016984
27     y6  ~~     y6  4.954364  0.914284   5.418843       0.0
28     x1  ~~     x1  0.081573  0.019495   4.184317  0.000029
29     y8  ~~     y8  3.256389   0.69504   4.685182  0.000003
30     y7  ~~     y7  3.430032  0.712732   4.812512  0.000001
31     x2  ~~     x2  0.119894  0.069747   1.718973  0.085619
32     x3  ~~     x3  0.466732  0.090168   5.176276       0.0
33     y5  ~~     y5  2.351910  0.480369   4.896044  0.000001

SEM example model

A complex synthetic model used in an upcoming publication.

Import model description and data:

from semopy import Model
from semopy.examples import example_model

desc = example_model.get_model()
data = example_model.get_data()
print(desc)

Output:

# Measurement part
eta1 =~ y1 + y2 + y3
eta2 =~ y3 + y2
eta3 =~ y4 + y5
eta4 =~ y4 + y6
# Structural part
eta3 ~ x2 + x1
eta4 ~ x3
x3 ~ eta1 + eta2 + x1
x4 ~ eta4 + x6
y7 ~ x4 + x6
# Additional covariances
y6 ~~ y5
x2 ~~ eta2

Fit model to data:

mod = Model(desc)
res_opt = mod.fit(data)
estimates = mod.inspect()

Inspect the optimization information with print(res_opt):

Name of objective: MLW
Optimization method: SLSQP
Optimization successful.
Optimization terminated successfully
Objective value: 0.091
Number of iterations: 58
Params: -0.488 -0.782 -0.183 1.225 1.444 -1.147 -1.344 1.223 1.071 -0.348 1.291 1.454 0.840 -0.388 -0.625 -0.106 1.252 -0.084 1.010 1.097 0.654 0.844 0.804 0.870 1.114 0.871 0.824 0.696 1.182 -0.499 1.264

Print the parameter estimates with print(estimates):

    lval  op  rval  Estimate        Std. Err    z-value   p-value
0   eta3   ~    x2 -1.146663        0.065317  -17.55527       0.0
1   eta3   ~    x1 -1.344422        0.076917 -17.478884       0.0
2   eta4   ~    x3  1.222542        0.038071  32.112318       0.0
3     x3   ~  eta1  1.070822  1261846.903958   0.000001  0.999999
4     x3   ~  eta2 -0.347555        0.146593  -2.370895  0.017745
5     x3   ~    x1  1.291230        0.075725  17.051592       0.0
6     x4   ~  eta4  1.454421        0.041067   35.41557       0.0
7     x4   ~    x6  0.839923         0.06817  12.320923       0.0
8     y1   ~  eta1  1.000000               -          -         -
9     y2   ~  eta1 -0.488414  2839143.268003       -0.0       1.0
10    y2   ~  eta2 -0.781996        0.912859  -0.856646  0.391641
11    y3   ~  eta1 -0.182725  3630634.464816       -0.0       1.0
12    y3   ~  eta2  1.000000               -          -         -
13    y4   ~  eta3  1.000000               -          -         -
14    y4   ~  eta4  1.000000               -          -         -
15    y5   ~  eta3  1.224550        0.048392  25.304791       0.0
16    y6   ~  eta4  1.443567        0.040942  35.258544       0.0
17    y7   ~    x4 -0.387558         0.01444   -26.8399       0.0
18    y7   ~    x6 -0.624882           0.058 -10.773807       0.0
19    x2  ~~  eta2 -0.084431        0.087237  -0.967832  0.333128
20    x4  ~~    x4  1.009523        0.136551   7.393021       0.0
21  eta4  ~~  eta4  0.803514        0.090644   8.864495       0.0
22  eta3  ~~  eta3  0.869520        0.110941   7.837675       0.0
23    x3  ~~    x3  1.114065        0.566346   1.967111   0.04917
24  eta2  ~~  eta2  1.181504  3623126.890326        0.0       1.0
25  eta2  ~~  eta1 -0.498966  4587467.500901       -0.0       1.0
26  eta1  ~~  eta1  1.263544        0.456489   2.767959  0.005641
27    y6  ~~    y5 -0.105931        0.101857  -1.039999   0.29834
28    y6  ~~    y6  1.251659        0.151825   8.244097       0.0
29    y7  ~~    y7  1.096623        0.089539  12.247449       0.0
30    y4  ~~    y4  0.654485         0.11071   5.911725       0.0
31    y3  ~~    y3  0.844282        0.961208   0.878355  0.379751
32    y2  ~~    y2  0.871375        0.751912   1.158879  0.246505
33    y5  ~~    y5  0.823609        0.143472   5.740541       0.0
34    y1  ~~    y1  0.695780        0.435022   1.599413  0.109729