semopy



Good question.
As explained on the support page, you can contact the developers either by e-mail or via the GitLab issue tracker. The latter is much better, because in personal correspondence the author of semopy often makes promises that he then forgets or loses the motivation to fulfil. Sometimes the author has a mood swing, starts hating the idea of implementing feature X (despite having agreed to do it one day), and talks himself into believing that he never received the e-mail in the first place. Posts on the issue tracker, however, are hard to dismiss. The fact that an issue has been open for a long time puts extra weight of responsibility on him, motivating him via the mechanism of embarrassment. Furthermore, private conversations by e-mail stay private, whereas by asking a question on an open-access issue tracker you are also helping others.

It is important to understand what the measurement operator means in the first place. If one types
eta =~ y1 + y2 + y3
it reads as "latent variable eta is measured by y1, y2, y3". However, the phrase "measured by" is a bit of a misnomer, as in SEM software the relationship is usually the converse: "y1, y2, y3 are measured by eta". What the measurement operator actually says is that the observable variables are regressed on the latent factor. In semopy, under the hood (as explained in greater detail in the Syntax part of the tutorial; see the DEFINE(latent) command), "=~" is just syntactic sugar that swaps the left and right parts of the equation and turns "=~" into "~", while placing the variables from the left part into the class of latent variables. Hence arrows are drawn correctly, and inspect provides a valid representation of parameter estimates. Also, report attempts to restore the "=~" operation and outputs it together with an inspect DataFrame.
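The rewriting described above can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not semopy's actual parser: the function name desugar_measurement is hypothetical, and splitting the right part into one regression per indicator is an assumption made for readability.

```python
def desugar_measurement(line):
    """Rewrite 'eta =~ y1 + y2' into regression-style lines and a set of latent names.

    Illustration only: semopy's real parser is more involved.
    """
    lhs, rhs = line.split("=~")
    latent = lhs.strip()
    indicators = [term.strip() for term in rhs.split("+")]
    # Swap the sides: each indicator becomes the dependent variable of a regression.
    regressions = [f"{indicator} ~ {latent}" for indicator in indicators]
    # The variable from the left part is recorded as latent.
    return regressions, {latent}


regressions, latents = desugar_measurement("eta =~ y1 + y2 + y3")
print(regressions)
print(latents)
```

Each resulting line has the indicator on the left and the latent factor on the right, which is why the arrows in a diagram point from eta to y1, y2, y3.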

It depends on your task, but the answer that will be correct 90% of the time is Model.

ModelMeans can be useful if you want a greater degree of certainty that p-values for regressions on non-normal exogenous variables are sensible. You can also constrain intercepts with it, if you need to. Note that intercepts can also be estimated with Model via the estimate_means function; however, that is done separately, and no constraints can be imposed.

ModelEffects comes in handy when you know that there is a population structure that violates the assumption of independent observations and you have a ready-to-go covariance-between-individuals matrix. That is very common in bioinformatics, and in GWAS specifically, where that covariance matrix is computed from genotypes.

ModelGeneralizedEffects, as the name suggests, is similar to ModelEffects. There are two big differences: a) it can model multiple causes of population structure; b) one does not need to know the precise covariance-across-individuals matrix, as knowledge of its structure is enough. It can be used for modeling spatial data, temporal data, or even both. People who are familiar with Gaussian processes might realize that a more fitting name for this model is ModelGaussianProcess.

In the general case, factor analysis models are not identifiable, i.e. there is no unique minimum and parameters can be identified only up to some scalar multiplier: multiplying a latent variable by a constant while dividing its loadings by the same constant leaves the model-implied covariance matrix unchanged.
The problem is dealt with either by fixing some of the covariances of a model, or by setting some loadings to the scalar value 1.0. The latter is the approach used by semopy and other SEM software. Only the first loading listed by the "=~" operator is fixed. Note that this happens only if the "=~" operator is used and if no other loading in the right part was fixed by the user to some scalar value. This mechanism can be disabled by naming each of the parameters in the right part of the equation, for example:
eta =~ a*y1 + b*y2 + c*y3 
In most realistic scenarios, however, you don't need this.
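The scale ambiguity can be checked numerically. The sketch below assumes a one-factor model with implied covariance psi * lam lam^T + theta; the symbols lam, psi, theta and all numbers are made up for illustration and are not taken from semopy.

```python
import numpy as np

lam = np.array([1.0, 0.8, 1.3])    # loadings of y1, y2, y3 on eta (first one fixed to 1.0)
psi = 2.0                          # variance of the latent variable eta
theta = np.diag([0.5, 0.4, 0.6])   # residual variances of y1, y2, y3


def implied_cov(lam, psi, theta):
    """Model-implied covariance of the indicators in a one-factor model."""
    return psi * np.outer(lam, lam) + theta


c = 3.0
sigma_a = implied_cov(lam, psi, theta)
# Divide all loadings by c and multiply the latent variance by c**2.
sigma_b = implied_cov(lam / c, psi * c**2, theta)

same = np.allclose(sigma_a, sigma_b)
print(same)
```

Both parameter sets produce the same covariance matrix, so the data alone cannot tell them apart. Fixing the first loading to 1.0 removes this freedom.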

semopy supports Huber-White standard errors (also known as the sandwich correction), which are allegedly more robust to non-normal data than the conventional approach. You can use them by passing the se_robust=True argument to the inspect method or to the report function.
Whether it is a good idea is debatable; see the discussion on the subject by Freedman.

Yes, you can pass the std_ests=True argument to the inspect method or to the report function.

At the moment, semopy by default uses the biased covariance matrix computed from the provided data, which is the approach also followed by lavaan. For small datasets the bias might become noticeable, and you might be better off providing your own unbiased estimate by passing the cov=myCov argument to the fit method, for example:
model.fit(data, cov=data.cov())
There is a chance that this will change in future versions and semopy will use an unbiased estimate by default.
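To get a feel for the size of the bias, the two estimates can be compared with numpy alone. This is a sketch on random data, assuming that "biased" means division by n and "unbiased" means division by n - 1 (the latter is what pandas DataFrame.cov() computes by default).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10                                   # a deliberately small sample
data = rng.normal(size=(n, 3))           # rows are observations, columns are variables

cov_biased = np.cov(data, rowvar=False, bias=True)   # divides by n
cov_unbiased = np.cov(data, rowvar=False)            # divides by n - 1

ratio = (n - 1) / n
print(ratio)
print(np.allclose(cov_biased, cov_unbiased * ratio))
```

The two matrices differ only by the factor (n - 1) / n, so the discrepancy is 10% for n = 10 and becomes negligible for large samples.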

It is hard to say, but most commonly it implies that your model is not identified (over-parameterized). The Fisher information matrix (FIM) is closely related to the Hessian matrix, and it is singular if some parameters can be expressed in terms of others. In those cases, instead of a nice local minimum, you get a saddle point. However, you might just be unlucky. When unsure, feel free to contact us.
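A toy case shows how redundant parameters produce a singular information matrix. The sketch below does not use semopy: it assumes a linear Gaussian model, for which the Fisher information is proportional to X^T X, and a made-up setup where two parameters a and b enter only through their sum.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)

# Model: y = a * x + b * x + noise. Only the sum a + b affects the data,
# so the design matrix has two identical columns.
X = np.column_stack([x, x])

# Fisher information of (a, b) up to the constant factor 1 / sigma**2.
fim = X.T @ X

rank = np.linalg.matrix_rank(fim)
print(rank)
```

The matrix is 2 by 2 but has rank 1, so it cannot be inverted to obtain standard errors. Dropping or fixing one of the redundant parameters restores full rank.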

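semopy relies on the standard logging package to message the user. You can silence its messages by calling
logging.disable()
Called without arguments (Python 3.7 or later), this suppresses messages of every level up to CRITICAL; pass a level such as logging.WARNING to suppress only messages at or below that level. You can also redirect the messages to a file or some other output stream; see the logging documentation.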
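A minimal sketch of both options, using only the standard library. The logger name "demo" is a stand-in chosen for this example, since the name of the logger that semopy writes to is not stated here; the messages are written by the example itself.

```python
import io
import logging

# Stand-in logger; replace with whatever logger actually emits the messages.
logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
logger.propagate = False

# Redirect output to an in-memory stream (a file handler would work the same way).
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))

logger.warning("first warning")       # recorded
logging.disable(logging.WARNING)      # suppress WARNING and everything below it
logger.warning("suppressed warning")  # dropped
logger.error("an error")              # still recorded, ERROR is above WARNING
logging.disable(logging.NOTSET)       # restore normal behaviour

captured = stream.getvalue()
print(captured)
```

To write to a file instead, logging.FileHandler can be attached in place of the StreamHandler.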

The most probable cause is a faulty installation (or an outdated version) of numpy, scipy, or both. Usually, upgrading them to the latest version or reinstalling them does the trick.

We accept donations in the form of citations.