What's the probability distribution of a deterministic signal or how to marginalize dynamical systems?

In many signal processing calculations, the (prior) probability distribution of the theoretical signal (not the signal + noise) is required.

In random signal theory, this distribution is typically a stochastic process, e.g. a Gaussian or a uniform process.

What do such distributions become in deterministic signal theory? That is the question.

To make it simple, consider a discrete-time real deterministic signal

$ s\left( {1} \right),s\left( {2} \right),...,s\left( {M} \right) $

For instance, they may be samples from a continuous-time real deterministic signal.

By the standard definition of a discrete-time deterministic dynamical system, there exists:

- a phase space $\Gamma$, e.g. $\Gamma \subset \mathbb{R} {^d}$
- an initial condition $ z\left( 1 \right)\in \Gamma $
- a state-space equation $f:\Gamma \to \Gamma $ having $ z\left( 1 \right)$ in its domain of definition such that $z\left( {m + 1} \right) = f\left[ {z\left( m \right)} \right]$
- an output or observation equation $g:\Gamma \to \mathbb{R}$ such that $s\left( m \right) = g\left[ {z\left( m \right)} \right]$

Hence, by definition we have

$\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] = \left\{ {g\left[ {z\left( 1 \right)} \right],g\left[ {f\left( {z\left( 1 \right)} \right)} \right],...,g\left[ {{f^{M - 1}}\left( {z\left( 1 \right)} \right)} \right]} \right\}$

or, in probabilistic notation

$p\left[ {\left. {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right|z\left( 1 \right),f,g,\Gamma ,d} \right] = \prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}} $
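The deterministic map from $z(1)$, $f$ and $g$ to the samples $s(1),\ldots,s(M)$ can be sketched in a few lines of Python. The concrete system used here (a rotation of the plane observed through its first coordinate, so $d = 2$) and the names `simulate_signal`, `rotate`, `first_coordinate` and `THETA` are illustrative assumptions, not part of the question.

```python
import math

def simulate_signal(z1, f, g, M):
    """Return [g(z(1)), g(f(z(1))), ..., g(f^{M-1}(z(1)))]."""
    signal = []
    z = z1
    for _ in range(M):
        signal.append(g(z))  # s(m) = g[z(m)]
        z = f(z)             # z(m+1) = f[z(m)]
    return signal

# Hypothetical example system (an assumption, not from the question):
# rotate the plane by a fixed angle and observe the first coordinate.
THETA = 0.3

def rotate(z):
    x, y = z
    cos_t, sin_t = math.cos(THETA), math.sin(THETA)
    return (cos_t * x - sin_t * y, sin_t * x + cos_t * y)

def first_coordinate(z):
    return z[0]

s = simulate_signal((1.0, 0.0), rotate, first_coordinate, M=5)
print(s)
```

Given $z(1)$, $f$ and $g$, the signal is a single point in $\mathbb{R}^M$, which is what the product of Dirac deltas expresses.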

Therefore, by total probability and the product rule, the marginal joint prior probability distribution for a discrete-time deterministic signal, conditional on phase space $\Gamma$ and its dimension $d$, can formally/symbolically be written as

$p\left[ {\left. {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right|\Gamma ,d} \right] = \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}p\left( {z\left( 1 \right),f,g} \right)} } } } $

Should phase space $\Gamma$ and its dimension $d$ also be unknown *a priori*, they should be marginalized as well, so that the most general marginal prior probability distribution for a deterministic signal, the one I'm interested in, can formally/symbolically be written as

$p\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] = \sum\limits_{d = 2}^{ + \infty } {\int\limits_{\wp \left( {{\mathbb{R}^d}} \right)} {{\text{D}}\Gamma \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\prod\limits_{m = 1}^M {\delta \left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}p\left( {z\left( 1 \right),f,g,\Gamma ,d} \right)} } } } } } $

where ${\wp \left( {{\mathbb{R}^d}} \right)}$ stands for the powerset of ${{\mathbb{R}^d}}$.

Dirac's $\delta$ distributions are certainly welcome to "digest" those very high dimensional integrals. However, we may also be interested in probability distributions like

$p\left[ {s\left( 1 \right),s\left( 2 \right),...,s\left( M \right)} \right] \propto \sum\limits_{d = 2}^{ + \infty } {\int\limits_{\wp \left( {{\mathbb{R}^d}} \right)} {{\text{D}}\Gamma \int\limits_{{\mathbb{R}^\Gamma }} {{\text{D}}g\int\limits_{{\Gamma ^\Gamma }} {{\text{D}}f\int\limits_\Gamma {{{\text{d}}^d}z\left( 1 \right)\int\limits_{{\mathbb{R}^ + }} {{\text{d}}\sigma {\sigma ^{ - M}}{e^{ - \sum\limits_{m = 1}^M {\frac{{{{\left\{ {g\left[ {{f^{m - 1}}\left( {z\left( 1 \right)} \right)} \right] - s\left( m \right)} \right\}}^2}}}{{2{\sigma ^2}}}} }}p\left( {\sigma ,z\left( 1 \right),f,g,\Gamma ,d} \right)} } } } } } $
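To get a feeling for this Gaussian-relaxed integral, here is a minimal Monte Carlo sketch under drastic simplifying assumptions that are mine, not the question's: the functional integrals over $f$, $g$ and $\Gamma$ are replaced by a one-parameter family $f(z) = a z$ with $a \sim U(0,1)$, $\Gamma = [0,1]$ (so $d = 1$), $g$ is the identity, $z(1) \sim U(0,1)$, and $\sigma$ is held fixed instead of being marginalized. The function name `relaxed_marginal` is hypothetical.

```python
import math
import random

def relaxed_marginal(signal, sigma=0.05, n_samples=20000, seed=0):
    """Monte Carlo estimate of the Gaussian-relaxed marginal (up to a constant).

    Assumed toy prior: f(z) = a * z with a ~ Uniform(0, 1),
    z(1) ~ Uniform(0, 1), g = identity, sigma held fixed.
    """
    rng = random.Random(seed)
    M = len(signal)
    total = 0.0
    for _ in range(n_samples):
        a = rng.random()   # draw the "state-space equation"
        z = rng.random()   # draw the initial condition z(1)
        sq_err = 0.0
        for s_m in signal:
            sq_err += (z - s_m) ** 2  # g = identity, so the prediction is z(m)
            z = a * z                 # z(m+1) = f[z(m)]
        total += sigma ** (-M) * math.exp(-sq_err / (2.0 * sigma ** 2))
    return total / n_samples

decaying = [0.8, 0.4, 0.2, 0.1]   # generated by a = 0.5, z(1) = 0.8
reversed_order = decaying[::-1]   # same values, time order permuted

p_original = relaxed_marginal(decaying)
p_permuted = relaxed_marginal(reversed_order)
print(p_original, p_permuted)
```

In this toy family every trajectory is non-increasing, so the time-reversed signal receives a vastly smaller value than the original one. It only illustrates that such a marginal need not be invariant under permutation of the time points; it says nothing about the general functional integral.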

Please, what can you say about those important probability distributions beyond the fact that they should not be invariant under permutation of the time points, i.e. not De Finetti-exchangeable?

What can you say about such strange-looking functional integrals (for the state-space and output equations $f$ and $g$) and even set-theoretic integrals (for phase space $\Gamma$) over sets having cardinality at least ${\beth_2}$? Are they already well known in some branch of mathematics I do not know yet, or are they only abstract nonsense?

More generally, I'd like to learn more about functional integrals in probability theory. Any pointer would be highly appreciated. Thanks.

asked Apr 27, 2016 in Mathematics by Fabrice Pautot (30 points) [ revision history ]
edited Apr 29, 2016 by Fabrice Pautot

I don't understand the goal of your question. The only difference between the deterministic and the stochastic case is that in the dynamics the coefficient of the noise term is zero. Thus one can use all tools for stochastic time series analysis also in the deterministic case, where only the initial condition is random. (That one cannot easily evaluate certain integrals is a problem one has everywhere....)

commented Apr 28, 2016 by Arnold Neumaier (15,787 points) [ no revision ]

Are you interested in the discrete or the continuous time case?

commented Apr 28, 2016 by Arnold Neumaier (15,787 points) [ no revision ]

Thanks for your comments Arnold.

Regarding comment 2: I'm interested in both the discrete- and continuous-time cases but the discrete-time one is already sufficiently nasty I believe!

Regarding comment 1: suppose the experimental noise is additive. It is common practice to model the sum of the theoretical signal + noise as a stochastic process and to use the tools from stochastic time series analysis/signal processing.

But there are in fact two radically different cases: either the theoretical signal is itself stochastic or it is deterministic. It appears that most of time we are actually assuming, more or less explicitly, the theoretical signal to be itself stochastic.

From this, it also appears that common tools in stochastic time series/signal processing, such as Wiener's classical cross-correlation function, may not be suitable for deterministic signals. Please see this question on MO, which is the motivation underlying this question:

http://mathoverflow.net/questions/236527/is-there-a-bayesian-theory-of-deterministic-signal-prequel-and-motivation-for-m?rq=1

I'm gonna ask it on PO as well.

So, my goal was precisely to fix classical cross-correlation functions for deterministic signals.

For this purpose, in theory I just need to assign a suitable joint probability distribution for the samples of my discrete-time deterministic signals in order to determine more suitable time series/signal processing tools for deterministic signals.

But when you write down such probability distributions, by marginalizing 1) the initial condition 2) the state-space equation 3) the output/observation equation 4) and the phase space and its dimension, you end up with seemingly monstrous functional integrals that are still unidentified at this time.

Should those probability distributions for deterministic signals be also usual stochastic processes, in particular should they be invariant by permutation of the time points, then classical time series analysis/signal processing tools would work for both stochastic/random and deterministic theoretical signals.

But should they be different from usual stochastic processes because time still plays an essential role in them, while time plays essentially no role in (i.i.d. or De Finetti-exchangeable) stochastic processes, then there would exist two different theories of time series analysis/signal processing: one for stochastic theoretical signals that we know well, the other one for deterministic signals, waiting to be developed, to the best of my knowledge, if we can ever define and compute those monstrous functional integrals.

commented Apr 28, 2016 by Fabrice Pautot [ no revision ]

3 Answers

A discrete stochastic process for $x_t$ with a deterministic dynamics $x_{t+1}=f(x_t)$ is specified by the distribution of the initial condition.

Thus one models $x_0$ as a random vector $x_0(\omega)$ with a measure $d\mu$ on the space $\Omega$ over which $\omega$ varies, and defines $x_{t+1}(\omega):=f(x_t)(\omega)$. This specifies all expectations $$\langle f(x_0,\ldots,x_t)\rangle=\int d\mu(\omega)f(x_0(\omega),\ldots,x_t(\omega))$$ and hence the (highly singular) joint probability distribution. Working with the functional integral is in my opinion overkill in this case.
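As a rough illustration of this recipe, the expectation can be estimated by sampling only the initial condition and pushing each sample through the deterministic dynamics. Everything concrete below is an assumption made for the example: the map $x \mapsto x/2$, the uniform measure on $[0,1]$ for $x_0$, and the test function $x_0 x_2$. The test function is called `phi` in the code to keep it apart from the dynamics `f`.

```python
import random

def expectation(phi, f, sample_x0, t, n_samples=10000, seed=0):
    """Estimate <phi(x_0, ..., x_t)> by sampling only the initial condition."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_x0(rng)      # random initial condition x_0(omega)
        path = [x]
        for _ in range(t):
            x = f(x)            # deterministic step x_{t+1} = f(x_t)
            path.append(x)
        total += phi(path)
    return total / n_samples

# Toy case (assumed): f(x) = x / 2, x_0 ~ Uniform(0, 1), phi = x_0 * x_2.
estimate = expectation(
    phi=lambda path: path[0] * path[2],
    f=lambda x: 0.5 * x,
    sample_x0=lambda rng: rng.random(),
    t=2,
)
print(estimate)  # the exact value for this toy case is 1/12
```

No functional integral appears here: all the randomness sits in $x_0$, and the joint distribution of $(x_0,\ldots,x_t)$ is concentrated on the graph of the iterated map.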

If the deterministic model equation is not known, one generally assumes a parametric form $f(x)=F(\theta,x)$ for it. Then all expectations above depend on $\theta$ as well, and one can use experimental data to estimate $\theta$ in the traditional way from a number of empirical expectations.
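A minimal sketch of such a parametric fit, assuming (purely for illustration) the logistic family $F(\theta,x)=\theta x(1-x)$ and ordinary least squares on one-step predictions rather than the empirical expectations mentioned above. The helper names `simulate` and `fit_theta` are hypothetical.

```python
def simulate(theta, x0, n):
    """Iterate x_{t+1} = theta * x_t * (1 - x_t) for n steps."""
    xs = [x0]
    for _ in range(n):
        xs.append(theta * xs[-1] * (1.0 - xs[-1]))
    return xs

def fit_theta(xs):
    """Least-squares estimate of theta from consecutive pairs (x_t, x_{t+1})."""
    phi = [x * (1.0 - x) for x in xs[:-1]]  # regressor x_t * (1 - x_t)
    y = xs[1:]                              # target x_{t+1}
    return sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)

data = simulate(theta=3.7, x0=0.3, n=50)
theta_hat = fit_theta(data)
print(theta_hat)
```

With noise-free data the model is linear in $\theta$, so the fit recovers the generating value up to rounding error.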

On the other hand, in practical estimation, one always assumes the presence of process noise and estimates it together with the noise in the initial conditions, the noise in the observations, and the parameters of the process. The process can be taken to be deterministic if the standard deviation of the process noise is negligible compared with the signal according to some test for negligible covariance parameters. Indeed, this is the way to numerically distinguish deterministic chaotic time series from stochastic ones. In particular, one can use all standard statistical tools for time series.
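The idea can be sketched as follows. The answer does not say which test for negligible covariance parameters is meant, so the ratio of residual spread to signal spread used here is only a crude stand-in, and the logistic model, the noise level and the function names are assumptions for the example.

```python
import math
import random

def logistic_series(theta, x0, n, noise_std, rng):
    """Logistic map with additive Gaussian process noise of std noise_std."""
    xs = [x0]
    for _ in range(n):
        xs.append(theta * xs[-1] * (1.0 - xs[-1]) + rng.gauss(0.0, noise_std))
    return xs

def noise_to_signal_ratio(xs):
    """Fit theta by least squares, then compare residual spread to signal spread."""
    phi = [x * (1.0 - x) for x in xs[:-1]]
    y = xs[1:]
    theta_hat = sum(p * v for p, v in zip(phi, y)) / sum(p * p for p in phi)
    resid = [v - theta_hat * p for p, v in zip(phi, y)]
    resid_std = math.sqrt(sum(r * r for r in resid) / len(resid))
    mean = sum(xs) / len(xs)
    signal_std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return resid_std / signal_std

rng = random.Random(1)
ratio_deterministic = noise_to_signal_ratio(logistic_series(3.7, 0.3, 500, 0.0, rng))
ratio_noisy = noise_to_signal_ratio(logistic_series(3.7, 0.3, 500, 0.005, rng))
print(ratio_deterministic, ratio_noisy)
```

A ratio at the level of rounding error suggests that the fitted deterministic model explains the series; a clearly nonzero ratio points to process noise (or to a wrong model family, which this crude check cannot tell apart).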

answered Apr 28, 2016 by Arnold Neumaier (15,787 points) [ revision history ]
edited May 2, 2016 by Arnold Neumaier

@ArnoldNeumaier

Yes Arnold, that was just a comment following your answer, not an answer to my own question.

I don't understand how my comment became an answer, I've to be more careful!

I'm preparing a comment following your answer's update and I will post it ASAP.

See you... Fabrice

commented May 2, 2016 by Fabrice Pautot [ no revision ]

@ArnoldNeumaier

Please allow me to reply, Arnold. To be very short, I can tell you
that Poincaré is right, I've been studying this particular point
over the last 20 years. Hint/starting point:

Mister Poincaré, you are wrong. Your works prove that some people
can think only nonsense.

Vladimir Ilitch Oulianov, better known as Lenin, Materialism and
empiriocriticism, 1908.

Lenin --> Stalin --> Kolmogorov (Stalin prize, 1941)

Kolmogorov was definitely not allowed to follow Poincaré (= one
way ticket to the Gulag) but, of course, he would have followed.

Read the Grundbegriffe one more time very carefully, then, right after,
one of his last papers, Foundations of probability theory, 1983...

Vladimir Arnold knows about this... I can develop this as much as you
want. My question does not come from outer space...

Fabrice

commented May 16, 2016 by UnknownToSE (505 points) [ no revision ]

@FabricePautot

I have just reinstalled some comments that got lost due to our recent technical problems.
Maybe you would like to consider registering an account, such that I can correctly assign all of your contributions to it?

commented May 16, 2016 by Dilaton.admin (0 points) [ no revision ]

@Dilaton.admin

Yes, definitely, I need to register.

I'm happy to see that our discussion with Arnold has finally been restored.

One remark please: French accents have been corrupted due to those recent technical difficulties: for instance Poincaré now displays as Poincar&eacute.

Kindest regards, Fabrice.

commented May 16, 2016 by Fabrice Pautot [ no revision ]

@FabricePautot

Ok, I have just created a thread to claim unregistered contributions

http://physicsoverflow.org/36103/claims-of-unregistered-contributions

Maybe you can answer it as soon as you have registered?

After your contributions are assigned to your registered account, you will have full control over them to edit or correct them etc ...

commented May 16, 2016 by Dilaton (6,240 points) [ no revision ]

If you want to answer practical questions you can always add a tiny
amount of Gaussian process noise and then take in the answer the limit
of vanishing variance.

commented May 16, 2016 by Arnold Neumaier (15,787 points) [ no revision ]

@ArnoldNeumaier

Dear Arnold,

Thank you again for your kind reply. Ok, this is my last comment.

As you said, you are still considering the problem of
estimating/identifying/modelling a dynamical system from (noisy)
experimental data. This is a kind of problem I know quite well since
I'm earning my living modelling and processing nonlinear deterministic
signals, in particular physiological signals from
electroencephalography, electromyography, electrooculography or MRI and
Computed Tomography functional imaging of the brain.

My PO question/problem arises from some practical problems in this area:
quantifying the dependency between two signals/time series. For random
signals having for instance improper uniform distribution, it is easy to
prove (see Scargle's paper) that a sufficient statistic for this
problem is the classical covariance. But for deterministic signals,
that's a completely different story: we have many tools such as
nonlinear dependencies, instantaneous phase synchronization via Hilbert
transform, much entropic stuff, etc. See for instance this thesis from
TU Wien:

http://publik.tuwien.ac.at/files/PubDat_189752.pdf

But as far as I know, all of them are adhockeries from the point of view
of Bayesian probability theory: they are not derived from the joint
marginal posterior probability distribution for the current problem. In
particular, for a given problem, there should be only one sufficient
statistic, not dozens of them!

In theory, we just need to compute this joint marginal posterior
probability distribution and marginalize all nuisance parameters in
order to derive our sufficient statistics. See Scargle's paper for an
important, illuminating example. But in order to do that, we need to supply
the prior probability distribution of our signal(s).

Here is the main problem and the purpose of my PO question: again, in
theory, we know how to compute the (marginal) prior probability
distribution of our deterministic signal: just marginalize all nuisance
"parameters" that are (at most) the initial condition, the state-space
equation, the output equation, the phase space and its dimension for
dynamical systems.

But in "practice" it is not yet clear, at least to my poor
understanding, if we can well define (noninformative) joint prior
probability distributions over those parameters because some of them are
not usual random variables but functions. Subsequently, it is even less
clear how to compute the required marginal prior probability
distribution of the signal(s) from those hypothetical joint prior
probability distributions.

So, starting from some practical problems for which no sufficient
statistic seems to be known to the best of my knowledge, we finally fall
on a purely theoretical and mathematical one (which has nothing to do
with dynamical system identification/estimation or modelling), which is
to my mind of fundamental interest because its solution could give birth
to a new theory of deterministic signal (processing) if it ever appears
that (noninformative) prior probability distributions of deterministic
signals are different from usual stochastic processes, for instance not
invariant by permutation of the time points/De Finetti-exchangeable
because the time, in particular the time arrow, still plays an essential
role in them. Contrary to your point of view, at this point I am unable
to see any reason why those prior probability distributions should
necessarily match usual, e.g. i.i.d. or De Finetti-exchangeable
stochastic processes. On the contrary, some of us (at least myself!)
could/would conjecture that they are NOT De Finetti-exchangeable because
they are not yet ready to abandon the time (arrow) within deterministic
dynamical system theory.

That was my last chance to explain my problem.

Finally, I used to believe, together with Henri Poincaré, that
time (arrow), dynamical system theory and Bayesian probability theory
were all part of (mathematical) physics. Please see his Calcul des
Probabilités:

http://visualiseur.bnf.fr/Visualiseur?Destination=Gallica&O=NUMM-29064

Time to say goodbye to you Arnold. Thanks for the many comments!!!

Hope I will continue the discussion with other good fellows.

commented May 16, 2016 by UnknownToSE (505 points) [ no revision ]

Conditioning on the event D = {'System deterministic'} does not restrict the search space. There are infinitely many non-parametrized functions that agree with it. You will find such a deterministic function when $\mid \Omega \mid = 1$ for the chosen probability space. Formulated as an optimization problem, D states, in the best case, that such a minimum exists.

The question is closely related to Kolmogorov complexity, algorithmic information theory and machine learning.

answered Oct 14, 2018 by Vadim [ no revision ]

Hi, I am also keen to know how to find the distribution of the histogram of excess power generated from a deterministic signal, normalized by $N\sigma^2$. Suppose the signal form is Gaussian; then the histogram of the excess power starts with high bins, decreases, and rises again at the extreme end.

My purpose is to find a distribution for such a histogram.

Thanks

answered Jun 13, 2020 by Pi [ no revision ]
