Comments on Eric Jang: Normalizing Flows Tutorial, Part 2: Modern Normalizing Flows
8 comments (feed updated 2018-11-08)

bud, 2018-03-13 11:29:
Can IAF be used to transform the noise into a "mixture" of logistics distribution, or does it only work for a single logistic distribution?

bud, 2018-03-13 11:27:
In order to compute "the divergence between the student and teacher distributions", do we draw multiple samples from the base distribution (noise) or from the output of the student?

Alessandro Davide Ialongo, 2018-02-19 02:18:
Great post!
I just wanted to point out one passage which comes across as slightly inaccurate:

"Learning data with autoregressive density estimation makes the rather bold inductive bias that the ordering of variables are such that your earlier variables don’t depend on later variables"

As far as I can tell, this assumption is not actually made: by the chain rule of probability we can write *any* joint probability density as a product of "telescopic" conditional densities, as in autoregressive models. The inductive bias comes from the fact that, for a fixed functional form of the conditional densities (e.g. Gaussian), not all orderings might be able to give rise to the desired joint distribution (see the example in the MAF paper).

Hope that makes sense.

Eric, 2018-02-14 09:06:
Permutation is expensive, but in practice this only needs to be done 4-5 times to get good results (e.g. fast wavenet).

Alessandro Davide Ialongo, 2018-02-14 05:05:
This comment has been removed by the author.

Unknown, 2018-02-13 05:14:
How would you solve the issue that none of the layers’ autoregressive factorization will be able to learn the structure of p(x1|x2) in a high-dimensional space? Permutation would become quite expensive.

Eric, 2018-01-23 13:16:
Thank you! Fixed.

Jay, 2018-01-23 12:59:
"so all it does is slightly the data manifold around the origin"

Missing a word at the beginning of the blog post... warp? pivot? transform?