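Abstract | Factor analysis is a statistical dimensionality-reduction technique that extracts common factors from a group of variables by exploring the covariance structure among them. It is mainly divided into exploratory factor analysis and confirmatory factor analysis, and has been widely used for dimensionality reduction, variable extraction, the measurement equations of structural equation models, and other purposes. Traditional factor models are built on conditional-expectation (mean) modeling. In some situations, however, the distribution of the original variables is complex, and interest extends beyond the effect of latent factors on the mean to their effect on the entire response distribution, as represented by its quantiles. Extracting common factors at the mean alone is then insufficient to characterize the internal structure of the data. Therefore, whether in exploratory factor analysis, confirmatory factor analysis or further extensions of factor analysis, it is particularly necessary to extract latent common factors at different quantiles in order to describe the relationships between observed and latent variables and among latent variables.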
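The core task of factor analysis is estimating the factor loading matrix. With increasingly high-dimensional and complex data, obtaining a loading matrix with a sparse structure has become more and more important for scaling factor analysis to such applications. Many researchers have adopted Bayesian inference, using prior information to reduce dimensionality or to help select the number of latent factors, while the posterior distribution provides a probabilistic measure of uncertainty. This thesis therefore studies statistical inference for several quantile factor models and their extensions within the Bayesian framework. The approach combines the advantage of extracting common factors at different quantiles with sparse dimensionality reduction, and it can incorporate historical data and prior information into the analysis.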
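This thesis focuses on the basic theory, methods and applications of factor models and comprehensively reviews the existing literature on factor models, quantile factor models and Bayesian estimation of factor loadings. Following the research path "normal factor model → robust factor model → quantile factor model → extended applications of the quantile factor model", it develops the theory and applications of quantile factor models in exploratory factor analysis, confirmatory factor analysis and factor analysis within structural equation models. The specific contributions are as follows: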
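First, to address the limitations of the normal-likelihood assumption in traditional factor models, accommodate the non-normal data encountered in practice, and improve both the effectiveness of factor analysis and its robustness to outliers, this thesis proposes a Bayesian robust factor model based on a multivariate-distribution likelihood under global-local shrinkage priors, yielding more robust covariance estimates.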
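Second, exploiting the relatively simple form of the asymmetric Laplace distribution and its strong ability to model skewness and outliers, this thesis introduces quantile-based extraction of common factors into exploratory and confirmatory factor analysis and extends it to structural equation models, yielding quantile factor models and quantile structural equation models, respectively. For the quantile factor model, posterior estimates of the parameters of a Bayesian hierarchical quantile factor model with a Dirichlet-Laplace prior are derived under the generalized asymmetric Laplace distribution. For the quantile structural equation model, a sparse pattern of factor loadings is specified on theoretical grounds and the measurement equations are used to quantify the relationships between observed and latent variables; popular sparse priors (the Bayesian Lasso, the horseshoe prior and the spike-and-slab Lasso prior) are then used in the structural equations to explore the effects of explanatory latent variables on outcome latent variables. Variational approximate inference for Bayesian quantile structural equation models is also considered, combined with bootstrap resampling to further correct the bias and variance underestimation of the variational posterior distribution.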
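Finally, from an applied perspective, the core of an asset pricing model is to determine the expected returns of assets and their relationship to the assets' risk exposures, and stock return variation is usually modeled with highly structured, simplified factor models. Motivated by both traditional factor models and characteristic-based factor models, a characteristic-based semiparametric quantile factor model is constructed in the Bayesian framework, in which smooth nonlinear functions are fitted with Bayesian P-spline bases to analyze the linear and nonlinear effects of latent factors on excess returns.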
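Within the Bayesian framework, this thesis proposes robust estimation methods for several quantile factor models and their extensions, together with sparse estimation of factor loadings, parameter estimation and interval estimation. Numerical simulations show that the proposed Bayesian posterior estimators perform well for the non-normal data encountered in practice, especially leptokurtic, heavy-tailed data, and can handle factor models whose specific factors violate the traditional normality assumption. In empirical applications, the methods are illustrated with FRED-QD data, financial data and stock data.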
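The results show that the proposed methods have a wide range of practical applications and yield satisfactory estimates; they are well suited to more effective analysis of high-dimensional data and offer practitioners in different fields useful approaches to extracting information. |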
English abstract | Factor analysis is a statistical dimensionality-reduction technique that extracts common factors from a group of variables by exploring the covariance structure among them. It is mainly divided into exploratory factor analysis and confirmatory factor analysis, which have been widely used for dimensionality reduction, variable extraction, the measurement equations of structural equation models and many other purposes. The traditional factor model is based on conditional-expectation (mean) modeling. However, in some cases the distribution of the original variables is complex, and interest lies not only in the influence of the latent factors on the mean but also in their influence on the entire response distribution, as represented by its quantiles. Extracting common factors at the mean alone is then not sufficient to fully characterize the internal structure of the data. Therefore, whether in exploratory factor analysis, confirmatory factor analysis or further extensions of factor analysis, it is particularly necessary to extract latent common factors at different quantiles to reflect the relationships between observed and latent variables, as well as among latent variables.
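To make the contrast between mean-based and quantile-based factor extraction concrete, the two models can be written generically as below, for observation $i = 1,\dots,n$, variable $j = 1,\dots,p$ and a vector $\mathbf{f}_i$ of latent factors. The notation ($\mu_j$, $\boldsymbol{\lambda}_j$, $\mathbf{f}_i$, quantile level $\tau$) is assumed here for illustration and is not necessarily the thesis's own specification.

```latex
\begin{aligned}
&\text{Mean model:}     && y_{ij} = \mu_j + \boldsymbol{\lambda}_j^{\top}\mathbf{f}_i + \varepsilon_{ij},
                           \qquad \mathbb{E}(\varepsilon_{ij} \mid \mathbf{f}_i) = 0,\\
&\text{Quantile model:} && Q_{\tau}\!\left(y_{ij} \mid \mathbf{f}_i(\tau)\right)
                           = \mu_j(\tau) + \boldsymbol{\lambda}_j(\tau)^{\top}\mathbf{f}_i(\tau),
                           \qquad 0 < \tau < 1.
\end{aligned}
```

In the quantile version the loadings and factors are allowed to vary with $\tau$, so the lower tail, the center and the upper tail of each variable's distribution may be driven by different common factors, which a single mean model cannot express.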
The core task of factor analysis is the estimation of the factor loading matrix. With increasingly high-dimensional and complex data, obtaining a factor loading matrix with a sparse structure has become more and more important for the scalability of factor analysis applications. Many scholars have adopted Bayesian statistical inference, using prior information to reduce dimensionality or to help select the number of latent factors, while the posterior distribution provides a probabilistic measure of uncertainty. Therefore, this thesis mainly discusses statistical inference for several quantile factor models and their extensions under the Bayesian framework. The approach not only has the advantage of extracting common factors at different quantiles, but also makes the dimensionality reduction sparse, and it can use historical data and prior information to support the analysis.
This thesis focuses on the basic theory, methods and applications of factor models. It comprehensively reviews the existing literature on factor models, quantile factor models and Bayesian factor loading estimation. Following the research path "normal factor model → robust factor model → quantile factor model → extended applications of the quantile factor model", it develops the theory and applications of quantile factor models in exploratory factor analysis, confirmatory factor analysis and factor analysis within structural equation models. The specific contributions are as follows:
Firstly, to address the limitations of the normal-likelihood assumption in traditional factor models, handle the non-normal data encountered in practice, and improve both the effectiveness of factor analysis and its robustness to outliers, a Bayesian robust factor model based on a multivariate-distribution likelihood is proposed under global-local shrinkage priors to obtain more robust covariance estimates.
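The abstract does not say which global-local shrinkage prior the thesis adopts. As one well-known instance, assumed here purely for illustration, the horseshoe prior places the following hierarchy on the loading $\lambda_{jk}$ of variable $j$ on factor $k$, with a local scale $\phi_{jk}$ per loading and a global scale $\omega$ shared by all loadings ($\mathcal{C}^{+}$ denotes the half-Cauchy distribution):

```latex
\lambda_{jk} \mid \phi_{jk}, \omega \sim \mathcal{N}\!\left(0,\ \phi_{jk}^{2}\,\omega^{2}\right),
\qquad \phi_{jk} \sim \mathcal{C}^{+}(0, 1),
\qquad \omega \sim \mathcal{C}^{+}(0, 1).
```

The global scale $\omega$ pulls every loading toward zero, which produces the sparse loading pattern, while the heavy-tailed local scales $\phi_{jk}$ let individual large loadings escape shrinkage.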
Secondly, exploiting the relatively simple form of the asymmetric Laplace distribution and its strong ability to model skewness and outliers, this thesis introduces quantile-based extraction of common factors into exploratory and confirmatory factor analysis and extends it to structural equation models, yielding quantile factor models and quantile structural equation models, respectively. For the quantile factor model, posterior estimates of the parameters of a Bayesian hierarchical quantile factor model with a Dirichlet-Laplace prior are derived under the generalized asymmetric Laplace distribution. For the quantile structural equation model, a sparse pattern of factor loadings is specified on theoretical grounds and the measurement equations are used to explore the quantitative relationships between observed and latent variables. The influence of explanatory latent variables on outcome latent variables is then explored through the structural equations, using currently popular sparse priors such as the Bayesian Lasso, the horseshoe prior and the spike-and-slab Lasso prior. Moreover, variational approximate inference for Bayesian quantile structural equation models is considered, and bootstrap resampling is used to further address the bias and variance underestimation of the variational posterior distribution.
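A minimal sketch of why an asymmetric Laplace (ALD) working likelihood targets a quantile: the standard ALD density with location $\mu$, scale $\sigma$ and quantile level $\tau$ is $\tau(1-\tau)/\sigma \cdot \exp\{-\rho_\tau((y-\mu)/\sigma)\}$, where $\rho_\tau(u) = u(\tau - \mathbf{1}\{u<0\})$ is the check loss. Maximizing the likelihood in $\mu$ is therefore the same as minimizing the summed check loss, whose minimizer is a $\tau$-th sample quantile. The example uses only a location parameter with no factors, and the function names are hypothetical; the generalized ALD specification used in the thesis is not reproduced here.

```python
import numpy as np


def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * np.where(u < 0, tau - 1.0, tau)


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution ALD(mu, sigma, tau)."""
    return np.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def ald_location_mle(y, tau, sigma=1.0):
    """Maximize the ALD log-likelihood over the location mu.

    The log-likelihood is concave and piecewise linear in mu with kinks at the
    observations, so a maximizer is attained at one of the data points.
    """
    candidates = np.sort(np.asarray(y))
    loglik = [ald_logpdf(candidates, m, sigma, tau).sum() for m in candidates]
    return candidates[int(np.argmax(loglik))]


y = np.arange(1, 100)  # 1, 2, ..., 99
print(ald_location_mle(y, tau=0.5))  # 50 (the sample median)
print(ald_location_mle(y, tau=0.9))  # 90 (the 0.9 sample quantile)
```

In a quantile factor model the same mechanism applies with $\mu$ replaced by the factor-dependent location $\mu_j + \boldsymbol{\lambda}_j^{\top}\mathbf{f}_i$, which is what lets Bayesian samplers target a chosen quantile of each observed variable.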
Finally, from an applied perspective, the core of an asset pricing model is to determine the expected returns of assets and their relationship to the assets' risk exposures. Stock return variation is typically modeled with highly structured, simplified factor models. Motivated by both traditional factor models and characteristic-based factor models, a characteristic-based semiparametric quantile factor model is constructed in the Bayesian framework, in which smooth nonlinear functions are fitted with Bayesian P-spline bases to analyze the linear and nonlinear effects of latent factors on excess returns.
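As a hedged illustration of the P-spline idea only (not the thesis's sampler, which works with a quantile likelihood), the sketch below builds an equally spaced cubic B-spline basis with the Cox-de Boor recursion and computes the posterior mean of the spline coefficients under a simplifying Gaussian likelihood and a second-order random-walk prior with fixed variances. Under those assumptions the posterior mean coincides with the penalized least-squares P-spline estimate, with the smoothing parameter `lam` playing the role of the noise-to-prior variance ratio. The function names, the 10 segments and the cubic degree are illustrative choices.

```python
import numpy as np


def bspline_basis(x, n_segments=10, degree=3):
    """Equally spaced B-spline basis at x via the Cox-de Boor recursion.

    Returns an array of shape (len(x), n_segments + degree).
    """
    x = np.asarray(x, dtype=float)
    xl, xr = x.min(), x.max()
    dx = (xr - xl) / n_segments
    # Knots extended by `degree` intervals on each side of [xl, xr].
    t = xl + dx * np.arange(-degree, n_segments + degree + 1)
    # Degree-0 basis: indicator of the half-open interval [t_i, t_{i+1}).
    B = ((x[:, None] >= t[:-1]) & (x[:, None] < t[1:])).astype(float)
    # Raise the degree one step at a time (each step drops one column).
    for k in range(1, degree + 1):
        left = (x[:, None] - t[:-(k + 1)]) / (t[k:-1] - t[:-(k + 1)]) * B[:, :-1]
        right = (t[k + 1:] - x[:, None]) / (t[k + 1:] - t[1:-k]) * B[:, 1:]
        B = left + right
    return B


def pspline_fit(x, y, lam=1.0, n_segments=10, degree=3):
    """Fitted curve B @ a, where a is the posterior mean under y ~ N(B a, s2 I)
    and a second-order random-walk prior on a (lam = s2 / prior variance).

    This equals the penalized least-squares P-spline solution.
    """
    B = bspline_basis(x, n_segments, degree)
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)  # second-order differences
    a = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ np.asarray(y, dtype=float))
    return B @ a


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
smooth = pspline_fit(x, y, lam=10.0)  # smoothed estimate of sin(2*pi*x)
```

The second-order difference penalty leaves straight lines unpenalized, so the fit shrinks toward a linear trend as `lam` grows; this is one way a single smooth term can capture both the linear and the nonlinear part of an effect.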
This study mainly provides robust estimation methods for several quantile factor models and their extensions under the Bayesian framework, together with sparse estimation of the loading matrix, parameter estimation and interval estimation. Numerical simulations show that the proposed Bayesian posterior estimators perform well for the non-normal data encountered in practice, especially leptokurtic, heavy-tailed data. They can handle the estimation of factor models whose specific factors do not satisfy the traditional normality assumption. In empirical applications, the methods are illustrated with FRED-QD data, financial data and stock data. The results indicate that the proposed methods have a wide range of practical applications and yield satisfactory estimates. They are well suited to more effective analysis of high-dimensional data and offer practitioners in different fields useful approaches to extracting information.
|