The following content was retrieved from https://blog-en.fltech.dev/entry/2026/03/12/AAAI26-Paper-CausalAI-en.


AAAI-26 Participation and Exhibition #2: Presenting “Causal Discovery over Heterogeneous Datasets” and “Ultrafast Nonlinear Causal Discovery” at the Main Conference

Hello, we are Hirofumi Suzuki and Kentaro Kanamori from the Artificial Intelligence Laboratory. Fujitsu participated in the prestigious international AI conference "The 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26)" held in Singapore from January 20 to 27, 2026, presenting multiple papers and hosting a workshop. We will now deliver a series of articles about AAAI-26.

In this article, we introduce two of our research papers accepted to the main conference, covering (1) causal discovery over heterogeneous datasets and (2) ultrafast nonlinear causal discovery. In the next post in this series, we will explain another accepted paper, according to the following schedule:

  • Part 1: AAAI-26 Participation and Exhibition #1
  • Part 2: AAAI-26 Participation and Exhibition #2
    • Report on the Paper Presentation on Causal AI Technology (This article)
  • Part 3: AAAI-26 Participation and Exhibition #3
    • Report on the Paper Presentation on AI Reasoning Technology (Scheduled for March 16)

Two Papers on Causal AI were Accepted to the Main Conference of AAAI-26

At Fujitsu Research, we conduct R&D on data-driven decision-making support using Causal AI technologies, with the goal of applying these methods to real-world decision tasks—for example, discovering interventions to improve employee productivity based on engagement survey data. At AAAI-26, two of our research papers on “causal discovery,” a key core technology in Causal AI, were accepted to the main conference (acceptance rate: 17.6%).

Paper 1
▶ Title: I-CAM-UV: Integrating Causal Graphs over Non-Identical Variable Sets Using Causal Additive Models with Unobserved Variables
▶ Authors: Hirofumi Suzuki, Kentaro Kanamori, Takuya Takagi, Thong Pham, Takashi Nicholas Maeda, Shohei Shimizu
▶ Venue: 40th AAAI Conference on Artificial Intelligence (AAAI 2026)
▶ Link: arXiv

Paper 2
▶ Title: Sparse Additive Model Pruning for Order-Based Causal Structure Learning
▶ Authors: Kentaro Kanamori, Hirofumi Suzuki, Takuya Takagi
▶ Venue: 40th AAAI Conference on Artificial Intelligence (AAAI 2026)
▶ Link: arXiv

Background: What is Causal Discovery?

Figure 1. Example of causal discovery.

Causal discovery is a technique for estimating causal relationships among variables from observational data (Figure 1). For example, given a dataset with three variables—rainfall, number of visitors, and sales—causal discovery may infer relationships such as “increased rainfall decreases visitors” (rainfall → visitors) and “more visitors increase sales” (visitors → sales). These relationships can be summarized as a causal graph, where directed arrows represent cause-to-effect relationships.

Once a causal graph is obtained, we can reason about how an intervention (externally setting or changing a variable) may propagate its effects to other variables. For instance, by performing causal discovery on multiple indicators related to employee productivity, we can analyze “which factors to intervene on” and “how much improvement might be expected,” supporting practical decision-making such as planning productivity-improvement actions.
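
To make the intervention reasoning concrete, here is a minimal sketch of the rainfall → visitors → sales example; the structural equations and coefficients are invented purely for illustration and are not taken from any real analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def simulate(do_rain=None):
    # Structural equations for the toy causal graph
    # rainfall -> visitors -> sales (coefficients are illustrative only).
    rain = rng.normal(0.0, 1.0, n) if do_rain is None else np.full(n, do_rain)
    visitors = -2.0 * rain + rng.normal(0.0, 1.0, n)  # more rain, fewer visitors
    sales = 3.0 * visitors + rng.normal(0.0, 1.0, n)  # more visitors, more sales
    return sales

sales_obs = simulate()              # observational distribution
sales_dry = simulate(do_rain=-1.0)  # intervention do(rainfall = -1)

# The intervention's effect propagates along the directed edges:
print(f"mean sales, observational: {sales_obs.mean():.2f}")  # ~0
print(f"mean sales, do(rain=-1):   {sales_dry.mean():.2f}")  # ~6
```

Because the graph says rainfall affects sales only through visitors, forcing rainfall low raises expected sales via the chain of structural equations, which is exactly the kind of what-if question a causal graph lets us answer.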

One of the core technologies supporting Fujitsu Causal AI is causal discovery. In Causal AI, we leverage a learned causal DAG (Directed Acyclic Graph) from observational data to provide data-driven decision support. However, real-world data often:

  • is distributed across multiple heterogeneous datasets, and
  • contains complex causal relationships that are difficult to estimate.

To address these challenges, our AAAI 2026 papers propose new causal discovery techniques and demonstrate their effectiveness through experiments.

Related Articles
Fujitsu Research Portal: Data-Driven Decision Making
Introducing "Fujitsu Causal AI" (Total of 3 Parts) #1 Causal Action Optimization Technology
Introducing "Fujitsu Causal AI" (Total of 3 Parts) #2 Knowledge-Guided Causal Discovery
"LayeredLiNGAM: A Practical and Fast Method for Learning a Linear Non-Gaussian Structural Equation Model" presented at ECML-PKDD 2024

I-CAM-UV: Integrating Causal Graphs over Non-Identical Variable Sets Using Causal Additive Models with Unobserved Variables

Summary
We propose I-CAM-UV, a new method that integrates causal DAGs estimated from heterogeneous datasets and enumerates candidate causal DAGs consistent with all dataset-specific results, and we show that it can discover cross-dataset causal relations that conventional methods miss, within practical runtime.

Motivation and Problem Setting

In real data analysis projects, even when addressing the same business problem, the available variable sets often differ across datasets due to differences in data owners, measurement designs, or logging systems. For example, Department A may collect HR-related variables, while Department B collects operational logs, and only some variables overlap.

By integrating these heterogeneous datasets, we may discover causal relations that are not visible from any single dataset. For instance:

  • HR data suggests “improving work-life balance affects productivity,” and
  • operational logs suggest “productivity affects project success.”

Integrating them could reveal a new relation such as “improving work-life balance affects project success.”

A naïve approach is to run standard causal discovery on each dataset and then simply overlay the resulting DAGs. However, for variable pairs never observed together in any dataset, we cannot directly estimate their relations by regression or statistical tests. Therefore, simple overlay approaches cannot, in principle, discover causal relations across variable pairs that are never co-observed (Figure 2). Moreover, variables observed only in some datasets behave like unobserved (latent) variables in others, which can degrade estimation if we use methods that assume no latent variables.

Figure 2. Example of causal discovery over heterogeneous datasets.

Existing multi-dataset approaches include methods that estimate Partial Ancestral Graphs (PAGs), but PAGs may contain edges with ambiguous directions, which makes them less interpretable in practice. Some methods can output fully directed DAGs by restricting assumptions to linear non-Gaussian settings, but their performance is known to deteriorate on real data involving nonlinearity.

Our goal is to estimate fully directed causal DAGs while allowing nonlinear relationships, and to do so in practical computation time.

Core Idea of I-CAM-UV

In this paper, we propose a new method, I-CAM-UV, which integrates the estimated causal DAGs from multiple datasets and enumerates candidate causal DAGs that are consistent with the estimation results from all datasets. Our method is built on CAM-UV [4], a causal discovery method proposed by Maeda & Shimizu (2021). CAM-UV is a nonlinear causal discovery approach that explicitly accounts for unobserved variables; given a single dataset, it estimates (i) directed edges that are identifiable and (ii) variable pairs whose directions remain unidentified.

The core idea of I-CAM-UV is to use, as explicit constraints during DAG integration, both the identified directed edges output by CAM-UV and the information on unidentified variable pairs explained by unobserved causal paths (UCPs) and unobserved backdoor paths (UBPs). This enables us to recover causal relations that would be missed by simple overlay-based integration, thereby substantially improving the practicality of causal discovery in heterogeneous-data settings. On the other hand, there may be multiple integrated causal DAGs that satisfy these constraints. Therefore, we also propose an efficient algorithm that can enumerate such integrated causal DAG candidates quickly.

Let  V_k denote the set of variables in the  k-th dataset among  m datasets, and let the integrated variable set be  \hat{V} = \bigcup_{k=1}^m V_k. We define the set of unobserved variables for dataset  k as  U_k := \hat{V} \setminus V_k. Applying CAM-UV to each dataset yields a mixed graph  G_k = (V_k, A_k, N_k), where  A_k is the set of identified directed edges and  N_k is the set of unidentified pairs. Let the overlaid directed edges be  \hat{A} := \bigcup_{k=1}^m A_k. Then, the sets of unidentified candidates are defined as

 \displaystyle
E_{\mathrm{imp}} := \left( \bigcup_{k=1}^m N_k \right) \setminus \{ \{ v_i, v_j \} \mid (v_i, v_j) \in \hat{A} \},\quad
E_{\mathrm{uno}} := \{ \{ v_i, v_j \} \subseteq \hat{V} \mid \forall k,\{ v_i, v_j \} \not\subseteq V_k \}
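
These definitions of  \hat{A},  E_{\mathrm{imp}}, and  E_{\mathrm{uno}} translate directly into set operations. The sketch below uses hypothetical CAM-UV outputs for two toy datasets, not results of the actual method:

```python
from itertools import combinations

# Hypothetical per-dataset CAM-UV outputs (V_k, A_k, N_k):
# variables, identified directed edges, and unidentified pairs.
datasets = [
    ({"a", "b", "c"}, {("a", "b")}, {frozenset({"b", "c"})}),  # G_1
    ({"c", "d"},      {("c", "d")}, set()),                    # G_2
]

V_hat = set().union(*(V for V, _, _ in datasets))   # integrated variable set
A_hat = set().union(*(A for _, A, _ in datasets))   # overlaid directed edges
A_pairs = {frozenset(e) for e in A_hat}

# E_imp: unidentified pairs not already covered by an identified edge.
E_imp = set().union(*(N for _, _, N in datasets)) - A_pairs

# E_uno: pairs of variables never observed together in any single dataset.
E_uno = {frozenset(p) for p in combinations(sorted(V_hat), 2)
         if not any(set(p) <= V for V, _, _ in datasets)}

print(E_imp)  # {b, c}: unidentified within G_1
print(E_uno)  # {a, d} and {b, d}: never co-observed
```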

Moreover, the causes of non-identifiability in CAM-UV are UCPs and UBPs. A UCP is an unobserved causal path of the form  v_i \rightarrow \cdots \rightarrow v_h \rightarrow v_j (with  v_h \in U_k), and a UBP is an unobserved backdoor path of the form  v_i \leftarrow v_x \leftarrow \cdots \leftarrow v_a \rightarrow \cdots \rightarrow v_y \rightarrow v_j (with  v_x, v_y \in U_k) (Figure 3). Hence, while the directions of the pairs in  N_k are unidentified on  V_k, they can be interpreted as the set of variable pairs for which UCPs/UBPs exist in the background.

Figure 3. Examples of UCP and UBP.

I-CAM-UV exploits this information to evaluate consistency with the estimation results from each dataset. First, we define  E := E_{\mathrm{imp}} \cup E_{\mathrm{uno}} and  I_k := \{ \{ v_i, v_j \} \subseteq V_k \mid \{ v_i, v_j \} \notin N_k \}. Next, for each pair  \{v_i, v_j \} in  E, we choose one of the following three options: “insert the edge  v_i \rightarrow v_j,” “insert the edge  v_j \rightarrow v_i,” or “insert no edge.” Let  \tilde{A} be the set of directed edges obtained by collecting these choices. The integrated candidate graph is then  \tilde{G} = (\hat{V}, \hat{A} \cup \tilde{A}). Furthermore, let  \mathrm{UP}_{G, V_k}(v_i, v_j) denote the set of all UCPs/UBPs between  v_i and  v_j on  G under  U_k = \hat{V} \setminus V_k. Then, consistency is defined as

 \displaystyle
\forall k,\ \forall\{v_i,v_j\}\in I_k:\ \mathrm{UP}_{\tilde{G},V_k}(v_i,v_j)=\emptyset,
\qquad
\forall k,\ \forall\{v_i,v_j\}\in N_k:\ \mathrm{UP}_{\tilde{G},V_k}(v_i,v_j)\neq\emptyset

Under ideal conditions ( \hat{V}=V and no estimation error in CAM-UV), we can theoretically show that the true causal DAG satisfies this consistency property (Theorem 1). In practice, however, there can be multiple causal DAGs that satisfy consistency, so the true causal DAG cannot be uniquely identified. To address this, we define an inconsistency cost  C(\tilde{G}) that quantitatively evaluates inconsistency, and we propose an efficient enumeration algorithm based on best-first search that leverages the monotonicity of a lower bound, listing candidates in increasing order of cost (Figure 4).

Figure 4. Candidate enumeration algorithm in I-CAM-UV.
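
As a simplified illustration of the enumeration, the sketch below tries all three options for each pair in  E and keeps the acyclic results. The actual algorithm additionally checks UCP/UBP consistency and replaces this exhaustive loop with cost-ordered best-first search; the inputs here are toy values:

```python
from itertools import product

def is_acyclic(nodes, edges):
    # Kahn's algorithm: the graph is a DAG iff every node can be
    # placed in a topological order.
    indeg = {v: 0 for v in nodes}
    for _, v in edges:
        indeg[v] += 1
    stack = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    stack.append(b)
    return seen == len(nodes)

# Toy inputs: fixed overlaid edges A_hat and undecided pairs E.
nodes = {"a", "b", "c"}
A_hat = {("a", "b")}
E = [("b", "c"), ("a", "c")]

candidates = []
for choice in product(range(3), repeat=len(E)):
    extra = set()
    for (vi, vj), c in zip(E, choice):
        if c == 0:
            extra.add((vi, vj))  # insert v_i -> v_j
        elif c == 1:
            extra.add((vj, vi))  # insert v_j -> v_i
        # c == 2: insert no edge
    if is_acyclic(nodes, A_hat | extra):
        candidates.append(A_hat | extra)

print(len(candidates))  # 8 of the 3^2 = 9 assignments yield a DAG
```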

Experimental Results

We generated 100 synthetic datasets with nonlinear causal relations among 10 variables. We evaluated settings with the number of datasets  m \in \{ 2, 3 \} and the number of unobserved variables per dataset  |U| \in \{ 3, 4 \}. Baselines include:

  • simple overlay methods (PC-UV-OVL, CAM-UV-OVL),
  • an imputation-based approach, and
  • a linear non-Gaussian method (CD-MiNi).

Figure 5. Experimental results.

The results show that I-CAM-UV achieves higher recall for both co-observed and non-co-observed pairs, particularly excelling at recovering relations that CAM-UV alone cannot detect and at discovering causal relations spanning variable pairs that are never co-observed. As expected from its candidate-enumeration nature, precision can be lower than CAM-UV-OVL, reflecting a trade-off with more false positives; however, the overall F1 score is generally comparable, indicating balanced performance. Runtime was also practical for sparse graphs of around 10 variables, demonstrating the effectiveness of best-first search.

Sparse Additive Model Pruning for Order-Based Causal Structure Learning

Summary
We propose SARTRE, a new pruning technique for order-based nonlinear causal discovery that maintains accuracy while achieving up to ~5× speedup compared with standard approaches.

Motivation: Order-Based Methods and Their Bottleneck

We focus on nonlinear causal discovery, as real-world causal relationships often exhibit nonlinearities such as non-monotonicity and heterogeneity. While nonlinear causal relationships are known to be identifiable from purely observational data under certain assumptions, efficient algorithms for accurate estimation remain an important challenge.

A major approach is order-based causal discovery (Figure 6). These methods:

  1. estimate a topological order of variables (ordering step), and then
  2. construct a complete DAG consistent with the order and prune unnecessary edges (pruning step).

Figure 6. Overview of order-based causal discovery.
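
The starting point of the pruning step, the complete DAG consistent with an estimated order, can be written in one comprehension; the order below is a made-up stand-in for the output of the ordering step:

```python
# Hypothetical output of the ordering step (step 1): a topological order.
order = ["x3", "x1", "x4", "x2"]

# Step 2 starts from the complete DAG consistent with this order:
# every earlier variable is a candidate parent of every later variable,
# and pruning then removes the redundant edges.
complete_dag = [(order[i], order[j])
                for i in range(len(order))
                for j in range(i + 1, len(order))]

print(len(complete_dag))  # d(d-1)/2 = 6 candidate edges for d = 4
```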

The ordering step reduces the otherwise super-exponential search space, so many works focus there. For the pruning step, the standard choice has been CAM-pruning, which repeatedly fits Generalized Additive Models (GAMs) and performs hypothesis tests for many variable pairs.

However, CAM-pruning has two major issues:

  • High computational cost, especially as the number of variables grows (GAM fitting becomes expensive).
  • Potential accuracy degradation due to multiple testing, since many pairwise hypothesis tests are performed.

Core Idea of SARTRE

In this paper, we focus on the pruning step of order-based causal discovery and achieve faster nonlinear causal discovery by proposing a new pruning method that addresses the computational cost and estimation-accuracy issues of CAM-pruning. The core idea of our approach is to use a sparse additive model instead of a GAM. This makes it possible to identify redundant edges without hypothesis testing, thereby avoiding the multiple-testing problem. However, learning conventional sparse additive models is still computationally expensive, similarly to learning GAMs. To overcome this, we propose Sparse Additive Randomized TRee Ensemble (SARTRE) as a new framework for learning sparse additive models efficiently (Figure 7).

Figure 7. Overview of SARTRE.

For a variable  X_i, let  \mathrm{pa}(i) \subseteq \{ 1, \dots, d \} be the set of candidate parent variables determined by a previously estimated topological order. The task here is to learn a regression model  f_i(X_{\mathrm{pa}(i)}) \approx X_i using  \mathrm{pa}(i), and to identify, among  \mathrm{pa}(i), the variables that are actual causes of  X_i in the true causal DAG. The proposed SARTRE model is defined as a special case of a GAM of the form  f_i(X_{\mathrm{pa}(i)}) = \sum_{j \in \mathrm{pa}(i)} g_{i, j}(X_j), with the following shape functions  g_{i,j}:

 \displaystyle
g_{i, j}(X_j) = \sum_{k=1}^{l_j} \beta_{i, j, k} \cdot \phi_{j, k}(X_j) = \beta_{i, j}^\top \phi_j(X_j)

Here,  \phi_{j, k}(X_j) := \mathbb{I} \left( X_j \in r_{j, k} \right) (with  \phi_j(X_j) = (\phi_{j, 1}(X_j), \dots, \phi_{j, l_j}(X_j))), and  r_{j, k} := (a_{j, k}, b_{j, k}] denotes an interval for the variable  X_j. Moreover,  \beta_{i, j} = (\beta_{i, j, 1}, \dots, \beta_{i, j, l_j})^\top is the weight vector associated with these intervals. In other words, each shape function is expressed as a linear combination of interval indicator functions, which allows the model to represent nonlinear relationships despite its simple structure. In fact, we can theoretically show that this shape-function class has a universal approximation property (Proposition 1). Furthermore, by definition, if  \beta_{i, j} = 0 then  g_{i, j}(X_j) = 0; therefore, we can conclude that the variable  X_j is not a cause of  X_i without going through hypothesis testing.
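
The shape functions can be sketched directly: each  \phi_{j,k} is the indicator of a half-open interval, so  g_{i,j} is piecewise constant. The cut points and weights below are arbitrary illustrative values:

```python
import numpy as np

def indicator_features(x, cut_points):
    # phi_{j,k}(x) = 1 iff x falls in the k-th half-open interval (a, b]
    # induced by the cut points (outer intervals extend to +/- infinity).
    edges = np.concatenate(([-np.inf], cut_points, [np.inf]))
    k = np.searchsorted(edges, x, side="left") - 1
    phi = np.zeros((len(x), len(edges) - 1))
    phi[np.arange(len(x)), k] = 1.0
    return phi

x = np.array([-1.2, 0.3, 0.7, 2.5])
phi = indicator_features(x, cut_points=np.array([0.0, 1.0]))

# g_{i,j} is a linear combination of the indicators: piecewise constant,
# hence nonlinear in x despite the simple form. Setting beta_{i,j} = 0
# would make g_{i,j} vanish identically, marking X_j as a non-cause.
beta = np.array([0.5, -1.0, 2.0])
g = phi @ beta
print(g)  # [ 0.5 -1.  -1.   2. ]
```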

To learn the SARTRE model, we first need to construct, for each variable  X_j, a set of intervals  R_j = \{ r_{j, 1}, \dots, r_{j, l_j} \}, and then optimize the weight vector  \beta_i = (\beta_{i, j})_{j \in \mathrm{pa}(i)}. In our algorithm, we first construct the interval sets  R_j efficiently using a technique called random tree embedding [8]. Random tree embedding generates intervals from split points of randomly generated decision trees, enabling us to efficiently build intervals that adapt to the empirical distribution of each variable. Next, using the constructed interval sets  R_j, we optimize  \beta_i via Group Lasso regression [9]:

 \displaystyle
\hat{\beta}_i = \arg\min_{\beta_i} \left\{\frac{1}{n} \sum_{m=1}^n \left(x_{m, i} - \sum_{j \in \mathrm{pa}(i)} \beta_{i, j}^\top \phi_j(x_{m, j})\right)^2 + \lambda \sum_{j \in \mathrm{pa}(i)} \|\beta_{i, j}\|_2\right\}

The first term is the standard squared error, and the second term is the Group Lasso regularization term. Due to Group Lasso regularization, the model is learned so that  \beta_{i, j} = 0 holds for as many explanatory variables  X_j as possible. Since the optimization problem above is convex, we can learn the SARTRE model efficiently using existing optimization algorithms. By using SARTRE in place of a GAM, we can identify redundant edges quickly without hypothesis testing, thereby substantially accelerating the pruning step in order-based causal discovery.
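
The Group Lasso step can be sketched with proximal gradient descent, whose proximal operator is block soft-thresholding. For clarity this toy uses generic Gaussian features rather than interval indicators built by random tree embedding, and the sizes and penalty are illustrative; the point is that the redundant group's weights are driven exactly to zero, with no hypothesis test:

```python
import numpy as np

def group_lasso(X, y, groups, lam, lr=0.01, n_iter=2000):
    # Proximal gradient descent: a gradient step on the squared error,
    # then block soft-thresholding of each group's weights, which can
    # zero out an entire group (i.e. a whole candidate parent) exactly.
    n = X.shape[0]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = beta - lr * X.T @ (X @ beta - y) / n
        for g in groups:
            norm = np.linalg.norm(beta[g])
            beta[g] = 0.0 if norm <= lr * lam else beta[g] * (1 - lr * lam / norm)
    return beta

rng = np.random.default_rng(0)
n = 400
# Two candidate parents with 4 basis functions each; only the first is a cause.
X = rng.normal(size=(n, 8))
y = X[:, :4] @ np.array([1.0, -2.0, 0.5, 1.5]) + 0.1 * rng.normal(size=n)

groups = [list(range(0, 4)), list(range(4, 8))]
beta = group_lasso(X, y, groups, lam=0.5)

print(np.linalg.norm(beta[:4]))  # clearly nonzero: kept as a cause
print(np.linalg.norm(beta[4:]))  # exactly zero: pruned without any test
```

Because the problem is convex, any solver gives the same answer; proximal gradient is shown here only because the block soft-thresholding step makes the group-level sparsity mechanism explicit.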

Experimental Results

We compared our method against state-of-the-art order-based nonlinear causal discovery methods, SCORE and its scalable variant DAS, both of which combine a score-matching-based ordering step with CAM-pruning. We used the same ordering step and replaced CAM-pruning with SARTRE-based pruning. We generated synthetic nonlinear datasets (10 runs per condition) and report the mean and standard deviation of Structural Hamming Distance (SHD), Structural Intervention Distance (SID), and runtime (seconds).

Figure 8. Experimental results.

As the number of variables increases, SARTRE significantly reduces runtime compared to baselines, while achieving comparable or better SHD and SID. These results confirm that SARTRE can greatly accelerate the pruning step in order-based causal discovery without sacrificing accuracy.

Conclusion

In this post, we introduced two papers presented at the AAAI 2026 main conference: one on causal discovery over heterogeneous datasets and the other on ultrafast nonlinear causal discovery. We believe these contributions are important steps toward practical causal discovery applicable to complex real-world datasets. We will continue improving these methods and advancing R&D toward real-world decision-making applications.

References
[1] Shohei Shimizu (2017). Statistical Causal Discovery (in Japanese). Kodansha.
[2] Tillman and Spirtes (2011). Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS2011).
[3] Huang et al. (2020). Causal discovery from multiple data sets with non-identical variable sets. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI2020).
[4] Maeda and Shimizu (2021). Causal Additive Models with Unobserved Variables. Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI2021).
[5] Peters et al. (2014). Causal Discovery with Continuous Additive Noise Models. Journal of Machine Learning Research.
[6] Rolland et al. (2022). Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models. Proceedings of the 39th International Conference on Machine Learning (ICML2022).
[7] Bühlmann et al. (2014). CAM: Causal Additive Models, High-Dimensional Order Search and Penalized Regression. The Annals of Statistics.
[8] Moosmann et al. (2006). Fast discriminative visual codebooks using randomized clustering forests. Advances in Neural Information Processing Systems (NIPS2006).
[9] Yuan and Lin (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[10] Montagna et al. (2023). Scalable Causal Discovery with Score Matching. Proceedings of the Second Conference on Causal Learning and Reasoning (CLeaR2023).



