Synthetic Data Tutorial: Exploring Dynamic Correlations#
This tutorial demonstrates how to generate and analyze synthetic time-series data using the timecorr toolbox. We’ll explore different types of synthetic data generation methods and show how to compute dynamic correlations with various kernel functions.
Introduction#
Dynamic correlations capture how relationships between variables change over time. The timecorr toolbox provides powerful tools for:
Computing moment-by-moment correlations
Exploring higher-order correlation structure
Analyzing time-varying connectivity patterns
This tutorial will walk you through generating synthetic datasets with known correlation structures and analyzing them using timecorr.
[ ]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')
# Import timecorr
import timecorr as tc
# Set random seed for reproducibility
np.random.seed(42)
# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')
%matplotlib inline
1. Synthetic Data Generation#
The timecorr toolbox provides several methods for generating synthetic datasets with different correlation structures. Let’s explore each type:
[ ]:
# Parameters for synthetic data generation
T = 200 # Number of time points
K = 10 # Number of features/variables
B = 5 # Number of blocks (for block dataset)
# Generate different types of synthetic data
datasets = {}
# 1. Random dataset - uncorrelated Gaussian noise
datasets['random'] = tc.simulate_data(datagen='random', T=T, K=K, set_random_seed=42)
# 2. Constant dataset - stable correlations over time
datasets['constant'] = tc.simulate_data(datagen='constant', T=T, K=K, set_random_seed=42)
# 3. Block dataset - correlation structure changes in blocks
datasets['block'] = tc.simulate_data(datagen='block', T=T, K=K, B=B, set_random_seed=42)
# 4. Ramping dataset - gradual changes in correlation structure
datasets['ramping'] = tc.simulate_data(datagen='ramping', T=T, K=K, set_random_seed=42)
print("Generated datasets:")
for name, data in datasets.items():
print(f"{name}: shape {data.shape}")
2. Visualizing Synthetic Data#
Let’s visualize the different types of synthetic data to understand their characteristics:
[ ]:
# Create a figure to visualize all datasets
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
axes = axes.flatten()
for i, (name, data) in enumerate(datasets.items()):
ax = axes[i]
# Plot the first few features over time
for j in range(min(5, K)):
ax.plot(data[:, j], alpha=0.7, label=f'Feature {j+1}')
ax.set_title(f'{name.title()} Dataset', fontsize=14, fontweight='bold')
ax.set_xlabel('Time')
ax.set_ylabel('Value')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Show correlation matrices for each dataset type
fig, axes = plt.subplots(1, 4, figsize=(20, 4))
for i, (name, data) in enumerate(datasets.items()):
# Compute static correlation matrix
corr_matrix = np.corrcoef(data.T)
# Plot correlation matrix
sns.heatmap(corr_matrix, annot=False, cmap='coolwarm', center=0,
vmin=-1, vmax=1, ax=axes[i])
axes[i].set_title(f'{name.title()} Correlations')
plt.tight_layout()
plt.show()
3. Kernel Functions for Dynamic Correlations#
The timecorr toolbox provides several kernel functions that determine how much each timepoint contributes to the correlation at each moment. Let’s explore the different kernel options:
[ ]:
# Define different kernel functions and their parameters
T_kernel = 50 # Number of timepoints for kernel visualization
kernels = {
'Delta (Eye)': {'func': tc.eye_weights, 'params': {}},
'Gaussian': {'func': tc.gaussian_weights, 'params': {'var': 100}},
'Laplace': {'func': tc.laplace_weights, 'params': {'scale': 10}},
'Mexican Hat': {'func': tc.mexican_hat_weights, 'params': {'sigma': 5}},
'Boxcar': {'func': tc.boxcar_weights, 'params': {'width': 20}},
'Uniform': {'func': tc.uniform_weights, 'params': {}}
}
# Generate and visualize kernel functions
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes = axes.flatten()
for i, (name, kernel_info) in enumerate(kernels.items()):
# Generate kernel weights
weights = kernel_info['func'](T_kernel, kernel_info['params'])
# Plot kernel as heatmap
im = axes[i].imshow(weights, cmap='viridis', aspect='auto')
axes[i].set_title(f'{name} Kernel', fontsize=12, fontweight='bold')
axes[i].set_xlabel('Time')
axes[i].set_ylabel('Time')
# Add colorbar
plt.colorbar(im, ax=axes[i], shrink=0.8)
plt.tight_layout()
plt.show()
# Show kernel profiles (central timepoint)
fig, ax = plt.subplots(figsize=(12, 6))
center_point = T_kernel // 2
for name, kernel_info in kernels.items():
weights = kernel_info['func'](T_kernel, kernel_info['params'])
ax.plot(weights[center_point, :], label=name, linewidth=2)
ax.set_xlabel('Time Offset')
ax.set_ylabel('Weight')
ax.set_title('Kernel Function Profiles', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
4. Computing Dynamic Correlations#
Now let’s compute dynamic correlations using different kernel functions on our synthetic datasets:
[ ]:
# Choose a dataset for dynamic correlation analysis
data = datasets['block'] # Block dataset shows clear temporal structure
# Compute dynamic correlations with different kernels
dynamic_corrs = {}
# Use a subset of kernels for clarity
selected_kernels = ['Delta (Eye)', 'Gaussian', 'Laplace', 'Mexican Hat']
for kernel_name in selected_kernels:
kernel_info = kernels[kernel_name]
# Compute dynamic correlations
dynamic_corrs[kernel_name] = tc.timecorr(
data,
weights_function=kernel_info['func'],
weights_params=kernel_info['params']
)
print(f"{kernel_name}: Dynamic correlations shape {dynamic_corrs[kernel_name].shape}")
print(f"\nOriginal data shape: {data.shape}")
print(f"Dynamic correlations capture {data.shape[1]} variables over {data.shape[0]} timepoints")
print(f"Output has {dynamic_corrs['Gaussian'].shape[1]} features (vectorized correlation matrices)")
5. Visualizing Dynamic Correlations#
Let’s visualize how dynamic correlations change over time for different kernel functions:
[ ]:
# Convert vectorized correlations back to matrix format for visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()
for i, kernel_name in enumerate(selected_kernels):
# Convert back to matrix format
corr_matrices = tc.vec2mat(dynamic_corrs[kernel_name])
# Extract correlation between features 0 and 1 over time
corr_01 = corr_matrices[0, 1, :]
# Plot the dynamic correlation
axes[i].plot(corr_01, linewidth=2, color='blue')
axes[i].set_title(f'{kernel_name} Kernel\nDynamic Correlation (Features 0-1)',
fontsize=12, fontweight='bold')
axes[i].set_xlabel('Time')
axes[i].set_ylabel('Correlation')
axes[i].grid(True, alpha=0.3)
axes[i].set_ylim([-1, 1])
plt.tight_layout()
plt.show()
# Show correlation matrices at different time points
time_points = [T//4, T//2, 3*T//4] # Three time points
fig, axes = plt.subplots(len(time_points), len(selected_kernels), figsize=(16, 12))
for i, kernel_name in enumerate(selected_kernels):
corr_matrices = tc.vec2mat(dynamic_corrs[kernel_name])
for j, t in enumerate(time_points):
# Plot correlation matrix at time t
sns.heatmap(corr_matrices[:, :, t], annot=False, cmap='coolwarm',
center=0, vmin=-1, vmax=1, ax=axes[j, i],
cbar=True if i == len(selected_kernels)-1 else False)
if j == 0:
axes[j, i].set_title(f'{kernel_name}')
if i == 0:
axes[j, i].set_ylabel(f'Time {t}')
plt.tight_layout()
plt.show()
6. Higher-Order Correlations#
One of the powerful features of timecorr is the ability to compute higher-order correlations - correlations between correlations. Let’s explore this:
[ ]:
# Compute higher-order correlations using dimensionality reduction
data = datasets['ramping'] # Use ramping dataset for gradual changes
# Level 1: Standard dynamic correlations (correlations)
level1 = tc.timecorr(
data,
weights_function=tc.gaussian_weights,
weights_params={'var': 50}
)
# Level 2: Correlations between correlations
level2 = tc.timecorr(
data,
weights_function=tc.gaussian_weights,
weights_params={'var': 50},
rfun='PCA' # Use PCA to reduce dimensionality
)
# Level 3: Correlations between correlations between correlations
level3 = tc.timecorr(
level2,
weights_function=tc.gaussian_weights,
weights_params={'var': 50},
rfun='PCA'
)
print(f"Original data shape: {data.shape}")
print(f"Level 1 correlations shape: {level1.shape}")
print(f"Level 2 correlations shape: {level2.shape}")
print(f"Level 3 correlations shape: {level3.shape}")
# Visualize higher-order correlations
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
# Plot first principal component of each level
axes[0].plot(level1[:, 0], linewidth=2, color='blue')
axes[0].set_title('Level 1: First Correlation Component')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)
axes[1].plot(level2[:, 0], linewidth=2, color='red')
axes[1].set_title('Level 2: First Component')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Value')
axes[1].grid(True, alpha=0.3)
axes[2].plot(level3[:, 0], linewidth=2, color='green')
axes[2].set_title('Level 3: First Component')
axes[2].set_xlabel('Time')
axes[2].set_ylabel('Value')
axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
7. Multi-Subject Analysis#
Timecorr can also analyze data from multiple subjects/sessions. Let’s demonstrate inter-subject functional connectivity (ISFC):
[ ]:
# Generate multi-subject data
n_subjects = 5
multi_subject_data = []
for i in range(n_subjects):
# Generate block data with some individual differences
subject_data = tc.simulate_data(
datagen='block',
T=T,
K=K,
B=B,
set_random_seed=42 + i
)
multi_subject_data.append(subject_data)
print(f"Generated data for {n_subjects} subjects")
print(f"Each subject has data shape: {multi_subject_data[0].shape}")
# Compute inter-subject functional connectivity (ISFC)
isfc_result = tc.timecorr(
multi_subject_data,
weights_function=tc.gaussian_weights,
weights_params={'var': 100},
cfun=tc.isfc # Inter-subject functional connectivity
)
print(f"\nISFC result shape: {len(isfc_result)} subjects, each with shape {isfc_result[0].shape}")
# Compute within-subject correlations for comparison
within_subject_results = []
for subject_data in multi_subject_data:
within_corr = tc.timecorr(
subject_data,
weights_function=tc.gaussian_weights,
weights_params={'var': 100}
)
within_subject_results.append(within_corr)
# Visualize comparison between within-subject and ISFC
fig, axes = plt.subplots(1, 2, figsize=(15, 6))
# Plot within-subject correlations
for i, within_corr in enumerate(within_subject_results):
within_matrices = tc.vec2mat(within_corr)
corr_01 = within_matrices[0, 1, :]
axes[0].plot(corr_01, alpha=0.7, label=f'Subject {i+1}')
axes[0].set_title('Within-Subject Correlations\n(Features 0-1)')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Correlation')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].set_ylim([-1, 1])
# Plot ISFC correlations
for i, isfc_corr in enumerate(isfc_result):
isfc_matrices = tc.vec2mat(isfc_corr)
corr_01 = isfc_matrices[0, 1, :]
axes[1].plot(corr_01, alpha=0.7, label=f'Subject {i+1}')
axes[1].set_title('Inter-Subject Functional Connectivity\n(Features 0-1)')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Correlation')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].set_ylim([-1, 1])
plt.tight_layout()
plt.show()
8. Statistical Analysis and Significance Testing#
Let’s perform some statistical analyses on our dynamic correlations:
[ ]:
# Generate multiple datasets for statistical analysis
n_datasets = 20
correlation_timeseries = []
for i in range(n_datasets):
# Generate block dataset
data = tc.simulate_data(
datagen='block',
T=T,
K=K,
B=B,
set_random_seed=i
)
# Compute dynamic correlations
dynamic_corr = tc.timecorr(
data,
weights_function=tc.gaussian_weights,
weights_params={'var': 50}
)
# Convert to matrix format and extract correlation between features 0 and 1
corr_matrices = tc.vec2mat(dynamic_corr)
corr_01 = corr_matrices[0, 1, :]
correlation_timeseries.append(corr_01)
# Convert to numpy array for analysis
correlation_array = np.array(correlation_timeseries)
print(f"Correlation array shape: {correlation_array.shape}")
# Compute statistics across datasets
mean_correlation = np.mean(correlation_array, axis=0)
std_correlation = np.std(correlation_array, axis=0)
sem_correlation = stats.sem(correlation_array, axis=0)
# Perform t-tests at each timepoint
t_stats, p_values = stats.ttest_1samp(correlation_array, 0, axis=0)
# Apply multiple comparisons correction (Bonferroni)
corrected_p_values = p_values * len(p_values)
corrected_p_values = np.minimum(corrected_p_values, 1.0)
# Find significant timepoints
significant_timepoints = corrected_p_values < 0.05
# Visualize statistical results
fig, axes = plt.subplots(2, 1, figsize=(15, 10))
# Plot correlation timeseries with confidence intervals
time_points = np.arange(T)
axes[0].fill_between(time_points,
mean_correlation - 1.96 * sem_correlation,
mean_correlation + 1.96 * sem_correlation,
alpha=0.3, color='blue', label='95% CI')
axes[0].plot(time_points, mean_correlation, 'b-', linewidth=2, label='Mean Correlation')
axes[0].axhline(y=0, color='k', linestyle='--', alpha=0.5)
axes[0].set_title('Dynamic Correlation with 95% Confidence Interval', fontsize=14)
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Correlation')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
# Plot p-values and significance
axes[1].plot(time_points, -np.log10(p_values), 'r-', linewidth=2, label='Uncorrected p-values')
axes[1].plot(time_points, -np.log10(corrected_p_values), 'g-', linewidth=2, label='Bonferroni corrected')
axes[1].axhline(y=-np.log10(0.05), color='k', linestyle='--', alpha=0.5, label='p = 0.05')
axes[1].fill_between(time_points, 0, -np.log10(0.05),
where=significant_timepoints, alpha=0.3, color='green', label='Significant')
axes[1].set_title('Statistical Significance Over Time', fontsize=14)
axes[1].set_xlabel('Time')
axes[1].set_ylabel('-log10(p-value)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print(f"Number of significant timepoints: {np.sum(significant_timepoints)} out of {T}")
print(f"Percentage of significant timepoints: {np.sum(significant_timepoints)/T*100:.1f}%")
9. Summary and Best Practices#
This tutorial has demonstrated several key concepts in dynamic correlation analysis:
Key Takeaways:#
Synthetic Data Generation: Use different data generation methods to test your analysis pipeline
Kernel Functions: Choose appropriate kernel functions based on your temporal resolution needs
Higher-Order Correlations: Explore correlations between correlations for deeper insights
Multi-Subject Analysis: Use ISFC to find shared patterns across subjects
Statistical Testing: Always perform appropriate statistical tests and corrections
Best Practices:#
Validate with synthetic data: Always test your analysis pipeline on synthetic data with known ground truth
Choose appropriate kernels: Match your kernel function to your temporal resolution requirements
Consider multiple comparison corrections: When testing across many timepoints, use appropriate corrections
Visualize your results: Always plot your data and results to understand what’s happening
Cross-validate: Use different synthetic data types to ensure robustness
Next Steps:#
Apply these techniques to your real data
Explore different kernel parameters for your specific use case
Consider advanced techniques like weighted inter-subject functional connectivity (WISFC)
Investigate graph-theoretic measures for network analysis
[ ]:
# Final demonstration: comparing all synthetic data types
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()
for i, (name, data) in enumerate(datasets.items()):
# Compute dynamic correlations
dynamic_corr = tc.timecorr(
data,
weights_function=tc.gaussian_weights,
weights_params={'var': 50}
)
# Convert to matrix format and extract correlation between features 0 and 1
corr_matrices = tc.vec2mat(dynamic_corr)
corr_01 = corr_matrices[0, 1, :]
# Plot dynamic correlation
axes[i].plot(corr_01, linewidth=2, color='blue')
axes[i].set_title(f'{name.title()} Dataset\nDynamic Correlation (Features 0-1)',
fontsize=12, fontweight='bold')
axes[i].set_xlabel('Time')
axes[i].set_ylabel('Correlation')
axes[i].grid(True, alpha=0.3)
axes[i].set_ylim([-1, 1])
plt.tight_layout()
plt.show()
print("Tutorial completed! You now know how to:")
print("✓ Generate synthetic data with different correlation structures")
print("✓ Choose and apply appropriate kernel functions")
print("✓ Compute dynamic correlations and higher-order correlations")
print("✓ Analyze multi-subject data with ISFC")
print("✓ Perform statistical testing on dynamic correlations")
print("✓ Visualize and interpret your results")