Synthetic Data Tutorial: Exploring Dynamic Correlations#

This tutorial demonstrates how to generate and analyze synthetic time-series data using the timecorr toolbox. We’ll explore different types of synthetic data generation methods and show how to compute dynamic correlations with various kernel functions.

Introduction#

Dynamic correlations capture how relationships between variables change over time. The timecorr toolbox provides powerful tools for:

  • Computing moment-by-moment correlations

  • Exploring higher-order correlation structure

  • Analyzing time-varying connectivity patterns

This tutorial will walk you through generating synthetic datasets with known correlation structures and analyzing them using timecorr.

[ ]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

# Import timecorr
import timecorr as tc

# Set random seed for reproducibility
np.random.seed(42)

# Set up plotting
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')
%matplotlib inline

1. Synthetic Data Generation#

The timecorr toolbox provides several methods for generating synthetic datasets with different correlation structures. Let’s explore each type:

[ ]:
# Parameters for synthetic data generation
T = 200  # Number of time points
K = 10   # Number of features/variables
B = 5    # Number of blocks (for block dataset)

# Generate different types of synthetic data
datasets = {}

# 1. Random dataset - uncorrelated Gaussian noise
datasets['random'] = tc.simulate_data(datagen='random', T=T, K=K, set_random_seed=42)

# 2. Constant dataset - stable correlations over time
datasets['constant'] = tc.simulate_data(datagen='constant', T=T, K=K, set_random_seed=42)

# 3. Block dataset - correlation structure changes in blocks
datasets['block'] = tc.simulate_data(datagen='block', T=T, K=K, B=B, set_random_seed=42)

# 4. Ramping dataset - gradual changes in correlation structure
datasets['ramping'] = tc.simulate_data(datagen='ramping', T=T, K=K, set_random_seed=42)

print("Generated datasets:")
for name, data in datasets.items():
    print(f"{name}: shape {data.shape}")

2. Visualizing Synthetic Data#

Let’s visualize the different types of synthetic data to understand their characteristics:

[ ]:
# Create a figure to visualize all datasets
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
axes = axes.flatten()

for i, (name, data) in enumerate(datasets.items()):
    ax = axes[i]

    # Plot the first few features over time
    for j in range(min(5, K)):
        ax.plot(data[:, j], alpha=0.7, label=f'Feature {j+1}')

    ax.set_title(f'{name.title()} Dataset', fontsize=14, fontweight='bold')
    ax.set_xlabel('Time')
    ax.set_ylabel('Value')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Show correlation matrices for each dataset type
fig, axes = plt.subplots(1, 4, figsize=(20, 4))

for i, (name, data) in enumerate(datasets.items()):
    # Compute static correlation matrix
    corr_matrix = np.corrcoef(data.T)

    # Plot correlation matrix
    sns.heatmap(corr_matrix, annot=False, cmap='coolwarm', center=0,
                vmin=-1, vmax=1, ax=axes[i])
    axes[i].set_title(f'{name.title()} Correlations')

plt.tight_layout()
plt.show()

3. Kernel Functions for Dynamic Correlations#

The timecorr toolbox provides several kernel functions that determine how much each timepoint contributes to the correlation at each moment. Let’s explore the different kernel options:

[ ]:
# Define different kernel functions and their parameters
T_kernel = 50  # Number of timepoints for kernel visualization

kernels = {
    'Delta (Eye)': {'func': tc.eye_weights, 'params': {}},
    'Gaussian': {'func': tc.gaussian_weights, 'params': {'var': 100}},
    'Laplace': {'func': tc.laplace_weights, 'params': {'scale': 10}},
    'Mexican Hat': {'func': tc.mexican_hat_weights, 'params': {'sigma': 5}},
    'Boxcar': {'func': tc.boxcar_weights, 'params': {'width': 20}},
    'Uniform': {'func': tc.uniform_weights, 'params': {}}
}

# Generate and visualize kernel functions
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
axes = axes.flatten()

for i, (name, kernel_info) in enumerate(kernels.items()):
    # Generate kernel weights
    weights = kernel_info['func'](T_kernel, kernel_info['params'])

    # Plot kernel as heatmap
    im = axes[i].imshow(weights, cmap='viridis', aspect='auto')
    axes[i].set_title(f'{name} Kernel', fontsize=12, fontweight='bold')
    axes[i].set_xlabel('Time')
    axes[i].set_ylabel('Time')

    # Add colorbar
    plt.colorbar(im, ax=axes[i], shrink=0.8)

plt.tight_layout()
plt.show()

# Show kernel profiles (central timepoint)
fig, ax = plt.subplots(figsize=(12, 6))
center_point = T_kernel // 2

for name, kernel_info in kernels.items():
    weights = kernel_info['func'](T_kernel, kernel_info['params'])
    ax.plot(weights[center_point, :], label=name, linewidth=2)

ax.set_xlabel('Time Offset')
ax.set_ylabel('Weight')
ax.set_title('Kernel Function Profiles', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()

4. Computing Dynamic Correlations#

Now let’s compute dynamic correlations using different kernel functions on our synthetic datasets:

[ ]:
# Choose a dataset for dynamic correlation analysis
data = datasets['block']  # Block dataset shows clear temporal structure

# Compute dynamic correlations with different kernels
dynamic_corrs = {}

# Use a subset of kernels for clarity
selected_kernels = ['Delta (Eye)', 'Gaussian', 'Laplace', 'Mexican Hat']

for kernel_name in selected_kernels:
    kernel_info = kernels[kernel_name]

    # Compute dynamic correlations
    dynamic_corrs[kernel_name] = tc.timecorr(
        data,
        weights_function=kernel_info['func'],
        weights_params=kernel_info['params']
    )

    print(f"{kernel_name}: Dynamic correlations shape {dynamic_corrs[kernel_name].shape}")

print(f"\nOriginal data shape: {data.shape}")
print(f"Dynamic correlations capture {data.shape[1]} variables over {data.shape[0]} timepoints")
print(f"Output has {dynamic_corrs['Gaussian'].shape[1]} features (vectorized correlation matrices)")

5. Visualizing Dynamic Correlations#

Let’s visualize how dynamic correlations change over time for different kernel functions:

[ ]:
# Convert vectorized correlations back to matrix format for visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()

for i, kernel_name in enumerate(selected_kernels):
    # Convert back to matrix format
    corr_matrices = tc.vec2mat(dynamic_corrs[kernel_name])

    # Extract correlation between features 0 and 1 over time
    corr_01 = corr_matrices[0, 1, :]

    # Plot the dynamic correlation
    axes[i].plot(corr_01, linewidth=2, color='blue')
    axes[i].set_title(f'{kernel_name} Kernel\nDynamic Correlation (Features 0-1)',
                     fontsize=12, fontweight='bold')
    axes[i].set_xlabel('Time')
    axes[i].set_ylabel('Correlation')
    axes[i].grid(True, alpha=0.3)
    axes[i].set_ylim([-1, 1])

plt.tight_layout()
plt.show()

# Show correlation matrices at different time points
time_points = [T//4, T//2, 3*T//4]  # Three time points
fig, axes = plt.subplots(len(time_points), len(selected_kernels), figsize=(16, 12))

for i, kernel_name in enumerate(selected_kernels):
    corr_matrices = tc.vec2mat(dynamic_corrs[kernel_name])

    for j, t in enumerate(time_points):
        # Plot correlation matrix at time t
        sns.heatmap(corr_matrices[:, :, t], annot=False, cmap='coolwarm',
                    center=0, vmin=-1, vmax=1, ax=axes[j, i],
                    cbar=True if i == len(selected_kernels)-1 else False)

        if j == 0:
            axes[j, i].set_title(f'{kernel_name}')
        if i == 0:
            axes[j, i].set_ylabel(f'Time {t}')

plt.tight_layout()
plt.show()

6. Higher-Order Correlations#

One of the powerful features of timecorr is the ability to compute higher-order correlations - correlations between correlations. Let’s explore this:

[ ]:
# Compute higher-order correlations using dimensionality reduction
data = datasets['ramping']  # Use ramping dataset for gradual changes

# Level 1: Standard dynamic correlations (correlations)
level1 = tc.timecorr(
    data,
    weights_function=tc.gaussian_weights,
    weights_params={'var': 50}
)

# Level 2: Correlations between correlations
level2 = tc.timecorr(
    data,
    weights_function=tc.gaussian_weights,
    weights_params={'var': 50},
    rfun='PCA'  # Use PCA to reduce dimensionality
)

# Level 3: Correlations between correlations between correlations
level3 = tc.timecorr(
    level2,
    weights_function=tc.gaussian_weights,
    weights_params={'var': 50},
    rfun='PCA'
)

print(f"Original data shape: {data.shape}")
print(f"Level 1 correlations shape: {level1.shape}")
print(f"Level 2 correlations shape: {level2.shape}")
print(f"Level 3 correlations shape: {level3.shape}")

# Visualize higher-order correlations
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Plot first principal component of each level
axes[0].plot(level1[:, 0], linewidth=2, color='blue')
axes[0].set_title('Level 1: First Correlation Component')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)

axes[1].plot(level2[:, 0], linewidth=2, color='red')
axes[1].set_title('Level 2: First Component')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Value')
axes[1].grid(True, alpha=0.3)

axes[2].plot(level3[:, 0], linewidth=2, color='green')
axes[2].set_title('Level 3: First Component')
axes[2].set_xlabel('Time')
axes[2].set_ylabel('Value')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

7. Multi-Subject Analysis#

Timecorr can also analyze data from multiple subjects/sessions. Let’s demonstrate inter-subject functional connectivity (ISFC):

[ ]:
# Generate multi-subject data
n_subjects = 5
multi_subject_data = []

for i in range(n_subjects):
    # Generate block data with some individual differences
    subject_data = tc.simulate_data(
        datagen='block',
        T=T,
        K=K,
        B=B,
        set_random_seed=42 + i
    )
    multi_subject_data.append(subject_data)

print(f"Generated data for {n_subjects} subjects")
print(f"Each subject has data shape: {multi_subject_data[0].shape}")

# Compute inter-subject functional connectivity (ISFC)
isfc_result = tc.timecorr(
    multi_subject_data,
    weights_function=tc.gaussian_weights,
    weights_params={'var': 100},
    cfun=tc.isfc  # Inter-subject functional connectivity
)

print(f"\nISFC result shape: {len(isfc_result)} subjects, each with shape {isfc_result[0].shape}")

# Compute within-subject correlations for comparison
within_subject_results = []
for subject_data in multi_subject_data:
    within_corr = tc.timecorr(
        subject_data,
        weights_function=tc.gaussian_weights,
        weights_params={'var': 100}
    )
    within_subject_results.append(within_corr)

# Visualize comparison between within-subject and ISFC
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Plot within-subject correlations
for i, within_corr in enumerate(within_subject_results):
    within_matrices = tc.vec2mat(within_corr)
    corr_01 = within_matrices[0, 1, :]
    axes[0].plot(corr_01, alpha=0.7, label=f'Subject {i+1}')

axes[0].set_title('Within-Subject Correlations\n(Features 0-1)')
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Correlation')
axes[0].legend()
axes[0].grid(True, alpha=0.3)
axes[0].set_ylim([-1, 1])

# Plot ISFC correlations
for i, isfc_corr in enumerate(isfc_result):
    isfc_matrices = tc.vec2mat(isfc_corr)
    corr_01 = isfc_matrices[0, 1, :]
    axes[1].plot(corr_01, alpha=0.7, label=f'Subject {i+1}')

axes[1].set_title('Inter-Subject Functional Connectivity\n(Features 0-1)')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Correlation')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
axes[1].set_ylim([-1, 1])

plt.tight_layout()
plt.show()

8. Statistical Analysis and Significance Testing#

Let’s perform some statistical analyses on our dynamic correlations:

[ ]:
# Generate multiple datasets for statistical analysis
n_datasets = 20
correlation_timeseries = []

for i in range(n_datasets):
    # Generate block dataset
    data = tc.simulate_data(
        datagen='block',
        T=T,
        K=K,
        B=B,
        set_random_seed=i
    )

    # Compute dynamic correlations
    dynamic_corr = tc.timecorr(
        data,
        weights_function=tc.gaussian_weights,
        weights_params={'var': 50}
    )

    # Convert to matrix format and extract correlation between features 0 and 1
    corr_matrices = tc.vec2mat(dynamic_corr)
    corr_01 = corr_matrices[0, 1, :]
    correlation_timeseries.append(corr_01)

# Convert to numpy array for analysis
correlation_array = np.array(correlation_timeseries)
print(f"Correlation array shape: {correlation_array.shape}")

# Compute statistics across datasets
mean_correlation = np.mean(correlation_array, axis=0)
std_correlation = np.std(correlation_array, axis=0)
sem_correlation = stats.sem(correlation_array, axis=0)

# Perform t-tests at each timepoint
t_stats, p_values = stats.ttest_1samp(correlation_array, 0, axis=0)

# Apply multiple comparisons correction (Bonferroni)
corrected_p_values = p_values * len(p_values)
corrected_p_values = np.minimum(corrected_p_values, 1.0)

# Find significant timepoints
significant_timepoints = corrected_p_values < 0.05

# Visualize statistical results
fig, axes = plt.subplots(2, 1, figsize=(15, 10))

# Plot correlation timeseries with confidence intervals
time_points = np.arange(T)
axes[0].fill_between(time_points,
                     mean_correlation - 1.96 * sem_correlation,
                     mean_correlation + 1.96 * sem_correlation,
                     alpha=0.3, color='blue', label='95% CI')
axes[0].plot(time_points, mean_correlation, 'b-', linewidth=2, label='Mean Correlation')
axes[0].axhline(y=0, color='k', linestyle='--', alpha=0.5)
axes[0].set_title('Dynamic Correlation with 95% Confidence Interval', fontsize=14)
axes[0].set_xlabel('Time')
axes[0].set_ylabel('Correlation')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot p-values and significance
axes[1].plot(time_points, -np.log10(p_values), 'r-', linewidth=2, label='Uncorrected p-values')
axes[1].plot(time_points, -np.log10(corrected_p_values), 'g-', linewidth=2, label='Bonferroni corrected')
axes[1].axhline(y=-np.log10(0.05), color='k', linestyle='--', alpha=0.5, label='p = 0.05')
axes[1].fill_between(time_points, 0, -np.log10(0.05),
                     where=significant_timepoints, alpha=0.3, color='green', label='Significant')
axes[1].set_title('Statistical Significance Over Time', fontsize=14)
axes[1].set_xlabel('Time')
axes[1].set_ylabel('-log10(p-value)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Number of significant timepoints: {np.sum(significant_timepoints)} out of {T}")
print(f"Percentage of significant timepoints: {np.sum(significant_timepoints)/T*100:.1f}%")

9. Summary and Best Practices#

This tutorial has demonstrated several key concepts in dynamic correlation analysis:

Key Takeaways:#

  1. Synthetic Data Generation: Use different data generation methods to test your analysis pipeline

  2. Kernel Functions: Choose appropriate kernel functions based on your temporal resolution needs

  3. Higher-Order Correlations: Explore correlations between correlations for deeper insights

  4. Multi-Subject Analysis: Use ISFC to find shared patterns across subjects

  5. Statistical Testing: Always perform appropriate statistical tests and corrections

Best Practices:#

  • Validate with synthetic data: Always test your analysis pipeline on synthetic data with known ground truth

  • Choose appropriate kernels: Match your kernel function to your temporal resolution requirements

  • Consider multiple comparison corrections: When testing across many timepoints, use appropriate corrections

  • Visualize your results: Always plot your data and results to understand what’s happening

  • Cross-validate: Use different synthetic data types to ensure robustness

Next Steps:#

  • Apply these techniques to your real data

  • Explore different kernel parameters for your specific use case

  • Consider advanced techniques like weighted inter-subject functional connectivity (WISFC)

  • Investigate graph-theoretic measures for network analysis

[ ]:
# Final demonstration: comparing all synthetic data types
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()

for i, (name, data) in enumerate(datasets.items()):
    # Compute dynamic correlations
    dynamic_corr = tc.timecorr(
        data,
        weights_function=tc.gaussian_weights,
        weights_params={'var': 50}
    )

    # Convert to matrix format and extract correlation between features 0 and 1
    corr_matrices = tc.vec2mat(dynamic_corr)
    corr_01 = corr_matrices[0, 1, :]

    # Plot dynamic correlation
    axes[i].plot(corr_01, linewidth=2, color='blue')
    axes[i].set_title(f'{name.title()} Dataset\nDynamic Correlation (Features 0-1)',
                     fontsize=12, fontweight='bold')
    axes[i].set_xlabel('Time')
    axes[i].set_ylabel('Correlation')
    axes[i].grid(True, alpha=0.3)
    axes[i].set_ylim([-1, 1])

plt.tight_layout()
plt.show()

print("Tutorial completed! You now know how to:")
print("✓ Generate synthetic data with different correlation structures")
print("✓ Choose and apply appropriate kernel functions")
print("✓ Compute dynamic correlations and higher-order correlations")
print("✓ Analyze multi-subject data with ISFC")
print("✓ Perform statistical testing on dynamic correlations")
print("✓ Visualize and interpret your results")