Remote Folder (SFTP)

This section explains how to create, configure, and test a Remote Folder data source in DataDios. A Remote Folder data source connects to file systems on remote servers over SSH/SFTP.


Overview

The Remote Folder data source enables secure access to files and directories on remote Linux/Unix servers. It supports:

  • Multiple authentication methods: Password or SSH private keys (RSA, ECDSA, Ed25519, DSA)
  • Various file formats: CSV, JSON, Parquet, ORC, Avro, XML, TSV, and more
  • Hierarchical browsing: Navigate through folder structures
  • Metadata extraction: View file attributes and data schemas

Steps to Create and Test a Remote Folder Data Source

Step 1: Create a Data Source

  1. Navigate to the Data Sources tab in DataDios
  2. Click + CREATE DS
  3. From the list of available data source types, select Remote Folder

Step 2: Fill Connection Details

In the Connection Details form, provide the required parameters:

Required Parameters

Parameter | Description | Example
Host | Remote server hostname or IP address | 192.168.1.100 or myserver.example.com
Username | SSH username for authentication | ubuntu, ec2-user
Folder Path | Absolute path to the folder on the remote server | /home/user/data or /var/data/files

Authentication (Choose One)

Option A: Password Authentication

Parameter | Description
Password | SSH password for the user

Option B: SSH Private Key Authentication

Parameter | Description
PEM Data | SSH private key content (paste the entire key, including the BEGIN/END header and footer lines)
Passphrase | (Optional) Passphrase, if the private key is encrypted

Supported Key Types

DataDios supports multiple SSH key algorithms:

  • RSA (most common)
  • ECDSA (elliptic curve)
  • Ed25519 (modern, recommended)
  • DSA (legacy)
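
The first line of a private key file gives a hint about its format before you paste it. The following is a minimal sketch, assuming the header labels that OpenSSH and OpenSSL conventionally write; the function name `describe_key_header` is hypothetical and not part of DataDios.

```python
import re

# Conventional header labels and what they indicate (assumed, not DataDios-specific).
HEADER_HINTS = {
    "RSA PRIVATE KEY": "RSA (traditional PEM)",
    "EC PRIVATE KEY": "ECDSA (traditional PEM)",
    "DSA PRIVATE KEY": "DSA (traditional PEM)",
    "OPENSSH PRIVATE KEY": "OpenSSH format (algorithm not shown in the header)",
}

BEGIN_RE = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")

def describe_key_header(pem_text: str) -> str:
    """Describe a private key based only on its BEGIN line."""
    match = BEGIN_RE.search(pem_text)
    if match is None:
        return "no PEM header found"
    label = match.group(1)
    return HEADER_HINTS.get(label, "unrecognized header: " + label)
```

Keys generated with `ssh-keygen -t ed25519` use the OpenSSH header, which is why Example 3 later in this page starts with `-----BEGIN OPENSSH PRIVATE KEY-----`.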

Optional Parameters

Parameter | Description | Example
Group | Label used to organize data sources | Production, Development
Object Types | Filter files by type (comma-separated) | CSV,JSON,PARQUET or * for all
Secret Name | Reference to a secret in a secret store (AWS Secrets Manager, Azure Key Vault) | my-sftp-credentials
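
The rules above (three required fields, exactly one authentication method, an absolute folder path) can be checked before the form is submitted. This is a rough sketch only: the field names are taken from the configuration examples later on this page, the function `validate_connection` is hypothetical, and configurations that pull credentials from a Secret Name are not covered.

```python
import posixpath

REQUIRED_FIELDS = ("host", "username", "folder_path")

def validate_connection(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks complete."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(config.get(field) or "").strip():
            problems.append(f"missing required field: {field}")

    # Authentication: choose one of password or private key (inline credentials only).
    has_password = bool(config.get("password"))
    has_key = bool(config.get("pem_data"))
    if has_password == has_key:
        problems.append("provide exactly one of password or pem_data")
    if config.get("passphrase") and not has_key:
        problems.append("passphrase is only meaningful together with pem_data")

    # Remote servers are Linux/Unix, so use POSIX path rules.
    folder = config.get("folder_path") or ""
    if folder and not posixpath.isabs(folder):
        problems.append("folder_path should be an absolute path")
    return problems
```

For example, `validate_connection({"host": "192.168.1.100", "username": "datauser", "password": "...", "folder_path": "datasets"})` reports only the relative folder path.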

Step 3: Test Connection

  1. After entering details, click Test Connection
  2. Ensure the connection is validated successfully
  3. If using SSH keys, verify the key format is correct (PEM format with proper line breaks)

Troubleshooting Connection Issues

  • Authentication failed: Verify that the username and the password or key are correct
  • Host key verification: DataDios automatically accepts new host keys
  • Permission denied: Ensure the user has read access to the specified folder path
  • Key format error: Make sure the private key includes the -----BEGIN ... KEY----- header and the -----END ... KEY----- footer

Step 4: Save Data Source

  1. If the test succeeds, click Create to save the data source
  2. You will be redirected to the Datasource Listing Page, where the Remote Folder data source will appear

Step 5: Explore Data Source Items

  1. Expand the Remote Folder data source to view all items (folders and files)
  2. The file browser displays:
    • Folders: Click to expand and view contents
    • Files: Shows file name, type, and last modified date
  3. To view metadata about any item:
    • Click the item name
    • Click the three stacked lines icon to open the Object Metadata pop-up
  4. You can also explore additional features in the Metadata Explorer:
    • Object Data: View the actual data present in the selected file (e.g., CSV rows, JSON content)
    • Attributes: View column names and inferred data types for structured files
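
To make "inferred data types" concrete, here is one common way to infer a schema from a CSV sample: try the narrowest type first and fall back to string. How DataDios infers types is not described on this page, so treat this as an illustration of the idea rather than the product's method; the function names and the type labels are assumptions.

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest type that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, caster in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"

def infer_csv_schema(text, sample_rows=100):
    """Map each header name to a type inferred from the first sample_rows rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*rows)) if rows else [()] * len(header)
    return {name: infer_type(col) for name, col in zip(header, columns)}

print(infer_csv_schema("id,price,name\n1,9.5,apple\n2,3,pear\n"))
```

A column mixing `9.5` and `3` is reported as float because every value parses as a float but not every value parses as an integer.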

Supported File Formats

Format | Extension | Features
CSV | .csv | Comma-separated values, auto-detect schema
TSV | .tsv | Tab-separated values
JSON | .json | JSON documents
Parquet | .parquet | Columnar storage format
ORC | .orc | Optimized Row Columnar format
Avro | .avro | Apache Avro data serialization
AVSC | .avsc | Avro schema files
XML | .xml | XML documents
Text | .txt, .text | Plain text files
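
The sketch below shows how an Object Types value such as `CSV,PARQUET` could be applied to a folder listing using the extensions above. It is an approximation under stated assumptions: only `CSV`, `JSON`, and `PARQUET` appear as filter tokens on this page, so the other token names, the case-insensitive matching, and all function names are hypothetical.

```python
import posixpath

# Extension-to-format table based on the list above; token spellings are assumed.
FORMAT_BY_EXTENSION = {
    ".csv": "CSV", ".tsv": "TSV", ".json": "JSON", ".parquet": "PARQUET",
    ".orc": "ORC", ".avro": "AVRO", ".avsc": "AVSC", ".xml": "XML",
    ".txt": "TEXT", ".text": "TEXT",
}

def detect_format(path):
    """Return the format name for a remote path, or None if unsupported."""
    extension = posixpath.splitext(path)[1].lower()
    return FORMAT_BY_EXTENSION.get(extension)

def parse_object_types(value):
    """Turn 'CSV,JSON' into {'CSV', 'JSON'}; '*' or empty means no filter (None)."""
    names = {part.strip().upper() for part in value.split(",") if part.strip()}
    return None if not names or "*" in names else names

def filter_files(paths, object_types="*"):
    """Keep supported files whose format passes the object_types filter."""
    wanted = parse_object_types(object_types)
    kept = []
    for path in paths:
        fmt = detect_format(path)
        if fmt is not None and (wanted is None or fmt in wanted):
            kept.append(path)
    return kept
```

With `object_types="CSV,PARQUET"`, a listing of `a.csv`, `b.json`, and `c.parquet` keeps only the first and last entries.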

Connection Configuration Examples

Example 1: Password Authentication

{
  "host": "192.168.1.100",
  "username": "datauser",
  "password": "SecurePassword123",
  "folder_path": "/home/datauser/datasets",
  "object_types": "CSV,PARQUET,JSON"
}

Example 2: SSH Private Key (RSA)

{
  "host": "myserver.example.com",
  "username": "ubuntu",
  "pem_data": "-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAKCAQEA...\n-----END RSA PRIVATE KEY-----",
  "folder_path": "/var/data/files",
  "object_types": "*"
}

Example 3: SSH Private Key with Passphrase (Ed25519)

{
  "host": "secure-server.example.com",
  "username": "admin",
  "pem_data": "-----BEGIN OPENSSH PRIVATE KEY-----\nb3BlbnNzaC1rZXktdjEAAAA...\n-----END OPENSSH PRIVATE KEY-----",
  "passphrase": "my-key-passphrase",
  "folder_path": "/data/analytics"
}
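
In the key-based examples, the multi-line private key appears as a single JSON string with `\n` escapes. A JSON serializer produces those escapes automatically, so there is no need to edit the key by hand. The sketch below assumes the field names shown in the examples; `build_key_config` is a hypothetical helper and the key body is a placeholder, not a real key.

```python
import json

def build_key_config(host, username, folder_path, pem_text, passphrase=None):
    """Serialize a key-based connection config; newlines in the key become \\n."""
    config = {
        "host": host,
        "username": username,
        "pem_data": pem_text.strip(),
        "folder_path": folder_path,
    }
    if passphrase:
        config["passphrase"] = passphrase
    return json.dumps(config, indent=2)

# Placeholder key text with real line breaks, as it would be read from a key file.
pem_text = """-----BEGIN OPENSSH PRIVATE KEY-----
(placeholder key body)
-----END OPENSSH PRIVATE KEY-----
"""

payload = build_key_config("myserver.example.com", "ubuntu", "/var/data/files", pem_text)
print(payload)
```

In practice the key text would come from the key file itself, for example via `pathlib.Path("~/.ssh/id_ed25519").expanduser().read_text()`.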

Best Practices

  1. Use SSH Keys instead of passwords for enhanced security
  2. Use Ed25519 Keys for modern, secure, and fast authentication
  3. Store Credentials in Secret Stores (AWS Secrets Manager, Azure Key Vault) to avoid hardcoding
  4. Always Test Connection before saving to ensure configuration is correct
  5. Use Specific Object Types to filter only relevant file types and improve performance
  6. Organize with Groups for easier management of multiple Remote Folder data sources
  7. Use Absolute Paths for folder_path to avoid ambiguity

Security Considerations

  • All connections are made over SSH (port 22 by default), ensuring encrypted data transfer
  • Private keys are stored securely and never exposed in logs
  • Use encrypted private keys with passphrases for additional security
  • Consider using bastion hosts or jump servers for accessing servers in private networks

Generating SSH Keys

Linux / macOS

Generate Ed25519 Key (Recommended)

ssh-keygen -t ed25519 -C "your-email@example.com"

Generate RSA Key

ssh-keygen -t rsa -b 4096 -C "your-email@example.com"

Copy Public Key to Server

ssh-copy-id username@remote-server

View Private Key (for pasting into DataDios)

cat ~/.ssh/id_ed25519

Windows

Option 1: Using PowerShell (Windows 10/11)

# Generate Ed25519 key
ssh-keygen -t ed25519 -C "your-email@example.com"

# Generate RSA key
ssh-keygen -t rsa -b 4096 -C "your-email@example.com"

Keys are saved to: C:\Users\YourUsername\.ssh\

Option 2: Using Git Bash

ssh-keygen -t ed25519 -C "your-email@example.com"

After Generating Keys

  1. Copy public key to server - Add contents of id_ed25519.pub (or id_rsa.pub) to server's ~/.ssh/authorized_keys

  2. Use private key in DataDios - Copy contents of id_ed25519 (or id_rsa) and paste into PEM Data field

Example: View and copy private key

# Linux/macOS/Git Bash
cat ~/.ssh/id_ed25519

# Windows PowerShell
Get-Content $env:USERPROFILE\.ssh\id_ed25519

Tip

When prompted during key generation:

  • File location: Press Enter to use default
  • Passphrase: Optional but recommended for extra security

Troubleshooting

Common Issues

Issue | Possible Cause | Solution
Connection timeout | Server unreachable or firewall blocking | Verify network connectivity and firewall rules
Authentication failed | Wrong credentials | Double-check username and password/key
Permission denied | User lacks read access | Ensure user has permissions on the folder
Key format error | Invalid PEM format | Verify key has proper headers and line breaks
File not found | Path doesn't exist | Verify folder_path exists on the remote server
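
For the connection timeout case, a quick TCP check from a machine on the same network as DataDios helps separate network problems from credential problems. This is a generic sketch using Python's standard library, not a DataDios feature; `can_reach` is a hypothetical helper, and port 22 is the SSH default mentioned under Security Considerations.

```python
import socket

def can_reach(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False
```

A `True` result only shows that the port is open. Authentication, permissions, and the folder path still need to be verified with Test Connection.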

Key Format Tips

When pasting SSH private keys, ensure:

  • The key includes -----BEGIN ... KEY----- header
  • The key includes -----END ... KEY----- footer
  • Line breaks are preserved (when a key is pasted through the web form, the system repairs line breaks automatically)
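
If a key has lost its line breaks (for example, after being copied out of a JSON string or a single-line field), it can be rebuilt before pasting. The sketch below is one possible repair and is not the auto-fix DataDios applies: `normalize_pem` is a hypothetical name, 64 characters is the conventional PEM line width, and legacy encrypted keys that carry `Proc-Type`/`DEK-Info` lines are deliberately rejected rather than handled.

```python
import re

PEM_RE = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----", re.DOTALL
)

def normalize_pem(raw: str, width: int = 64) -> str:
    """Rebuild a private key block: one header, wrapped body, one footer."""
    # Literal backslash-n sequences appear when a key is copied out of JSON.
    text = raw.replace("\\n", "\n").replace("\r", "")
    match = PEM_RE.search(text)
    if match is None:
        raise ValueError("missing or mismatched BEGIN/END lines")
    label, body = match.group(1), match.group(2)
    if ":" in body:
        # Base64 never contains ':', so this indicates PEM header fields.
        raise ValueError("PEM header fields (e.g. Proc-Type) are not handled here")
    # Base64 decoders ignore whitespace, so drop it all and re-wrap.
    data = "".join(body.split())
    lines = [data[i:i + width] for i in range(0, len(data), width)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"
```

The function raises `ValueError` when the header or footer is missing or the two do not match, which corresponds to the "Key format error" row in the table above.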

For more details on configuring data sources with secret stores, see the Secret Stores documentation.