
    De-ID Studio: Document De-Identification for AI and Analytics Pipelines

    Sold by: Blue Peak 
    Deployed on AWS
    Strip names, IDs, and contact details from text, CSV, and PDF files inside your AWS account before those documents feed RAG, fine-tuning, or analytics jobs.

    Overview

    Most AI programs eventually touch documents that were never meant for a model context: clinical notes, matter files, HR exports, customer correspondence. De-ID Studio exists for the gap between "files in a bucket" and "safe to embed." It is a batch processor, not an API gateway: you run a task, it finishes, and it exits with a status code (0 for success, 1 for partial failure, 2 for a fatal configuration error).

    Typical buyers include platform engineers wiring document prep into MLOps, compliance leads who want evidence without another SaaS data processor, and legal or healthcare-adjacent teams preparing corpora for retrieval-augmented generation. The tool supports your compliance program; it does not certify HIPAA, GDPR, or Safe Harbor outcomes on your behalf. You decide whether content is PHI or PII and whether output meets your policy.

    A single run lists input objects (local path or S3 prefix), detects identifier spans, resolves overlaps, transforms text per category strategy, and writes outputs beside an audit record. Audit entries report counts and strategies - never the original matched strings. Pseudonymization, when enabled, stores an encrypted mapping file at a separate path you configure; it is not co-located with de-identified documents or the audit log.

    Formats in v1: plain text, CSV (cell-aware), and PDF with an extractable text layer. Scanned PDFs without text fail by default; a fallback mode can emit text-only output. DOCX is not supported in this release.

    Deployment posture: run inside your VPC. Core matching is fully offline. Network use is limited to optional S3 reads and writes you configure. Templates for Docker Compose, ECS Fargate RunTask, and EKS Job ship with the product. Environment variables control input/output sinks, category filters, strategy maps, concurrency, and audit destination.

    Pricing: pay per page processed on demand, or choose a monthly plan with included page volume and lower overage rates. Failed documents (for example, unreadable PDFs) are not billed.

    From BluePeak: we build privacy-forward batch tooling for regulated document workflows. Documentation, configuration reference, and deployment guides live at loopwell.net/docs.

    Highlights

    • Process sensitive documents without sending them to a third-party de-identification API. Matching runs on bundled rules and name dictionaries inside your container - suitable for air-gapped or strict data-residency environments.
    • Choose redact, mask, or pseudonymize independently for each identifier category. Pseudonym mappings are AES-encrypted and written to a path you control, separate from output files and audit logs.

    Details

    Delivery method

    Supported services

    Delivery option
    Batch de-identification job container

    Latest version

    Operating system
    Linux


    Pricing

    Pricing is based on the duration and terms of your contract with the vendor. This entitles you to a specified quantity of use for the contract duration. If you choose not to renew or replace your contract before it ends, access to these entitlements will expire.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    1-month contract (2)

    Dimension: De-ID Studio Standard
    Cost/month: $299.00
    Description: Monthly contract for production document de-identification pipelines. Includes 10,000 pages per month. All seven identifier categories, redact/mask/pseudonymize per category, local and S3 input/output, count-only audit logging, and ECS/EKS deployment templates. Priority email support during business hours. Annual billing available (10% discount).

    Dimension: De-ID Studio Enterprise
    Cost/month: $3,500.00
    Description: Monthly contract for high-volume regulated environments. Includes 50,000 pages per month. All Standard capabilities plus dedicated support channel, custom SLA options, security questionnaire support, and multi-region deployment guidance. For volume beyond 50,000 pages per month, contact BluePeak for a private offer.
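
    For a rough comparison of the two contract dimensions, the effective price per included page at full use of the monthly volume can be worked out from the figures above. This is a minimal sketch that uses awk for the arithmetic; the helper name per_page is illustrative, and the calculation ignores overage rates and the annual discount, whose exact terms the listing does not spell out.

```shell
#!/usr/bin/env bash
# Effective price per included page = monthly cost / included pages.
# Costs and page volumes come from the contract dimensions in the listing;
# per_page is a hypothetical helper name used only for this illustration.
per_page() {
  awk -v cost="$1" -v pages="$2" 'BEGIN { printf "%.4f\n", cost / pages }'
}

echo "Standard:   \$$(per_page 299.00 10000) per page"
echo "Enterprise: \$$(per_page 3500.00 50000) per page"
```

    At full utilization this works out to roughly $0.03 per page for Standard and $0.07 per page for Enterprise, so the choice between them likely turns on the support and SLA items rather than on per-page cost.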

    Vendor refund policy

    BluePeak LLC will review refund requests for first-time De-ID Studio subscriptions made through AWS Marketplace if submitted within fourteen (14) calendar days of the initial charge. Contact contact@loopwell.net with your AWS account ID, product code, and purchase date. Renewals and private offers are excluded. Refunds are issued per AWS Marketplace refund procedures after approval.


    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    Batch de-identification job container

    Supported services:
    • Amazon ECS
    • Amazon EKS
    Container image

    Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.

    Version release notes

    Patch release with detection fixes and no breaking configuration changes.

    Fixes in 1.0.1

    • Street addresses now match common suffixes including Terrace, Circle, Parkway, and Highway (long forms matched before abbreviations).
    • Title-based name detection no longer treats lowercase verbs after "Dr." as surnames (e.g. "Dr. Smith reviewed chart" stays intact).

    Unchanged from 1.0.0

    • Batch ingest from local directory or S3; output to local path or S3
    • Inputs: .txt, .csv, text-layer .pdf
    • Seven identifier categories (names, addresses, phones, emails, government IDs, account numbers, individual-linked dates)
    • Strategies per category: redact, mask, pseudonymize (encrypted mapping file)
    • Exit codes: 0 success, 1 partial failure, 2 fatal config error
    • Offline detection; optional S3 I/O only
    • Deployment templates: docker-compose, ECS Fargate task definition, EKS Job

    Known limitations

    • PDF text layer only; scanned PDFs fail unless PDF_FALLBACK=text
    • DOCX not supported
    • Rule/dictionary detection may miss or over-match; validate on your own samples

    Additional details

    Usage instructions

    De-ID Studio is a batch job (not a web service). Subscribe, pull the image, run once, inspect /output files and audit.json. Exit 0 = all files OK; 1 = some files failed; 2 = config error.
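
    Because the job reports its outcome only through the exit code, a calling pipeline usually needs to branch on it. The following is a minimal sketch of such a wrapper; the codes 0, 1, and 2 come from this listing, while the function names and message wording are hypothetical. A subshell that exits 1 stands in for the real container run so the sketch has no Docker dependency.

```shell
#!/usr/bin/env bash
# Translate a De-ID Studio exit code into a short status message.
# Codes 0/1/2 are documented in the listing; the messages are illustrative.
describe_exit() {
  case "$1" in
    0) echo "ok: all files processed" ;;
    1) echo "partial: some files failed" ;;
    2) echo "fatal: configuration error" ;;
    *) echo "unknown: unexpected exit code $1" ;;
  esac
}

# Run any command, report its status, and return the same status
# so the caller can still branch on it.
run_and_report() {
  "$@"
  local status=$?
  describe_exit "$status"
  return "$status"
}

# Stand-in for the real job: a subshell that exits 1, as a partial failure would.
run_and_report sh -c 'exit 1' || echo "pipeline step should inspect audit.json"
```

    In a real pipeline the stand-in would be replaced by the docker run (or RunTask) invocation, with the same branching on the returned status.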

    IMAGE AND PATHS

    export AWS_REGION=us-east-1
    export IMAGE_URI=709825985650.dkr.ecr.us-east-1.amazonaws.com/blue-peak/de-id:1.0.1
    export TEST_DIR=$HOME/deid-studio-test
    mkdir -p $TEST_DIR/input $TEST_DIR/output

    STEP 1 - LOGIN AND PULL

    aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin 709825985650.dkr.ecr.us-east-1.amazonaws.com
    docker pull $IMAGE_URI

    STEP 2 - CREATE SAMPLE INPUT FILES

    cat > $TEST_DIR/input/note.txt <<'EOF'
    Patient: Jane Sample
    DOB: 03/15/1985
    SSN: 123-45-6789
    Email: jane.sample@example.com
    Phone: (555) 234-5678
    Address: 742 Evergreen Terrace, Springfield, IL 62704
    Notes: Dr. Smith reviewed chart. Account ending in 4111111111111111 on file.
    EOF

    cat > $TEST_DIR/input/records.csv <<'EOF'
    name,email,phone,notes
    Jane Sample,jane.sample@example.com,555-234-5678,Patient visit
    John Example,john.example@example.com,555-987-6543,Follow-up
    EOF

    STEP 3 - TEST DEFAULT REDACT (all categories)

    rm -rf $TEST_DIR/output/*
    docker run --rm \
      -v $TEST_DIR/input:/input:ro \
      -v $TEST_DIR/output:/output \
      -e INPUT_SOURCE=local -e INPUT_PATH=/input \
      -e OUTPUT_SINK=local -e OUTPUT_PATH=/output \
      -e AUDIT_LOG_PATH=/output/audit.json \
      $IMAGE_URI
    echo "exit code (expect 0): $?"

    VERIFY REDACT:

    grep -F '[NAME]' $TEST_DIR/output/note.txt
    grep -F '[ADDRESS]' $TEST_DIR/output/note.txt
    grep -F 'Dr. Smith reviewed chart' $TEST_DIR/output/note.txt
    grep -F '[EMAIL]' $TEST_DIR/output/records.csv
    cat $TEST_DIR/output/audit.json
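
    The checks above confirm that placeholders appear in the output; a complementary check is that the original values no longer do. A minimal sketch follows, assuming plain-text output files. The helper name no_leak and the temporary demo file are illustrative and not part of the product.

```shell
#!/usr/bin/env bash
# no_leak FILE VALUE... : succeed only if none of the VALUEs appears verbatim in FILE.
# grep -F treats each value as a fixed string, so '.', '[' and '*' are not regex syntax.
no_leak() {
  local file="$1"
  shift
  local value
  for value in "$@"; do
    if grep -F -q -e "$value" "$file"; then
      echo "LEAK: '$value' still present in $file" >&2
      return 1
    fi
  done
  return 0
}

# Self-contained demo: a temporary file stands in for a de-identified output.
demo=$(mktemp)
printf 'Patient: [NAME]\nEmail: [EMAIL]\n' > "$demo"
no_leak "$demo" "Jane Sample" "jane.sample@example.com" && echo "no original identifiers found"
rm -f "$demo"
```

    Against the real test run, the same helper could be pointed at $TEST_DIR/output/note.txt with the sample values from Step 2.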

    STEP 4 - TEST EMAIL-ONLY FILTER

    rm -rf $TEST_DIR/output/*
    docker run --rm \
      -v $TEST_DIR/input:/input:ro \
      -v $TEST_DIR/output:/output \
      -e INPUT_SOURCE=local -e INPUT_PATH=/input \
      -e OUTPUT_SINK=local -e OUTPUT_PATH=/output \
      -e AUDIT_LOG_PATH=/output/audit.json \
      -e IDENTIFIER_CATEGORIES=emails \
      $IMAGE_URI
    grep -F '[EMAIL]' $TEST_DIR/output/note.txt
    grep -F 'Jane Sample' $TEST_DIR/output/note.txt

    STEP 5 - TEST MASK STRATEGY

    echo 'SSN: 123-45-6789' > $TEST_DIR/input/mask.txt
    rm -rf $TEST_DIR/output/*
    docker run --rm \
      -v $TEST_DIR/input:/input:ro \
      -v $TEST_DIR/output:/output \
      -e INPUT_SOURCE=local -e INPUT_PATH=/input \
      -e OUTPUT_SINK=local -e OUTPUT_PATH=/output \
      -e AUDIT_LOG_PATH=/output/audit.json \
      -e 'STRATEGY_CONFIG={"gov_ids":"mask","emails":"mask"}' \
      $IMAGE_URI
    grep -F '***-**-6789' $TEST_DIR/output/mask.txt
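
    To make the expected result of Step 5 concrete, the sed expression below reproduces the masked form that the final grep looks for. It only illustrates the output shape for a US SSN pattern; it is not the product's detection logic, which the listing describes as rule- and dictionary-based. The function name mask_ssn is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative only: keep the last four digits of an NNN-NN-NNNN value and mask the rest.
mask_ssn() {
  sed -E 's/[0-9]{3}-[0-9]{2}-([0-9]{4})/***-**-\1/g'
}

echo 'SSN: 123-45-6789' | mask_ssn
# prints: SSN: ***-**-6789
```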

    STEP 6 - TEST PSEUDONYMIZE

    echo 'Patient Jane Sample' > $TEST_DIR/input/pseudo.txt
    export MAPPING_KEY=$(openssl rand -hex 32)
    rm -rf $TEST_DIR/output/*
    docker run --rm \
      -v $TEST_DIR/input:/input:ro \
      -v $TEST_DIR/output:/output \
      -e INPUT_SOURCE=local -e INPUT_PATH=/input \
      -e OUTPUT_SINK=local -e OUTPUT_PATH=/output \
      -e AUDIT_LOG_PATH=/output/audit.json \
      -e 'STRATEGY_CONFIG={"names":"pseudonymize"}' \
      -e MAPPING_OUTPUT_PATH=/output/mapping.bin \
      -e MAPPING_ENCRYPTION_KEY=$MAPPING_KEY \
      $IMAGE_URI
    ls -l $TEST_DIR/output/mapping.bin
    cat $TEST_DIR/output/pseudo.txt

    STEP 7 - TEST CONFIG ERROR (expect exit 2)

    docker run --rm $IMAGE_URI; echo "exit code (expect 2): $?"

    STEP 8 - ECS / EKS

    Use the fulfillment templates deploy/templates/ecs-run-task.json and deploy/templates/eks-job.yaml. Set the image to $IMAGE_URI, configure your S3 buckets, grant the task role s3:GetObject on the input prefix and s3:PutObject on the output prefix, then call RunTask or apply the Job.
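
    The two S3 permissions named above can be expressed as an IAM policy document along the following lines. This is a sketch under stated assumptions: the bucket names and prefixes are placeholders, and the function name render_policy is hypothetical. Enumerating objects under a prefix normally also requires s3:ListBucket on the bucket, which this listing does not mention, so check the vendor's deployment guide before relying on this policy as written.

```shell
#!/usr/bin/env bash
# Placeholder bucket names and prefixes; override them via the environment.
INPUT_BUCKET="${INPUT_BUCKET:-example-input-bucket}"
INPUT_PREFIX="${INPUT_PREFIX:-incoming/}"
OUTPUT_BUCKET="${OUTPUT_BUCKET:-example-output-bucket}"
OUTPUT_PREFIX="${OUTPUT_PREFIX:-deidentified/}"

# Print an IAM policy granting read on the input prefix and write on the output prefix.
render_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${INPUT_BUCKET}/${INPUT_PREFIX}*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${OUTPUT_BUCKET}/${OUTPUT_PREFIX}*"
    }
  ]
}
EOF
}

render_policy
```

    The rendered JSON can be saved to a file and attached to the task role with your usual IAM tooling.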

    DOCS: https://loopwell.net/docs/configuration
    SUPPORT: contact@loopwell.net

    Support

    Vendor support

    BluePeak LLC supports De-ID Studio subscribers by email at contact@loopwell.net.

    Documentation: https://loopwell.net/docs/quickstart (quickstart), https://loopwell.net/docs/deployment (ECS/EKS/local), https://loopwell.net/docs/configuration (environment variables), https://loopwell.net/docs/compliance (customer responsibilities and limitations).

    Billing: subscriptions and usage charges flow through AWS Marketplace only. BluePeak does not sell this product via direct invoice, wire transfer, or card checkout outside Marketplace.

    We help with: first-run configuration, S3 IAM policies for input/output buckets, strategy JSON setup, interpreting audit output, and upgrading container tags between versions.

    Response expectation: business-day email reply for Standard and Enterprise plans; best-effort within three business days for pay-as-you-go.

    Out of scope: legal opinions on regulatory status, custom OCR for scanned PDFs, on-site professional services (available under separate engagement), and guarantees that output meets your specific compliance framework.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
