Leksara

Category: E-Commerce, Data Science, Open-source Tool
Client: Leksara (Open-source Project)
Duration: August 2025 - October 2025
Year: 2025

My Approach: Crafting Digital Excellence

Leksara is an open-source Python toolkit for processing Indonesian text in the e-commerce domain. It automates tasks such as text cleaning, stopword removal, slang normalization, and punctuation handling, helping data scientists and ML engineers streamline preprocessing and produce clean, usable data for analysis.

Key features of Leksara include:

  • CartBoard Review Intake: Enables dataset audits with PII flags, rating detection, and noise diagnostics, so that sensitive data can be identified before further processing.
  • PII Masking & Redaction: Detects sensitive information such as phone numbers, emails, and national IDs using regex patterns, and masks or redacts it according to a configurable mode.
  • Review-Focused Normalization: Expands slang, repairs contractions, trims elongated words, and extracts ratings from Indonesian text, making it ideal for e-commerce reviews.
  • ReviewChain Orchestrator: Provides customizable preset pipelines for data cleaning, benchmarking, and hybrid custom workflows.
  • Resource-driven Customization: Allows users to integrate their own dictionaries and regex rules to adapt the tool to new industries or domains.
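
To illustrate the regex-based PII masking described in the list above, here is a minimal standalone sketch. It does not use Leksara's actual API: the patterns, the `mask_pii` function name, and the `tag`/`remove` mode names are assumptions made for illustration, and production rules would need to cover many more formats.

```python
import re

# Illustrative patterns only (assumptions, not Leksara's actual rules).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Indonesian national ID (NIK): exactly 16 digits.
    "NIK": re.compile(r"(?<!\d)\d{16}(?!\d)"),
    # Indonesian mobile numbers starting with 08, 628, or +628.
    "PHONE": re.compile(r"(?<!\d)(?:\+62|62|0)8\d{7,11}(?!\d)"),
}


def mask_pii(text: str, mode: str = "tag") -> str:
    """Replace PII matches in text.

    mode="tag" substitutes a label such as [EMAIL];
    mode="remove" deletes the match entirely.
    """
    if mode not in {"tag", "remove"}:
        raise ValueError(f"unknown mode: {mode!r}")
    for label, pattern in PII_PATTERNS.items():
        replacement = f"[{label}]" if mode == "tag" else ""
        text = pattern.sub(replacement, text)
    return text


review = "Hubungi saya di budi@example.com atau 081234567890, NIK 3201234567890123."
print(mask_pii(review))
```

Keeping the patterns in a dictionary keyed by label makes it straightforward to add or override a rule without touching the masking logic.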

Launched in August 2025 and completed by October 2025, Leksara aims to make text preprocessing for e-commerce applications faster and more efficient. By masking sensitive information and normalizing unstructured reviews, it prepares data for machine learning applications.
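
As a rough picture of what normalizing an unstructured review can involve, the sketch below trims elongated words, expands slang, and extracts a rating. It is a hedged illustration rather than Leksara's implementation: the tiny `SLANG` dictionary, the rating patterns, and all function names are hypothetical.

```python
import re
from typing import Optional

# Tiny illustrative slang dictionary (assumption); a real resource would be far larger.
SLANG = {"bgt": "banget", "gk": "tidak", "ga": "tidak", "yg": "yang"}

# Matches "4/5" style scores or "bintang 4" ("4 stars"); illustrative only.
RATING_RE = re.compile(r"\b([1-5])\s*/\s*5\b|\bbintang\s+([1-5])\b", re.IGNORECASE)


def trim_elongation(text: str) -> str:
    """Collapse runs of three or more identical letters to a single letter."""
    return re.sub(r"([a-zA-Z])\1{2,}", r"\1", text)


def expand_slang(text: str) -> str:
    """Replace whole-word slang tokens with their dictionary expansion."""
    return re.sub(r"\b\w+\b", lambda m: SLANG.get(m.group(0), m.group(0)), text)


def extract_rating(text: str) -> Optional[int]:
    """Return the first rating from 1 to 5 found in the text, or None."""
    match = RATING_RE.search(text)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))


def normalize_review(text: str) -> str:
    """Lowercase, trim elongated words, then expand slang."""
    return expand_slang(trim_elongation(text.lower()))


print(normalize_review("Barangnya baguuuus bgt, gk nyesel"))
print(extract_rating("Mantap, 5/5!"))
```

Elongation is trimmed before slang lookup so that stretched variants of a slang word can still match the dictionary.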

Users can adapt Leksara to their specific needs and process their data quickly and accurately without rebuilding preprocessing pipelines from scratch. It is built for product reviews, user feedback, and other forms of text data in Bahasa Indonesia.
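
One way to picture extending a preset pipeline with a user-supplied dictionary is the sketch below. It is an assumption-laden illustration, not Leksara's ReviewChain API: `build_pipeline`, `make_dictionary_step`, and the `PRESETS` table are hypothetical names.

```python
import re
from typing import Callable, Dict, Iterable, List

Step = Callable[[str], str]


def collapse_whitespace(text: str) -> str:
    """Squeeze runs of whitespace into single spaces and strip the ends."""
    return re.sub(r"\s+", " ", text).strip()


def make_dictionary_step(mapping: Dict[str, str]) -> Step:
    """Build a step that replaces whole-word tokens using a user dictionary."""
    def step(text: str) -> str:
        return re.sub(r"\b\w+\b", lambda m: mapping.get(m.group(0), m.group(0)), text)
    return step


def build_pipeline(steps: Iterable[Step]) -> Step:
    """Compose steps into one callable that applies them in order."""
    ordered: List[Step] = list(steps)

    def run(text: str) -> str:
        for step in ordered:
            text = step(text)
        return text

    return run


# Hypothetical preset: a reusable list of steps.
PRESETS: Dict[str, List[Step]] = {"basic": [str.lower, collapse_whitespace]}

# Extend the preset with a domain-specific dictionary instead of rewriting it.
domain_terms = {"cod": "bayar di tempat"}
pipeline = build_pipeline(PRESETS["basic"] + [make_dictionary_step(domain_terms)])

print(pipeline("  Bisa   COD  kak? "))
```

Because every step is a plain function from string to string, a new domain only needs a new dictionary or an extra step appended to an existing preset.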
