System-Design-Specification

Document Change Log

Change Date | Changed By | Version | Change Description
04/23/2024  | Junya Yang | 1.0     | Prepared Document

Table of Contents

  1. DOCUMENT CHANGE LOG
  2. TABLE OF CONTENTS
  3. DESIGN OVERVIEW
  4. TOOLS AND STANDARDS
    1. Tools
    2. Standards
  5. USER INTERFACE DESIGN
    1. Usage Scenario I
    2. Usage Scenario II
  6. DIAGRAMS

DESIGN OVERVIEW

Design Overview for the Data Analysis Platform

Our platform provides an end-to-end data analysis solution that covers data preprocessing, modeling, and visualization. To keep the user experience responsive, the system architecture separates the frontend and backend services.

Backend Architecture: The backend is built on Python's FastAPI framework, which handles the data-intensive operations. FastAPI supports asynchronous request handlers, so a request that is waiting on I/O does not block other requests, which helps keep the platform responsive.
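
As a rough illustration of why asynchronous handling helps responsiveness, the sketch below uses only Python's standard asyncio module rather than FastAPI itself. The function names (handle_request, serve_concurrently) and the simulated delays are hypothetical; the point is only that I/O-bound tasks awaited concurrently finish in roughly the time of the slowest one, not the sum of all of them.

```python
import asyncio
import time


async def handle_request(name: str, io_delay: float) -> str:
    # Simulate waiting on I/O (for example, reading an uploaded file).
    # While this coroutine sleeps, the event loop can run other requests.
    await asyncio.sleep(io_delay)
    return f"{name} done"


async def serve_concurrently():
    # Run three simulated requests at the same time and measure the wall time.
    start = time.perf_counter()
    results = await asyncio.gather(
        handle_request("upload", 0.2),
        handle_request("preprocess", 0.2),
        handle_request("model", 0.2),
    )
    elapsed = time.perf_counter() - start
    return list(results), elapsed


results, elapsed = asyncio.run(serve_concurrently())
print(results, f"{elapsed:.2f}s")
```

In a FastAPI application the same idea would apply to handlers declared with async def, but the actual endpoints of this platform are not specified here.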

Frontend Design: The user interface is built with Vue.js, a progressive JavaScript framework with a component-based architecture. Users work with the data analysis functions through this interface as they upload and manipulate their data.

Core Functionalities:

  • Data Upload: Users can easily upload datasets via the frontend interface.
  • Data Preprocessing: The platform offers a suite of tools for cleaning and preparing data for analysis, ensuring data quality and readiness.
  • Modeling Capabilities: Users have access to a variety of built-in models, including regression and clustering, to uncover patterns and derive insights from their data.
  • Visualization Tools: Integrated visualization tools enable users to create engaging and informative visual representations of their analysis results.
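
The preprocessing step listed above could look something like the following minimal sketch, assuming pandas. The function name basic_clean, the choice of cleaning rules (dropping duplicate rows, filling missing numeric values with the column median), and the sample data are all illustrative assumptions, not the platform's actual preprocessing suite.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical cleaning rules chosen for illustration only.
    cleaned = df.drop_duplicates().copy()
    # Fill missing values in numeric columns with each column's median.
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(
        cleaned[numeric_cols].median()
    )
    return cleaned.reset_index(drop=True)


# Made-up sample data: one duplicate row and one missing age.
raw = pd.DataFrame(
    {
        "age": [25.0, 25.0, None, 40.0],
        "city": ["Oslo", "Oslo", "Rome", "Lima"],
    }
)
clean = basic_clean(raw)
print(clean)
```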

Our platform is designed to be flexible, catering to a wide range of data analysis needs and making sophisticated data science accessible to users with varying levels of expertise.

The following paragraphs describe some of the built-in models.

Linear regression is a basic statistical model used to explore the linear relationship between variables. It assumes a straight-line relationship between the dependent variable and the independent variables, and makes predictions by fitting the best straight line. The model is fitted with the least squares method, and it is usually evaluated with metrics such as R², adjusted R², and the standard error. In practice, we need to check the model assumptions: a linear relationship, no strong multicollinearity among the independent variables, and normally distributed error terms.
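
The least squares fit and the evaluation metrics mentioned above can be sketched as follows, assuming NumPy. The function name fit_linear_regression and the sample data are hypothetical; in the platform itself, a library such as sklearn would likely perform the fit.

```python
import numpy as np


def fit_linear_regression(X, y):
    # Ordinary least squares with an intercept term.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape  # n samples, p independent variables
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)          # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    std_error = (ss_res / (n - p - 1)) ** 0.5      # residual standard error
    return {
        "intercept": float(coef[0]),
        "coefficients": coef[1:],
        "r2": r2,
        "adjusted_r2": adjusted_r2,
        "std_error": std_error,
    }


# Made-up data that roughly follows y = 2x.
X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_linear_regression(X, y)
print(fit)
```

Adjusted R² penalizes additional independent variables, so it is never larger than R² when the fit includes at least one variable.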

A decision tree is a non-parametric supervised learning algorithm used for both classification and regression tasks. It has a hierarchical tree structure that consists of a root node, branches, internal nodes, and leaf nodes.
A decision tree starts with a root node, which has no incoming branches. The outgoing branches from the root node feed into the internal nodes, also known as decision nodes. Based on the available features, the root node and the internal nodes evaluate the data to form homogeneous subsets, which are represented by leaf nodes, also called terminal nodes. The leaf nodes represent all the possible outcomes within the dataset.
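
How a single decision node might evaluate a feature to form homogeneous subsets can be sketched in plain Python. The text above does not name a split criterion, so the use of Gini impurity here is an assumption, and the function names (gini, best_split) and the sample data are hypothetical.

```python
from collections import Counter


def gini(labels):
    # Gini impurity: 0.0 means the subset is perfectly homogeneous.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_split(values, labels):
    # Try thresholds halfway between consecutive distinct values and keep
    # the one whose two subsets have the lowest weighted Gini impurity.
    best_threshold, best_impurity = None, gini(labels)
    candidates = sorted(set(values))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


# Made-up data: hours studied and whether the exam was passed.
hours = [1, 2, 3, 6, 7, 8]
passed = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(hours, passed)
print(threshold, impurity)
```

A full tree would apply this search recursively to each resulting subset until the subsets are homogeneous or a stopping rule is reached.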

K-nearest neighbors (KNN) is an instance-based supervised learning algorithm that is sensitive to the size and dimensionality of the data. It performs classification or regression by measuring the distance between a new sample and the known samples in the training set. In classification, it assigns the new sample to the category that is most common among its K nearest neighbors. In regression, it predicts the value of the new sample as the average of the values of its K nearest neighbors. The KNN algorithm works in three steps: calculate the distances, select the nearest neighbors, and either vote (classification) or compute the mean (regression).
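
The three steps above can be sketched as follows, assuming NumPy. Euclidean distance is an assumption (the text does not name a distance metric), and the function name knn_predict and the sample data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Step 1: calculate the distance from the new sample to every known sample.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: select the k nearest neighbors.
    nearest = np.argsort(distances)[:k]
    neighbors = [y_train[i] for i in nearest]
    # Step 3: majority vote for classification, mean for regression.
    if task == "classification":
        return Counter(neighbors).most_common(1)[0][0]
    return float(np.mean(neighbors))


# Made-up training data: two well-separated groups of points.
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5]

predicted_label = knn_predict(X_train, labels, [1.5, 1.5], k=3)
predicted_value = knn_predict(X_train, values, [1.5, 1.5], k=3, task="regression")
print(predicted_label, predicted_value)
```

Because every prediction computes the distance to all training samples, the cost grows with the size of the training set, which is one reason the algorithm is sensitive to data size.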

Tools and Standards

Tools

In Python, we use the NumPy, pandas, scikit-learn (sklearn), and Matplotlib libraries.

Standards

The platform runs on all major operating systems, including macOS, Windows, and Linux.
It is accessible via a web browser on all platforms.

User Interface Design

User Interface

Diagrams