date: 2024-04-23
title: System-Design-Specification
status: DONE
author:
- AllenYGY
tags:
- Document
- SDS
- DPW
version: I
created: 2024-04-23T19:35
updated: 2024-06-11T01:14
publish: True
System-Design-Specification
Change Date | Changed By | Version | Change Description |
---|---|---|---|
04/23/2024 | Junya Yang | 1.0 | Prepared Document |
Design Overview for the Data Analysis Platform
Our platform is meticulously architected to provide an end-to-end data analysis solution, encompassing data preprocessing, modeling, and visualization functionalities. To ensure a robust and responsive user experience, our system architecture distinctly separates the frontend and backend services.
Backend Architecture: We leverage the power of Python's FastAPI framework, renowned for its high performance and ease of use, to handle data-intensive backend operations efficiently. This choice enables us to implement asynchronous processing, significantly boosting the responsiveness of our platform.
Frontend Design: The user interface is built using Vue.js, a progressive JavaScript framework known for its adaptability and component-based architecture. This setup allows for a dynamic and seamless interaction with the data analysis functionalities, providing users with an intuitive experience as they upload and manipulate their data.
Core Functionalities:
Our platform is designed to be flexible, catering to a wide range of data analysis needs and making sophisticated data science accessible to users with varying levels of expertise.
Here, we show some models instruction
Linear regression is a basic statistical model, which is used to explore the linear relationship between variables. It assumes that there is a straight line relationship between the dependent variable and the independent variable, and predicts by fitting the best straight line. The fitting of the model is completed by the least square method, and the indexes such as the square of R, the square of adjusted R and the standard error are usually used in the evaluation. In application, we need to pay attention to model assumptions, such as linear relationship, multicollinearity and normal distribution of error terms.
Decision tree is a non-parametric supervised learning algorithm, which is utilized for both classification and regression tasks. It has a hierarchical, tree structure, which consists of a root node, branches, internal nodes and leaf nodes.
A decision tree starts with a root node, which does not have any incoming branches. The outgoing branches from the root node then feed into the internal nodes, also known as decision nodes. Based on the available features, both node types conduct evaluations to form homogenous subsets, which are denoted by leaf nodes, or terminal nodes. The leaf nodes represent all the possible outcomes within the dataset.
K-nearest neighbor (KNN) is an instance-based supervised learning algorithm, and it is sensitive to data size and dimension. It performs classification or regression by measuring the distance between a new sample and a known sample in the training set. In classification, it assigns the new sample to the category to which the K closest neighbors belong. In regression, it predicts the value of the new sample, estimated by the average of the K closest neighbors. The working principle of the KNN algorithm consists of the following steps: Calculate distance, Select Nearest neighbor, Voting or calculating the mean.
我们将会搭建一个通用的数据分析平台,平台将会实现包括数据预处理,建模,可视化等功能。
我们将使用前后端分离的架构,后端将使用由python搭建的FASTAPI框架
前端将使用Vue.js
用户将通过我们搭建的前端界面,上传数据;选择希望执行的操作,完成一整套数据分析的流程
们也提供常用的模型包括回归,聚类
In python we use the Numpy
, pandas
, sklearn
, matplotlib
libraries
Common to all platforms, including MacOS, Windows, Linux
Accessible via browser on all platforms