Lambda-School-Labs/Labs26-StorySquad-DS-TeamB

notebooks/README.md

#### Overview of `notebooks` content:
- `clustering`: This notebook explores three clustering methods for grouping users in the gamification portion of Story Squad. The currently implemented version ([clustering_mvp.py](../project/app/utils/clustering/clustering_mvp.py)) groups users by the ranking of their Squad Scores. The other two methods explored, `KMeans Clustering` and `Nearest Neighbors`, have not been implemented in the application due to time constraints.
- `count_spelling_errors`: This notebook compares several spell-check libraries to determine whether spell checking could correct transcription errors, serve as a metric for student writing, and/or improve the reliability of other metrics. So far, the improvement has not been large or consistent enough to implement this feature.
- `score_visual`: This notebook explores visualizations for the parent dashboard. Several versions were mocked up and presented to the stakeholders; [`histogram.py`](../project/app/utils/visualizations/histogram.py) and [`line_graph.py`](../project/app/utils/visualizations/line_graph.py) are the final visuals chosen based on their feedback. Both files are used by the application's visualization endpoint.
- `squad_score_mvp`: This notebook explores the training data, fits a MinMaxScaler, and composes the Squad Score formula used as the complexity metric. It also produces [`squad_score_metrics.csv`](../data/squad_score_metrics.csv), which contains one row per training-data transcription. Its columns include `story_id`, every feature used in the most recent Squad Score formula, and the resulting score (the formula is implemented in [`squad_score.py`](../project/app/utils/complexity/squad_score.py)).
- `submission_endpoint_interactions`: This notebook demonstrates the endpoints in [`submission.py`](../project/app/api/submission.py) and outlines the file structure those endpoints expect for their `UploadFile` inputs.
- `transcribed_stories`: This notebook connects to the Google Cloud Vision API and transcribes the 167 provided stories. It produces [`transcribed_stories.csv`](../data), which contains the submission ID and transcribed text for each story. The notebook's `transcribe` method was used to create [`transcription.py`](../project/app/utils/img_processing/transcription.py), which the application uses.
- `transcription_confidence`: This notebook explores how the Google Cloud Vision API reports confidence levels for its transcriptions. It produces [`error_confidence_metrics.csv`](../data), which contains, for each submission, the `story_id`, the error between the API transcription and the provided human transcription, and the API's confidence. A modified version of the `image_confidence` method became [`confidence_flag.py`](../project/app/utils/img_processing/confidence_flag.py), which the application uses.
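The rank-based grouping described for the `clustering` notebook can be sketched as follows. This is a minimal illustration, not the code in `clustering_mvp.py`: the function name `group_by_rank`, the dictionary input, and the group size are all assumptions.

```python
from typing import Dict, List


def group_by_rank(squad_scores: Dict[str, float], group_size: int = 4) -> List[List[str]]:
    """Sort users by Squad Score (highest first) and chunk the ranking into groups.

    group_size=4 is an illustrative default, not a value taken from clustering_mvp.py.
    """
    # Rank user IDs by score, breaking ties by ID so the result is deterministic
    ranked = sorted(squad_scores, key=lambda uid: (-squad_scores[uid], uid))
    # Consecutive slices of the ranking become groups; the last group may be smaller
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]


scores = {"u1": 72.5, "u2": 90.1, "u3": 55.0, "u4": 88.3, "u5": 61.2}
print(group_by_rank(scores, group_size=2))  # [['u2', 'u4'], ['u1', 'u5'], ['u3']]
```

Sorting and slicing keeps the grouping deterministic; how to handle a short final group (for example, merging it into the previous one) would be an extra design decision not described here.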
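As a rough illustration of the spelling-error metric explored in `count_spelling_errors`, the sketch below counts words missing from a reference vocabulary. It stands in for the spell-check libraries the notebook compared, which this README does not name; `count_unknown_words` and the toy vocabulary are hypothetical.

```python
import re
from typing import Iterable, Tuple


def count_unknown_words(text: str, vocabulary: Iterable[str]) -> Tuple[int, int]:
    """Return (unknown_word_count, total_word_count) for text, ignoring case."""
    known = {word.lower() for word in vocabulary}
    # Runs of letters, optionally with one internal apostrophe ("don't" is one token)
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    unknown = sum(1 for token in tokens if token not in known)
    return unknown, len(tokens)


vocab = ["the", "dragon", "flew", "over", "castle"]
print(count_unknown_words("The dragn flew over the castel.", vocab))  # (2, 6)
```

A count like this mixes a student's own misspellings with OCR transcription errors, so on its own it cannot tell the two apart.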
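The `squad_score_mvp` bullet describes min-max scaling followed by a formula over the scaled features. A minimal sketch of that pattern follows, assuming the formula is a weighted sum; the feature names and weights are hypothetical placeholders, not the ones in `squad_score.py`. The scaling step applies the same transform as scikit-learn's `MinMaxScaler` with its default `(0, 1)` range, written in plain Python to stay dependency-free.

```python
from typing import Dict, List, Tuple

# Hypothetical features and weights; the real formula is in squad_score.py
WEIGHTS: Dict[str, float] = {"word_count": 1.0, "avg_word_length": 2.0, "unique_words": 1.5}


def fit_min_max(rows: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Record each feature's (min, max) over the training rows, like MinMaxScaler.fit."""
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in WEIGHTS}


def squad_score(row: Dict[str, float], bounds: Dict[str, Tuple[float, float]]) -> float:
    """Scale each feature with the fitted bounds, then combine with the weights."""
    total = 0.0
    for feature, weight in WEIGHTS.items():
        lo, hi = bounds[feature]
        # (x - min) / (max - min); constant features map to 0 instead of dividing by zero.
        # Values outside the training range land outside [0, 1], as MinMaxScaler does by default.
        scaled = (row[feature] - lo) / (hi - lo) if hi > lo else 0.0
        total += weight * scaled
    return total


training = [
    {"word_count": 100, "avg_word_length": 4.0, "unique_words": 50},
    {"word_count": 300, "avg_word_length": 5.0, "unique_words": 150},
]
bounds = fit_min_max(training)
new_story = {"word_count": 200, "avg_word_length": 4.5, "unique_words": 100}
print(squad_score(new_story, bounds))  # 2.25
```

Fitting the bounds once on the training transcriptions and reusing them for new submissions keeps scores on a common scale, rather than rescaling every new batch against itself.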
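The `transcription_confidence` bullet mentions an error score between the API transcription and the human transcription without specifying the metric. One common choice, assumed here purely for illustration, is character error rate: the Levenshtein edit distance divided by the length of the human reference. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    # Row-by-row dynamic programming: prev[j] is the distance from the previous prefix of a to b[:j]
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(api_text: str, human_text: str) -> float:
    """Edit distance from the API transcription to the human one, per reference character."""
    if not human_text:
        raise ValueError("human transcription is empty")
    return levenshtein(api_text, human_text) / len(human_text)


print(levenshtein("kitten", "sitting"))  # 3
print(round(character_error_rate("The dragn flew", "The dragon flew"), 4))  # 0.0667
```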
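The confidence data behind `confidence_flag.py` comes from the Google Cloud Vision API. The sketch below assumes a word-level approach: it walks the `pages`, `blocks`, `paragraphs`, and `words` of a `full_text_annotation` (the hierarchy returned by `ImageAnnotatorClient.document_text_detection`) and flags the transcription when the mean word confidence falls below a threshold. The threshold value, the use of the mean, and the function names are assumptions; a `SimpleNamespace` stand-in replaces a real API response so the example runs offline.

```python
from statistics import mean
from types import SimpleNamespace
from typing import Any, List

# Hypothetical cutoff for illustration; not the value used in confidence_flag.py
CONFIDENCE_THRESHOLD = 0.85


def word_confidences(annotation: Any) -> List[float]:
    """Collect every word's confidence from a full_text_annotation-like object."""
    return [
        word.confidence
        for page in annotation.pages
        for block in page.blocks
        for paragraph in block.paragraphs
        for word in paragraph.words
    ]


def needs_review(annotation: Any, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True when the mean word confidence is below threshold."""
    confidences = word_confidences(annotation)
    # An annotation with no detected words is treated as low confidence
    return not confidences or mean(confidences) < threshold


def fake_annotation(*scores: float) -> SimpleNamespace:
    """Build a one-page, one-block, one-paragraph stand-in for an API response."""
    words = [SimpleNamespace(confidence=s) for s in scores]
    paragraph = SimpleNamespace(words=words)
    block = SimpleNamespace(paragraphs=[paragraph])
    return SimpleNamespace(pages=[SimpleNamespace(blocks=[block])])


print(needs_review(fake_annotation(0.97, 0.92, 0.95)))  # False
print(needs_review(fake_annotation(0.90, 0.40, 0.75)))  # True
```

With the real client, `needs_review(response.full_text_annotation)` would be called on the API response in place of the stand-in.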