pollandroll/daru_lite

View on GitHub
README.md

Summary

Maintainability
Test Coverage
# daru Lite - Data Analysis in RUby Lite

Simple, straightforward DataFrames for Ruby

[![Build Status](https://github.com/pollandroll/daru_lite/actions/workflows/build.yml/badge.svg)](https://github.com/pollandroll/daru_lite/actions)
[![Gem Version](https://img.shields.io/gem/v/daru_lite.svg)](https://rubygems.org/gems/daru_lite)
[![Maintainability](https://api.codeclimate.com/v1/badges/f87d4ed10b5731e50184/maintainability)](https://codeclimate.com/github/pollandroll/daru_lite/maintainability)
[![Test Coverage](https://api.codeclimate.com/v1/badges/f87d4ed10b5731e50184/test_coverage)](https://codeclimate.com/github/pollandroll/daru_lite/test_coverage)

## Introduction

daru Lite is a library for data analysis and manipulation in Ruby.

This project started as fork of [Daru](https://github.com/SciRuby/daru) with the objective to provide :
- a simple and yet powerfull interface to manipulate data using DataFrames
- a API consistent with the one historically provided by daru
- a focus on the core features around data manipulation, droped several cumbersome daru dependencies and the associated features : notably N-Matrix, GSL, R, imagemagick and all plotting libraries. The current project has no major dependencies
- build a future-proof library that can safely be used in production

## Installation

```console
$ gem install daru_lite
```

or add daru Lite to your Gemfile:
```console
$ bundle add daru_lite
```

## Basic Usage

daru Lite exposes two major data structures: `DataFrame` and `Vector`. The Vector is a basic 1-D structure corresponding to a labelled Array, while the `DataFrame` - daru's primary data structure - is 2-D spreadsheet-like structure for manipulating and storing data sets.

Basic DataFrame intitialization.

``` ruby
data_frame = DaruLite::DataFrame.new(
  {
    'Beer' => ['Kingfisher', 'Snow', 'Bud Light', 'Tiger Beer', 'Budweiser'],
    'Gallons sold' => [500, 400, 450, 200, 250]
  },
  index: ['India', 'China', 'USA', 'Malaysia', 'Canada']
)
data_frame
```
![init0](images/init0.png)


Load data from CSV files.
``` ruby
df = DaruLite::DataFrame.from_csv('TradeoffData.csv')
```
![init1](images/init1.png)

*Basic Data Manipulation*

Selecting rows.
``` ruby
data_frame.row['USA']
```
![man0](images/man0.png)

Selecting columns.
``` ruby
data_frame['Beer']
```
![man1](images/man1.png)

A range of rows.
``` ruby
data_frame.row['India'..'USA']
```
![man2](images/man2.png)

The first 2 rows.
``` ruby
data_frame.first(2)
```
![man3](images/man3.png)

The last 2 rows.
``` ruby
data_frame.last(2)
```
![man4](images/man4.png)

Adding a new column.
``` ruby
data_frame['Gallons produced'] = [550, 500, 600, 210, 240]
```
![man5](images/man5.png)

Creating a new column based on data in other columns.
``` ruby
data_frame['Demand supply gap'] = data_frame['Gallons produced'] - data_frame['Gallons sold']
```
![man6](images/man6.png)

*Condition based selection*

Selecting countries based on the number of gallons sold in each. We use a syntax similar to that defined by [Arel](https://github.com/rails/arel), i.e. by using the `where` clause.
``` ruby
data_frame.where(data_frame['Gallons sold'].lt(300))
```
![con0](images/con0.png)

You can pass a combination of boolean operations into the `#where` method and it should work fine:
``` ruby
data_frame.where(
  data_frame['Beer']
  .in(['Snow', 'Kingfisher','Tiger Beer'])
  .and(
    data_frame['Gallons produced'].gt(520).or(data_frame['Gallons produced'].lt(250))
  )
)
```
![con1](images/con1.png)

## Documentation

Docs can be found [here](http://www.rubydoc.info/gems/daru_lite).