  <!DOCTYPE html>
  <html lang="en">

  <head>
    <meta charset="UTF-8">
    <title>SWEatshop.tech</title>
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet"/>
    <link href="https://fonts.googleapis.com/css?family=Roboto:100,300,400" rel="stylesheet">
    <link rel="stylesheet" href="/static/css/styles.processed.css"/>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.6/js/bootstrap.min.js" integrity="sha384-vBWWzlZJ8ea9aCX4pEW3rVHjgjt7zpkNpZk+02D9phzyeVkE+jo0ieGizqPLForn" crossorigin="anonymous"></script>
    <script src="https://unpkg.com/isotope-layout@3/dist/isotope.pkgd.js"></script>
  </head>
    <body class="gradient-bgr">
      <nav class="navbar navbar-toggleable-md navbar-inverse bg-inverse-opacity">
      <button class="navbar-toggler navbar-toggler-right" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
        <span class="navbar-toggler-icon"></span>
      </button>
      <a class="navbar-brand" href="/index.html"><img alt="Brand" src="/static/img/sweatshop-logo.png" style="width: 200px;"></a>
      <div class="collapse navbar-collapse" id="navbarSupportedContent">
        <ul class="navbar-nav ml-auto">
            <li class="nav-item">
              <a class="nav-link" href="/index.html">Home</a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="/about.html">About</a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="/companies.html">Companies</a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="/people.html">People</a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="/investors.html">Investors</a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="/schools.html">Schools</a>
            </li>
            <li class="nav-item">
              <a class="nav-link" href="/visualization.html">Visualization</a>
            </li>
          </ul>
      </div>
    </nav>

    <div class="inner container" style="color: white; font-family: Open Sans;">
               <br/>
          <div class="card text-center" style="background-color: #333; width:500px; max-width: 95%;" >
            <div class="card-block">
              <h4 class="card-title">Report Menu</h4>
                <p class="card-text"><br/>
                  <a href="../report" style="color:white;">Report</a><br/>
                  <a href="introduction.html" style="color:white;">Introduction</a><br/>
                  <a href="design.html" style="color:white;">Design</a><br/>
                  <a href="tools.html" style="color:white;">Tools</a><br/>
                  <a href="hosting.html" style="color:white;">Hosting</a><br/>
               </p>
             </div>
           </div>
        <br/><br/>

        <h1><b>Design</b></h1>
        <br/>
        <h2>UML Model</h2>
        <img src="https://raw.githubusercontent.com/sweatshoptech/idb/master/static/img/IDB1.png" alt="UML diagram of the SWEatshop data models">
        <br/>
              Above is our UML diagram, generated from the following text on <a href="https://yUML.me">yUML</a>:
        <br/>
        <div class="card" style="background-color: #333;" >
          <div class="card-block text-left">
        <pre><code style="color: white;">[Person|name: string; title: string; location: string; dob: DateTime; image_url: string; website: string;]
        [Company|name: string; location: string; ownership_type: enum; funding: string; description: string; image_url: string; size: int; website: string;]
        [Person]++&lt;1..*-1..*&gt;[Company]
        [School|name: string; location: string; description: string; size: int; image_url: string;  website: string;]
        [School]++&lt;1..*-1..*&gt;[Person]
        [School]++&lt;1..*-1..*&gt;[Investor]
        [Investor|name: string; location: string; funding: int; description: string; image_url: string; website: string;]
        [Investor]++&lt;1..*-1..*&gt;[Company]
        [Category|idnum:int; name: string;]
        [Category]++&lt;1-1..*&gt;[Investor]
        [Category]++&lt;1-1..*&gt;[Company]</code></pre>
          </div>
        </div>
        
        We have four models: Company, Person, Investor, and School. Each model connects to two others: a company has a CEO, employees, and investors; people work for companies and have attended schools; investors invest in companies and schools; and schools have investors and alumni. Beyond this, each model carries additional information about each instance, such as location, description, and website. When creating the models, we set up the one-to-many and many-to-many relationships by creating tables in our models.py file.
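        <br/><br/>
        As an illustration, a many-to-many relationship such as Person-Company could be declared in models.py roughly as follows. This is a minimal sketch using Flask-SQLAlchemy conventions; the table and attribute names are illustrative, not our exact schema.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Hypothetical sketch of a many-to-many relationship in models.py.
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

# Association table linking people to the companies they work for
employment = db.Table(
    'employment',
    db.Column('person_id', db.Integer, db.ForeignKey('person.id')),
    db.Column('company_id', db.Integer, db.ForeignKey('company.id'))
)

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
    title = db.Column(db.String)
    location = db.Column(db.String)
    # many-to-many: a person can work for many companies and vice versa
    companies = db.relationship('Company', secondary=employment,
                                backref='employees')

class Company(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
    location = db.Column(db.String)</code></pre>
          </div>
        </div>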
        <br/><br/>
        <h2>API</h2>
        <a href="https://sweatshop.tech/api">SWEatshop API</a><br>
        SWEatshop has a simple API for retrieving centralized data about companies, along with the people, investors, and schools involved with them. We created a RESTful API that satisfies GET requests by returning rows from the tables in our Amazon Web Services (RDS) PostgreSQL database using Flask-SQLAlchemy. In future versions, we expect the API to accept POST and PUT calls as well, so that the AWS database can be updated through API calls. We get most of our model data from the <a href="http://data.crunchbase.com">Crunchbase API</a>, which has information on companies, people, investments, and much more. <a href="http://ip-api.com/docs/">IP API</a> lets us fill in some of the blank location fields in our database by deriving a location from a model's website URL. We utilize the <a href="https://developers.google.com/maps/">Google Maps API</a> to provide a map of each instance's location on the instance pages. The <a href="https://clearbit.com/docs">Clearbit API</a> lets us scrape company websites for logos. Finally, the <a href="https://www.microsoft.com/cognitive-services/en-us/bing-image-search-api">Bing Images API</a> was used to find photos for our person models.<br />
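        Conceptually, a read-only endpoint in the API looks like the sketch below. The route path and serialized fields are assumptions for illustration, and it presumes a Company model like the one sketched earlier.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Illustrative sketch of a GET endpoint; our real routes may differ.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/companies/&lt;int:company_id&gt;')
def get_company(company_id):
    # Fetch one row from the PostgreSQL database via Flask-SQLAlchemy;
    # assumes the Company model from the sketch above.
    company = Company.query.get_or_404(company_id)
    return jsonify({
        'name': company.name,
        'location': company.location,
        'website': company.website
    })</code></pre>
          </div>
        </div>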
        <br/><br/>
        <h2>Sorting, Filtering, and Pagination</h2>
        Phase 1: We use jQuery, JavaScript, Isotope, Jinja2, Flask, HTML, and Bootstrap to implement sorting and filtering. First we create a template using HTML and Bootstrap. Each item is represented by a Bootstrap card, and the data for each card is dynamically populated by Flask and Jinja2: Flask accesses the AWS relational database through Flask-SQLAlchemy and renders the retrieved information in the template by means of Jinja2 (a sketch of this step follows below). We then set classes and certain other attributes in the HTML through Flask and Jinja2 so that the filters know what to operate on. The filtering and sorting functionality is implemented on the front end using Isotope, jQuery, and embedded JavaScript, which lets us rearrange the contents of the Isotope grid in the HTML template body whenever a button click is detected. For Isotope to recognize that an item falls under a particular filter, a class attribute must be set with the name of the filter, and the same name must appear in the respective button's list of attributes. For the sorting function to know what data to sort by, an attribute must be added to the div surrounding each item's Bootstrap card. This makes the implementation very simple: all of the instances are rendered on load, sorting/filtering is just a matter of moving the cards around, and the Isotope library handles that well out of the box. <br/><br/>
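        The back-end half of this pipeline amounts to something like the minimal sketch below; the route, template name, and model are illustrative assumptions.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Sketch: Flask fetches every row and hands it to Jinja2, which stamps
# each one into a Bootstrap card carrying Isotope filter classes.
from flask import render_template

@app.route('/companies.html')
def companies():
    rows = Company.query.all()  # every instance is rendered up front
    return render_template('companies.html', companies=rows)</code></pre>
          </div>
        </div>
        <br/><br/>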
        
        Reflection on Phase 1 and Planning for Phase 2: There are many downsides to the front-end sorting and filtering approach, though. Initially loading the page can take a very long time, because the page needs to fetch the data from our Amazon Web Services database and then render the images for all of the cards. Front-end sorting and filtering was easy and fast for Phase 1 because we only had three instances to query, but it became an unacceptable approach once we started adding 1000+ rows to our database tables. These, of course, are just the loading-time issues. Front-end filtering and sorting causes even more problems once Phase 2 functionality is added. Notably, Isotope can only filter and sort the cards within the current page. To speed up loading times and clean up the model pages, we had to implement pagination, and Isotope does not work with pagination because it can only filter and sort the items within that particular page. All of this means that once Phase 2 starts, filtering and sorting have to move to the back end.<br/><br/>
        
        Phase 2: The first step was to implement pagination. We started off trying front-end solutions for ease of implementation and compatibility with Isotope, but the methods we found did not work properly for our use case, so we moved pagination to the back end. Using Flask-Paginate, we limited the number of elements passed into the HTML template and put in the pagination buttons. When a page number is clicked, we calculate the offset and the elements to load, and query the database with those parameters. Passing only these to the template means we now render the data and images for a small number of elements, so page loads are significantly faster. However, it also means we had to move the filtering and sorting. On a button click, we now add the filter and sort details to the URL query string and set the window's href. The Flask back end extracts these values, queries the database with the appropriate filters, and returns the new set of rows, and the template page reloads with the new data (see the sketch below). This process is largely simple, but one major issue we ran into was saving the button states. Previously this was not an issue, since there was a default on page load that changed on button click, but now we need to set the button state ourselves. We finally accomplished this by using jQuery to extract the appropriate values from the href used to access the page.
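        <br/><br/>
        A minimal sketch of the back-end pagination and filtering using Flask-Paginate follows; the query-string parameter names and the per-page count are illustrative assumptions.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Sketch of back-end pagination; parameter names are illustrative.
from flask import request, render_template
from flask_paginate import Pagination, get_page_parameter

PER_PAGE = 20

@app.route('/companies.html')
def companies():
    page = request.args.get(get_page_parameter(), type=int, default=1)
    sort = request.args.get('sort', 'name')   # sort key from the URL query
    query = Company.query.order_by(getattr(Company, sort))
    total = query.count()
    # Load only the rows for the requested page
    rows = query.offset((page - 1) * PER_PAGE).limit(PER_PAGE).all()
    pagination = Pagination(page=page, total=total, per_page=PER_PAGE,
                            css_framework='bootstrap4')
    return render_template('companies.html', companies=rows,
                           pagination=pagination)</code></pre>
          </div>
        </div>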
              <br /><br /><br />
        <h2>DB</h2>
        We obtained most of our data by scraping the <a href="https://data.crunchbase.com/">Crunchbase RESTful API.</a> Unfortunately, we ran into a couple of issues. First, the Crunchbase API would only give us free access to a couple hundred instances, and as college students we cannot afford the extra price for the entire API. However, Crunchbase does provide a snapshot containing hundreds of thousands of records for Companies, Investors, People, and Schools. We combined the data obtained from the RESTful API and the snapshot to create a more complete database. <br/><br/>
        
              <b>Schools</b><br/>
              The dump provided by Crunchbase did not treat schools as objects and therefore did not have much information about them; we could only get the names of the schools that people attended. As a first step, we created a school row and added a relation for every person, checking whether the school name had already been added to the database. Next, we used the IPEDS (Integrated Postsecondary Education Data System) API provided by <a href="http://data.gov">data.gov</a> to get additional information about the schools. Since it is provided by the U.S. government, it only covers U.S. schools, but it was a step toward where we needed to be, so we used it to fill in more information. We then realized we could do a lot more if only we had the website URLs of all of these schools, so we wrote a script to look up the school names and find possible links for them. After cleaning up this data, we used the URLs to find locations for the schools: since most schools host their websites on campus servers, we used <a href="http://ip-api.com/">IP-API</a> to find the IP addresses, and from there the locations, of all of these schools (a sketch of this pass follows below). We also went through the meta tags of the school websites to find descriptions, though not all schools list descriptions in their meta tags. In this manner, we filled the schools table with as much information as possible. There are faults with this approach: with the increased use of services such as Cloudflare and AWS, many schools, especially small ones, host their websites elsewhere. Finding investors for these schools was just as hard, but by looking specifically for investors that fund schools, we added relationships between investors and schools. Furthermore, since the schools were added using names alone, there are many issues with duplicate rows. Notably, certain person rows had a typo or an extra symbol in the name of their school, which created multiple rows for the same school when added to our database. This is certainly a problem, but it is difficult to clean up without manually going through all of the rows in the database.<br/><br/>
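              The location-lookup pass can be sketched roughly as below; the helper name is ours, not from the actual script, and error handling is simplified.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Sketch: derive a school's location from its website domain via IP-API.
import requests

def location_from_website(url):
    # Strip the scheme and path so IP-API receives a bare domain
    domain = url.replace('https://', '').replace('http://', '').split('/')[0]
    resp = requests.get('http://ip-api.com/json/' + domain).json()
    if resp.get('status') == 'success':
        return '{}, {}'.format(resp['city'], resp['regionName'])
    return None</code></pre>
          </div>
        </div>
        <br/><br/>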
              <b>Other Issues</b><br/>
              The Crunchbase data was far from complete. Many rows still did not have an image URL and/or a website. We got the websites using the same technique used for schools, and then utilized the <a href="https://clearbit.com/logo">Clearbit Logo API</a> to obtain the logos. Clearbit takes in a URL, goes through the website, and finds a logo if there is one; it works most of the time, and the API is extremely simple to use, without any rate limits (a sketch follows below). Finally, some companies, investors, and people did not have a location, so again we utilized the <a href="http://ip-api.com/">IP Geolocation API</a> to derive location information from the website URL each model provided, creating a more complete database. With a lot of these scrapers, there is no guarantee that we get the right information. Our data is far from perfect, but it is what we could do with the freely available data in the time that we had.
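        <br/><br/>
        Clearbit's logo lookup is just a URL pattern, so the scraping step can be sketched as follows; the helper name is illustrative.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Sketch: probe Clearbit's Logo API to see whether a logo exists.
import requests

def logo_url(website):
    domain = website.replace('https://', '').replace('http://', '').split('/')[0]
    url = 'https://logo.clearbit.com/' + domain
    # Clearbit returns 200 with the image if a logo was found
    if requests.head(url).status_code == 200:
        return url
    return None</code></pre>
          </div>
        </div>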
        <br/><br/><b>Create a New Person (HTTP POST request)</b><br />
        You can create a new person through this action by providing a JSON object with the first name, last name, title, city name or region name, profile image URL, and homepage URL.<br /><br />
        <b>Create a New Organization (HTTP POST request)</b><br />
        You can create a new company, school, or investor by adding a new organization. <br />

        To create a new company, provide a JSON object with the name, city name or region name, a short description, a profile image URL, and a homepage URL. The primary role must be that of a company. <br />

        To create a new school, provide a JSON object with the name, city name or region name, a short description, a profile image URL, and a homepage URL. The primary role must be that of a school. <br />

        Finally, to create a new investor, provide a JSON object with the name, city name or region name, a short description, a profile image URL, and a homepage URL. The primary role must be that of an investor.<br /><br />
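        As a hedged illustration, creating a new person could look like the request below; the endpoint path and exact field names are assumptions based on the description above, not documented routes.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Hypothetical client call; the endpoint path and field names are
# assumptions, not our documented API.
import requests

new_person = {
    'first_name': 'Ada',
    'last_name': 'Lovelace',
    'title': 'Engineer',
    'city_name': 'Austin',
    'profile_image_url': 'https://example.com/ada.png',
    'homepage_url': 'https://example.com'
}
requests.post('https://sweatshop.tech/api/people', json=new_person)</code></pre>
          </div>
        </div>
        <br /><br /><br />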
        <h2>Website Redesign</h2>
        After the first phase, our website had a purple background, default text, and a messy about page. We decided to do a complete redesign of our website and make it look more professional and polished. To begin with, the color scheme changed from purple to black, gray, and white. A new, bolder logo was created for our organization. We also decluttered our about page by moving all of our statistics and important links to the top so that they wouldn’t get lost. The bios were shortened and the tools were organized in a much more efficient manner.<br /><br /><br />
        
        <h2>Searching</h2>
        The first step in implementing search was to find a library that integrates well with Flask-SQLAlchemy. While we could manually search the fields directly, a library simplifies many of the basic tasks and is typically optimized for the purpose. The first such library we found was Flask-WhooshAlchemy, which integrates very well with Flask-SQLAlchemy: we could specify exactly which attributes to search, and its syntax is similar to Flask-SQLAlchemy's. One issue with this library, though, is that it is not actively maintained, and it only indexes rows as they are added or deleted; it does not automatically index anything already in the database. Primarily for this reason, we switched to Flask-WhooshAlchemyPlus, a fork of the original library that has a function to index all rows already in the database (a setup sketch follows below). It does not, however, support terms, something that Whoosh on its own does.<br/><br/>
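        The setup can be sketched roughly as below, following the library's documented pattern; the searchable fields shown are illustrative, not our exact schema.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Sketch of the Flask-WhooshAlchemyPlus setup; fields are illustrative.
import flask_whooshalchemyplus
from flask_whooshalchemyplus import index_all

class Person(db.Model):
    __searchable__ = ['name', 'title', 'location']  # fields Whoosh indexes
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
    title = db.Column(db.String)
    location = db.Column(db.String)

flask_whooshalchemyplus.init_app(app)
index_all(app)  # index the rows already in the database

# Searching returns a normal query object that can be chained further
results = Person.query.whoosh_search('austin').all()</code></pre>
          </div>
        </div>
        <br/><br/>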
        
        The library was very useful for searching all the fields in each of our data models. Next, we needed a way for the user to see results by model. Based on the use cases for our models, we wanted to ensure that clients know exactly which data model they are searching, so we separated the search results for each data model by adding the result type to a set of tabs at the top of the page.<br/><br/>
        
        When a user searches for a query from the search bar, the search by default looks for the query in the Person data model. If the client would like to search any of the other data models, they can simply click the data model they want on the tab navigation bar, and the tool instantly returns search results for the same query on the specified model. While the tab-based results gave the page a good look, they brought up several new complications: within a single page, we would now have four pagination options. Our pagination works by adding the page number to the URL; Flask extracts that number from the href and passes it to the database so that the right query results are returned. However, four sets of results, each with separate pagination, poses a problem. By default, under our original design, the pagination would not reset to page one when switching between data models; it would stick to whichever page you were on. For example, if you were on page 20 of the People results and switched to the Investors tab, you would remain on page 20. This is not necessarily bad on its own, but what happens if there are fewer than 20 pages for investors? Then you get a blank tab, and that should not happen. Therefore, we refactored the code to give each tab its own independent pagination. For this to work properly without messy code, we had to have a different URL for the results of each data model (see the sketch below). This way, when a user switches to a different model, they are taken back to page 1, and each model works independently.<br/><br/>
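        A minimal sketch of the per-tab routing follows; the route paths and names are illustrative, and it reuses the imports and PER_PAGE constant from the pagination sketch above.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Sketch: one route per data model so each results tab paginates
# independently; paths and template names are illustrative.
@app.route('/search/people')
def search_people():
    q = request.args.get('q', '')
    page = request.args.get(get_page_parameter(), type=int, default=1)
    results = Person.query.whoosh_search(q)
    total = results.count()
    rows = results.offset((page - 1) * PER_PAGE).limit(PER_PAGE).all()
    pagination = Pagination(page=page, total=total, per_page=PER_PAGE)
    return render_template('search.html', tab='people',
                           results=rows, pagination=pagination)</code></pre>
          </div>
        </div>
        <br/><br/>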
        
        Next, we implemented AND/OR search to allow the client to do multiple-word searches. This connects the words in the query together: if the client chooses the AND search, the results will only be those that contain all the words in the search query; if the client chooses the OR search, it will return any result that contains even one word of the search query. The client can choose the type of search from the navigation bar in each data model as well. This is a very useful feature: often, while you may not find an exact match, your search may yield matches similar to at least part of your query. Alternatively, you might be interested in objects connected to both the United States and India, or in objects connected to either the United States or India; AND/OR searching supports both of these cases. The feature is also easy to implement, because Flask-WhooshAlchemy supports searching the tables with an OR constraint (see the sketch below).<br/><br/>
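        Assuming the or_ keyword exposed by the WhooshAlchemy-style search call, the two modes differ by a single flag, roughly as follows.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Sketch: AND vs OR multi-word search. The or_ keyword is our
# assumption of the library's flag for OR semantics.
query = 'United States'
and_results = Person.query.whoosh_search(query).all()            # all words
or_results = Person.query.whoosh_search(query, or_=True).all()   # any word</code></pre>
          </div>
        </div>
        <br/><br/>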
        
        Another important feature of our search tool is a utility processor that highlights the query the client requested. One issue here was that search results could contain very long descriptions. We only wanted description snippets, so that the client sees the 175 characters closest to their search query, while still keeping enough context around the results. For this, we added another method to our utility processor that outputs a modified version of the description from the database: it finds the location of the first query match and excerpts the content surrounding it. All the other attributes still use the basic highlighting method in our utility processor, but the descriptions of all models use the new contextualizing method (a sketch of both follows below).
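        <br/><br/>
        Both utilities can be sketched as a Flask context processor (a "utility processor" in Flask's terminology); the function names and the tag used for highlighting are illustrative assumptions.
        <div class="card" style="background-color: #333;">
          <div class="card-block text-left">
        <pre><code style="color: white;"># Sketch of our template utilities; names are illustrative.
import re
from flask import Markup

@app.context_processor
def utility_processor():
    def highlight(text, query):
        # Wrap each case-insensitive match of the query in bold tags
        pattern = re.compile(re.escape(query), re.IGNORECASE)
        return Markup(pattern.sub(lambda m: '&lt;b&gt;%s&lt;/b&gt;' % m.group(0), text))

    def snippet(text, query, width=175):
        # Return the `width` characters centered on the first match
        pos = text.lower().find(query.lower())
        start = max(pos - width // 2, 0) if pos &gt;= 0 else 0
        return text[start:start + width]

    return dict(highlight=highlight, snippet=snippet)</code></pre>
          </div>
        </div>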
        
        <br/><br/><br/>
        <h2>Visualization</h2>
        We were tasked with visualizing some of the data from the <a href="http://foodcloseto.me">Food Close to Me</a> group by utilizing their RESTful API. To do this, we used the D3.js library. Originally, we chose to visualize how many restaurants there are for each type of food, shown as an interactive bubble chart whose bubbles move around when you click on a specific category. However, we discovered that we could only show 10 categories; otherwise the page takes a very long time to load, since the movement of the bubbles has an exponential time complexity that prevents us from showing all of the categories. So we show only the most popular types of food.<br />
        We chose to add another visualization to show more data from <a href="http://foodcloseto.me">Food Close to Me</a>: the average rating for each zip code, portrayed in a bar chart that you can sort by rating (higher to lower) or by zip code.
        
        <br/><br/><br/>
        <h2>PlanITPoker</h2>
         For the third phase of the project, we were asked to use PlanITpoker and track our work in the form of user stories.
         Setting up the user stories wasn't hard, since the project requirements told us exactly what we needed to do. In most industry projects, it is important to ensure that a story is something users actually need; for this project, however, we did not have to conduct complex and detailed analysis of user needs before writing our user stories. Estimating how much time a story would take was tricky: everyone had a different difficulty in mind. We found that our estimates got better once we had worked through the first few stories. Earlier, we had been tracking our to-do items through GitHub issues, much like a waterfall methodology, along with Trello, which follows an agile methodology. Adopting user stories was refreshing: among other advantages, it saved us time, since some members of our group worked remotely. First, user stories act as a guide to the end goal of the phase. Second, the clear and concise (under three lines) descriptions made it easier for us to understand the problem.
         Creating user stories is easy: we found that our team goals were communicated more thoroughly, yet little time was invested in writing them. Our group also welcomed the freedom for creativity, since user stories only provide a basic outline of the needed functionality, and the developer in charge has the flexibility to run with it the way they want.
         One limitation we noticed was scalability: it may be difficult to maintain and update user stories in a large project. Overall, we were very pleased with how structured our planning and estimates became thanks to user stories. <br>
     </div>
      <div class="outro">
        © SWEatshop 2017
      </div>
      </body>
  </html>