ipfs-search/ipfs-search

docs/towards_dist_search/Current_implementations.md

-   Full distribution of every task (no centralized coordination at all): Peers perform their jobs independently and exchange only the data they need. This prevents link congestion at a central server. Nodes are programmed identically and are distinguished only by a unique identifier.

-   Platform independence of nodes.

[Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web](https://www.cc.gatech.edu/~lingliu/papers/2003/apoidea-sigir03.pdf "https://www.cc.gatech.edu/~lingliu/papers/2003/apoidea-sigir03.pdf"), Aameek Singh, Mudhakar Srivatsa, Ling Liu, and Todd Miller, 2003

-   _Content addressing_: All content is uniquely identified by its multihash checksum.
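
To illustrate content addressing and checksum-based tamper detection, here is a minimal Python sketch that builds a sha2-256 multihash (function code `0x12`, digest length 32, each encoded as a single varint byte) and verifies data against it. It hashes raw bytes only: real IPFS CIDs are computed over the encoded block (for example UnixFS/dag-pb) and wrapped in a CID, so the result will not match `ipfs add`. The function names are hypothetical.

```python
import hashlib

# Multihash layout: <varint function code><varint digest length><digest>.
# sha2-256 has code 0x12 and a 32-byte digest; both fit in one varint byte.
SHA2_256 = 0x12


def multihash_sha256(data: bytes) -> bytes:
    """Return the sha2-256 multihash of `data`."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256, len(digest)]) + digest


def verify(data: bytes, mh: bytes) -> bool:
    """Tamper check: recompute the multihash and compare it with the expected one."""
    if len(mh) < 2 or mh[0] != SHA2_256:
        raise ValueError("only sha2-256 multihashes are handled in this sketch")
    return multihash_sha256(data) == mh
```

Any change to the content changes the digest, so a node that receives an object under a given hash can check it locally without trusting the sender.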

-   Performance: Nodes run on spare CPU cycles and must not put too large a load on the client machine.

-   No single point of failure, and graceful handling of permanent and transient failures (identify crawl traps and tolerate external failures).

-   The **_Extractor_** extracts linked hashes/links from the object and passes them on to the hash validator. It could be multithreaded to process different pages simultaneously.

-   Version history of each file is tracked by IPFS.

-   Locally computable content-address assignment based on consistent hashing. IPNS is consistent (check for the presence of DNSLink first, as it is faster).

    -   Nodes do not crash when a single peer fails ⇒ addresses can be dynamically reallocated across the other peers.
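
A minimal sketch of locally computable address assignment with consistent hashing, assuming all peers agree on the membership list. The SHA-256 ring positions, the 64 virtual points per peer and the `HashRing` class are illustrative choices, not part of any existing crawler. Removing a peer only moves the keys that peer owned, which is the dynamic reallocation described above.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string onto a 64-bit ring using the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring; every peer is placed at `vnodes` virtual points."""

    def __init__(self, peers=(), vnodes: int = 64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> peer id
        for peer in peers:
            self.add(peer)

    def add(self, peer: str) -> None:
        for i in range(self.vnodes):
            pos = ring_position(f"{peer}#{i}")
            bisect.insort(self._points, pos)
            self._owner[pos] = peer

    def remove(self, peer: str) -> None:
        # Only the removed peer's points disappear, so only its keys move.
        self._points = [p for p in self._points if self._owner[p] != peer]
        self._owner = {p: o for p, o in self._owner.items() if o != peer}

    def owner(self, key: str) -> str:
        """Return the peer owning `key`: the first point clockwise, wrapping around."""
        i = bisect.bisect_right(self._points, ring_position(key)) % len(self._points)
        return self._owner[self._points[i]]
```

Every peer can evaluate `owner(hash)` on its own, so no central coordinator is needed to decide who crawls what.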

-   Each crawl job queue has one **_Content Fetcher_** thread associated with it that streams data into the extractor.

    -   If we keep a connection to a node open to minimise connection-establishment overhead, we will need exception handling.

-   The **_Hash Validator_** checks whether a hash is the responsibility of the node. If so, it sends the hash to the preprocessor. If not, it sends the hash out on the network.

    -   When a node comes back online after having failed, data that needs to be stored on that node is propagated to it and can be retrieved from it.
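
The Content Fetcher → Extractor → Hash Validator flow can be sketched as a single-threaded loop over one crawl job queue. The `fetch`, `extract` and `is_mine` callables are hypothetical stand-ins; in this sketch the fetcher returns whole objects instead of streaming them, and the preprocessor step is reduced to enqueuing the hash locally.

```python
from collections import deque
from typing import Callable, List, Set, Tuple


def crawl(start: str,
          fetch: Callable[[str], bytes],
          extract: Callable[[bytes], List[str]],
          is_mine: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Drain one crawl job queue; return (hashes crawled here, hashes handed off)."""
    queue = deque([start])
    seen: Set[str] = {start}
    crawled: List[str] = []
    forwarded: List[str] = []
    while queue:
        h = queue.popleft()
        data = fetch(h)                 # Content Fetcher: retrieve the object
        crawled.append(h)
        for link in extract(data):      # Extractor: linked hashes in the object
            if link in seen:
                continue
            seen.add(link)
            if is_mine(link):           # Hash Validator: our responsibility?
                queue.append(link)      # yes: keep it locally (preprocessor stand-in)
            else:
                forwarded.append(link)  # no: would be sent out on the network
    return crawled, forwarded
```

For example, with a toy store `{"a": b"b\nx", "b": b"c", "c": b"", "x": b""}`, newline-separated links and `is_mine = lambda h: h != "x"`, the node crawls `a`, `b` and `c` and hands `x` off to the network.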

-   A _block_ is a variable-size block of data.

-   A _list_ is an ordered collection of blocks or other lists.

    -   If a node failure is permanent, the node needs to be recoverable using data stored by other nodes.

[UbiCrawler: A Scalable Fully Distributed Web Crawler](http://vigna.di.unimi.it/ftp/papers/UbiCrawler.pdf "http://vigna.di.unimi.it/ftp/papers/UbiCrawler.pdf"), Paolo Boldi, Bruno Codenotti, Massimo Santini, Sebastiano Vigna

-   _Tamper resistance_: All content is verified with its checksum.

    -   Data sent to a node while it is down still needs to propagate properly throughout the system.

    -   Data is still retrievable even when a node fails.

-   Portability: The nodes can be configured to run on any kind of dweb network by replacing only the network-specific component.

### [Source](https://niverel.tymyrddin.space/en/play/stones/current/peer-crawling)

-   If a peer is storing a DAG node that is the parent (root/ancestor) of other nodes, it is much more likely to also be storing the children. So when a requester attempts to pull down a large DAG, it first queries the DHT for providers of the root. Once the requester finds some and connects to them directly to retrieve the blocks, BitSwap will optimistically send them the “wantlist”, which will usually obviate any further DHT queries for that DAG.
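
A toy model of why the wantlist saves DHT lookups: the requester queries the DHT once for the root, then asks the peers it is already connected to for each further block, and falls back to the DHT only on a miss. This is a simplification for illustration, not BitSwap's actual protocol; the dictionaries standing in for the DHT and the peers' block stores are hypothetical.

```python
from typing import Dict, List, Set


def fetch_dag(root: str,
              children: Dict[str, List[str]],
              dht: Dict[str, Set[str]],
              peer_blocks: Dict[str, Set[str]]) -> int:
    """Fetch every block under `root`; return how many DHT queries were needed."""
    providers = set(dht[root])   # one DHT query: who provides the root?
    dht_queries = 1
    fetched: Set[str] = set()
    want = [root]
    while want:
        block = want.pop()
        if block in fetched:
            continue
        # Ask the peers we are already connected to before touching the DHT.
        if not any(block in peer_blocks[p] for p in providers):
            dht_queries += 1
            providers |= dht[block]  # connect to the newly found providers too
        fetched.add(block)
        want.extend(children.get(block, []))
    return dht_queries
```

When the root's provider also holds the whole DAG, one DHT query suffices; when the children live elsewhere, the first miss finds a new provider that usually serves the remaining siblings as well.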

-   [Filecoin \[IOU\] (FIL)](https://www.coingecko.com/en/coins/filecoin "https://www.coingecko.com/en/coins/filecoin"), CoinGecko

-   Scalability: Not relying on location means that latency can become an issue ⇒ minimise communication.

-   Freshness: After an object is initially acquired (processed), it may have to be periodically recrawled and checked for updates. In the simplest case, this could be done by starting another broad breadth-first crawl, or by requesting all items in a node's collection again. Techniques for optimizing the “freshness” of such collections are usually based on observations about an item's update history (incremental crawling). Sadly, we cannot use these “as such”, because an updated object in IPFS has a new hash. A variety of other heuristics can be used to recrawl items marked as “more important”. Good recrawling strategies are essential for keeping a search index up to date with limited crawling bandwidth.
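
Since an updated object gets a new hash, one option is to schedule recrawls on mutable names (for example IPNS names) rather than on hashes. That choice, and the rule that the recrawl interval shrinks in proportion to an item's importance, are assumptions made for this sketch; `RecrawlScheduler` is a hypothetical class. A min-heap keeps the next due item on top.

```python
import heapq
from typing import List, Tuple


class RecrawlScheduler:
    """Min-heap of (due_time, name, importance); the earliest due item is on top."""

    def __init__(self, base_interval: float = 86400.0):
        self.base_interval = base_interval
        self._heap: List[Tuple[float, str, float]] = []

    def schedule(self, name: str, now: float, importance: float = 1.0) -> None:
        # Assumed heuristic: importance 2 means recrawl twice as often, and so on.
        due = now + self.base_interval / importance
        heapq.heappush(self._heap, (due, name, importance))

    def due(self, now: float) -> List[str]:
        """Pop every name whose recrawl time has passed and reschedule it."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, name, importance = heapq.heappop(self._heap)
            ready.append(name)
            self.schedule(name, now, importance)  # next due time is > now
        return ready
```

With `base_interval=100`, a name scheduled at time 0 with importance 4 first comes due at 25, while a name with importance 1 comes due at 100.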

-   The **_Neighbourhood_** data store contains the identifiers of agents in a node's neighbourhood.

    -   Uses the local IPFS gateway to fetch a (named) IPFS resource.
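
A minimal sketch of fetching through the local gateway, assuming a local daemon (such as Kubo) that serves its gateway on the default address `http://127.0.0.1:8080`. Content is requested by path: `/ipfs/<cid>` for immutable content and `/ipns/<name>` for named resources, including DNSLink domains. The helper names are hypothetical.

```python
from urllib.request import urlopen

# Assumption: a local daemon (e.g. Kubo) serves its gateway on the default address.
GATEWAY = "http://127.0.0.1:8080"


def gateway_url(content_path: str) -> str:
    """Build a gateway URL for an /ipfs/<cid> or /ipns/<name> path."""
    if not content_path.startswith(("/ipfs/", "/ipns/")):
        raise ValueError("expected a path starting with /ipfs/ or /ipns/")
    return GATEWAY + content_path


def fetch(content_path: str, timeout: float = 30.0) -> bytes:
    """Fetch a (named) IPFS resource through the local gateway."""
    with urlopen(gateway_url(content_path), timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch("/ipns/example.com")` only works while a local daemon is running; otherwise `urlopen` raises a `URLError`.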

-   Duplicates are removed across the network.

-   A _tree_ is a collection of blocks, lists, or other trees.
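
The block, list and tree definitions map naturally onto a small recursive data type. In this sketch a tree stores named entries, as in a Git tree; the entry names, the class names and the `total_size` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Block:
    data: bytes  # variable-size block of data


@dataclass
class BlockList:
    items: List[Union[Block, "BlockList"]]  # ordered blocks or nested lists


@dataclass
class Tree:
    # Assumption: entries are named, as in a Git tree.
    entries: Dict[str, Union[Block, BlockList, "Tree"]]


def total_size(node: Union[Block, BlockList, Tree]) -> int:
    """Total number of data bytes reachable from `node`."""
    if isinstance(node, Block):
        return len(node.data)
    if isinstance(node, BlockList):
        return sum(total_size(child) for child in node.items)
    return sum(total_size(child) for child in node.entries.values())
```

A large file can then be a `BlockList` of fixed-size `Block`s, and a directory a `Tree` whose entries are files and subdirectories.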

-   [Filecoin](https://filecoin.io/ "https://filecoin.io/")

-   An **_Overlay Network Layer_** is responsible for forming and maintaining the distributed search engine network and for communication between peers. It can be an unstructured or a structured network. If a flat network does not scale, a supernode architecture can be used to improve performance; hence a client must support both flat and supernode architectures.

-   The **_Content Range Validator_** checks whether content lies in the range of the node.

-   A **_Peer and Content Distribution Function_** determines which clients to connect with. Each client has a copy of this function. The hash list and content range associated with a client can change as nodes join or leave the network. The function distributes both the hashes to crawl and the content among peers, uses the underlying dweb network to provide load balancing and scalability, and takes node proximity into account. Initially, we will use a static distribution function; the hash-list and content-range assignment functions can be hash functions.

    -   Checks file permissions.
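
A sketch of a static Peer and Content Distribution Function in which the hash-list and content-range assignment functions are hash functions: the 32-bit hash space is split into `num_nodes` equal ranges, and the Content Range Validator checks whether a key's position falls into the node's own range. SHA-256 and the equal split are assumptions, and the function names are hypothetical. Because the split depends on `num_nodes`, a node joining or leaving moves a large share of the keys, so this only suits the initial, static setup.

```python
import hashlib
from typing import Tuple

HASH_SPACE = 2 ** 32


def hash_position(key: str) -> int:
    """Position of `key` in a 32-bit hash space (first 4 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


def assigned_range(node_index: int, num_nodes: int) -> Tuple[int, int]:
    """Static assignment: node `node_index` owns the half-open range [lo, hi)."""
    lo = node_index * HASH_SPACE // num_nodes
    hi = (node_index + 1) * HASH_SPACE // num_nodes
    return lo, hi


def in_my_range(key: str, node_index: int, num_nodes: int) -> bool:
    """Content Range Validator: does `key` fall inside this node's range?"""
    lo, hi = assigned_range(node_index, num_nodes)
    return lo <= hash_position(key) < hi
```

The ranges are contiguous and together cover the whole hash space, so every key belongs to exactly one node.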