Getting started with CWDC

Reading time: 5 minutes.

Introduction


Cogniware Data Collector (CWDC) is a robust, scalable enterprise system for collecting information from social networks, websites, forums, e-mails, existing local data sources and various other sources for further analysis in specialized analytics software such as IBM Watson Explorer Content Analytics or Cogniware GDPR Explorer.

How does it work

Each crawling job is represented by a Crawler. Each Crawler uses Data Connectors to retrieve data from source systems and Data Handlers to store the data on target systems.
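The Crawler/Connector/Data Handler relationship can be sketched as a small pipeline. This is a minimal illustration only, with hypothetical class names (`Connector`, `DataHandler`, `Crawler` and their methods are assumptions for the sketch, not the actual CWDC API):

```python
from typing import Iterator, Protocol


class Connector(Protocol):
    """Retrieves records from a source system."""
    def fetch(self, query: str) -> Iterator[dict]: ...


class DataHandler(Protocol):
    """Stores retrieved records on a target system."""
    def store(self, record: dict) -> None: ...


class Crawler:
    """One crawling job: pull records matching a query, hand them to the handler."""

    def __init__(self, connector: Connector, handler: DataHandler, query: str):
        self.connector = connector
        self.handler = handler
        self.query = query

    def run(self) -> int:
        # Stream records from the source and store each one on the target.
        count = 0
        for record in self.connector.fetch(self.query):
            self.handler.store(record)
            count += 1
        return count
```

The point of the sketch is the separation of concerns: the Crawler itself knows nothing about the source or target systems, so new Connectors and Data Handlers can be combined freely.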

Cogniware Data Collector Architecture


Components


CWDC Core:

Cogniware Data Collector Core is the heart of the system. It manages Connectors and Data Handlers through an extensible, enterprise-level configuration mechanism. Configuration data such as Connector connection credentials, Crawler query configuration, scheduling information and Data Handler connection information is stored in the product's built-in database.

The framework allows multiple parallel crawling sessions for different queries. Jobs can be defined as full crawl jobs (for initial crawling) or as incremental jobs, which can be scheduled or run manually.
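The difference between a full and an incremental job can be sketched as follows: a full crawl fetches everything the query matches, while an incremental run only fetches what changed since the last run. The function name, query syntax and `modified` field here are assumptions for illustration, not CWDC's actual query language:

```python
from datetime import datetime, timezone
from typing import Optional


def build_query(base_query: str, mode: str, last_run: Optional[datetime]) -> str:
    """Return the effective query for a crawl job.

    Full crawls use the base query as-is; incremental crawls narrow it
    with a modified-since filter based on the previous run's timestamp.
    """
    if mode == "incremental" and last_run is not None:
        return f"{base_query} AND modified >= {last_run.isoformat()}"
    # Full crawl, or the first incremental run with no history yet.
    return base_query
```

An incremental job with no recorded previous run naturally degrades to a full crawl, which is why the initial crawl is typically defined as a full job.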

This approach enables the collection of structured and unstructured data, such as social network posts, comments, user profiles or web pages, as separate entities while taking the relations between them into account. These relations are also stored as separate objects, with a structure similar to a mapping table in a relational database.
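As an illustration of the mapping-table analogy, a relation object might pair the identifiers of two separately stored entities. The field names below are hypothetical, chosen only to show the shape of such an object:

```python
# Entities (e.g. a post and a comment on it) are stored as separate objects.
post = {"id": "p-1", "type": "post", "text": "Product announcement"}
comment = {"id": "c-7", "type": "comment", "text": "Looking forward to it"}

# The relation between them is its own object, referencing both entities
# by id, much like a row in a relational mapping table.
relation = {
    "source_id": comment["id"],
    "target_id": post["id"],
    "relation_type": "comment_of",
}
```

Because relations are first-class objects, downstream analytics systems can reconstruct the entity graph without the source system being available.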

CWDC Connector

Connectors establish connections to source systems. A separate Connector needs to be created for each source: a particular social network, file system, database, SharePoint site, etc. Typical information that needs to be provided is the URL or IP address of the source system and the credentials to connect with.
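The connection information a Connector typically needs can be modeled as a small configuration record. The structure, field names and example values here are assumptions for illustration; in CWDC itself this configuration is entered through the UI and stored in the built-in database:

```python
from dataclasses import dataclass


@dataclass
class ConnectorConfig:
    """Minimal connection details for a hypothetical source system."""
    source_url: str  # URL or IP address of the source system
    username: str
    password: str


# Example: configuration for a SharePoint site connector (illustrative values).
sharepoint = ConnectorConfig(
    source_url="https://intranet.example.com/sites/hr",
    username="crawler-svc",
    password="secret",  # credentials would be stored securely, not hard-coded
)
```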

CWDC Data Handler

Data Handlers determine where the data retrieved by a Connector will be stored (and analyzed). The target is typically an analytics system such as IBM Watson Explorer or Cogniware GDPR Explorer, but it can also be a file system or an SQL database.

Crawler

Crawlers are jobs defined by users and serve as definitions for crawling processes. Each Crawler performs one crawling job and consists of a Connector, a Data Handler and a query definition.



CW Data Collector UI

Cogniware Data Collector User Interface is an HTML5 web application. Users can use it to install and configure Connectors and Data Handlers, and to create and manage Crawlers.

Next topic: Working with Application
