Annotator configuration

Reading time: 10 minutes

This article explains how to configure which types of personal information, from which regions and countries, are recognized and marked as personal information. It also explains how to force the annotation of structured data sources: you can specify what certain database tables contain to improve annotation accuracy.

Prerequisites

You need a full Enterprise license to access this functionality.

You need to have GDPR Explorer successfully installed and started for the first time.

Please familiarize yourself with the GDPR Explorer administration guidelines first.

Adding configuration files

Upon request, you will receive the default order_enabled_annotator.json and structured_annotator_configuration.json files from your implementation partner or from Cogniware.

Copy these files to the following location on the server hosting GDPR Explorer:

/var/dockershare/annotator/annotator-settings
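
The copy step can also be scripted. The following is a minimal sketch, not part of GDPR Explorer: the function name install_config_files and the idea of a separate source directory are assumptions made for illustration, while the two file names come from this article. It refuses to continue if either file is missing, so a partial copy is noticed early.

```python
import shutil
from pathlib import Path

# File names as given in this article.
CONFIG_FILES = (
    "order_enabled_annotator.json",
    "structured_annotator_configuration.json",
)


def install_config_files(source_dir, target_dir):
    """Copy both configuration files into target_dir and return the paths written.

    install_config_files is a hypothetical helper, not a GDPR Explorer tool.
    """
    target = Path(target_dir)
    written = []
    for name in CONFIG_FILES:
        src = Path(source_dir) / name
        if not src.is_file():
            raise FileNotFoundError(f"missing configuration file: {src}")
        # copy2 also preserves timestamps and permission bits.
        written.append(Path(shutil.copy2(src, target / name)))
    return written
```

On the server, target_dir would be /var/dockershare/annotator/annotator-settings; writing there may require elevated permissions.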

Limit the annotation of specific personal information by region

1. Back up the system if you are making changes to a production environment.


2. Open the file order_enabled_annotator.json for editing.

vim /var/dockershare/annotator/annotator-settings/order_enabled_annotator.json

The order_enabled_annotator.json file is a JSON file and must remain valid JSON. Take special care that the last country listed in each block does not have a "," at the end of its line (see the examples below).

The file is divided by annotation type (phone numbers, names, etc.).
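
A trailing comma after the last country is the most likely mistake, and it can be detected before restarting. The sketch below uses only Python's standard json module; the helper name check_json is an assumption made for illustration and is not part of GDPR Explorer.

```python
import json


def check_json(text):
    """Return None if text is valid JSON, otherwise a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the place where parsing failed.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# A correct country list passes the check.
print(check_json('{"country": ["ES", "SE"]}'))

# A "," after the last country is reported as an error.
print(check_json('{"country": ["ES", "SE",]}'))
```

To check the real file, read its contents (for example with pathlib.Path(...).read_text()) and pass the text to check_json.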

3. To disable the annotation of a specific type for a specific region (country), delete the corresponding line in the order_enabled_annotator.json file.

Example: to disable the annotation of Italian and British car plate numbers, delete the lines "IT", and "UK" from the example below. Because "UK" was the last entry, also remove the "," after "SE", which becomes the new last entry.

{
		"annotatorName": "CAR_PLATE_NUMBER_ANNOTATION_NAME",
		"country": [
			"AT",
			"BE",
			"BG",
			"HR",
			"CY",
			"CZ",
			"DK",
			"EE",
			"FI",
			"FR",
			"DE",
			"GR",
			"HU",
			"IE",
			"IT",
			"LV",
			"LT",
			"LU",
			"MT",
			"NL",
			"PL",
			"PT",
			"RO",
			"SK",
			"SI",
			"ES",
			"SE",
			"UK"
		]
	},

You must maintain the JSON structure of the file. In particular, make sure the last listed country does not have a "," at the end of the line!

Result:

{
		"annotatorName": "CAR_PLATE_NUMBER_ANNOTATION_NAME",
		"country": [
			"AT",
			"BE",
			"BG",
			"HR",
			"CY",
			"CZ",
			"DK",
			"EE",
			"FI",
			"FR",
			"DE",
			"GR",
			"HU",
			"IE",
			"LV",
			"LT",
			"LU",
			"MT",
			"NL",
			"PL",
			"PT",
			"RO",
			"SK",
			"SI",
			"ES",
			"SE"
		]
	},
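
The same change can be made programmatically, which avoids comma mistakes altogether. This is a sketch under assumptions: the top-level layout of the real file is not shown in this article, so the code assumes a list of objects shaped like the example above, and the function name disable_countries is hypothetical.

```python
import json


def disable_countries(entries, annotator_name, countries):
    """Return a copy of entries with the given country codes removed
    from the annotator called annotator_name; other entries are unchanged."""
    to_remove = set(countries)
    result = []
    for entry in entries:
        if entry.get("annotatorName") == annotator_name:
            kept = [c for c in entry.get("country", []) if c not in to_remove]
            entry = {**entry, "country": kept}
        result.append(entry)
    return result


# Shortened version of the car plate entry from the example above.
entries = [
    {
        "annotatorName": "CAR_PLATE_NUMBER_ANNOTATION_NAME",
        "country": ["AT", "IT", "SE", "UK"],
    }
]

updated = disable_countries(entries, "CAR_PLATE_NUMBER_ANNOTATION_NAME", ["IT", "UK"])

# json.dumps never writes a trailing comma, so the output stays valid JSON.
print(json.dumps(updated, indent="\t"))
```

The function returns a new list and leaves the input untouched, so the original entries can still be written to a backup file before the updated version is saved.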

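4. Restart the system. Note that docker container prune asks for confirmation and removes all stopped containers on the host, not only those belonging to GDPR Explorer.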

cd ~/GDPRExplorer 
sudo docker-compose stop 
sudo docker container prune
sudo docker-compose up -d 

Force structured data annotation


Open the file structured_annotator_configuration.json for editing.

vim /var/dockershare/annotator/annotator-settings/structured_annotator_configuration.json

The file is structured by Crawler type, subtype and Crawler ID.

The key-value pairs inside each Crawler entry specify how individual columns of the source data will be annotated.

For example, "phone": "phone_number" means: annotate the column "phone" as "phone_number".

The following annotations are currently supported out of the box:

  • address_full
  • age
  • bank_account
  • birth_date
  • car_plate_number
  • company_registration_number
  • credit_card
  • driving_licence_number
  • education
  • email
  • gender
  • gps
  • iban
  • identity_card_number
  • imei
  • ip_address
  • passport_number
  • person_name
  • personal_identification_number
  • personal_status
  • phone_number
  • tax_identification_number

"default" is the configuration applied to Crawlers of the given type and subtype that are not listed by their Crawler ID.

The "RUN_ALL_ANNOTATORS" value means that the column is not forced to a specific annotation; it is annotated in the standard way instead.

{
	"mail": {
		"imap": {
			"default": {
				"address_from": "email",
				"subject": "RUN_ALL_ANNOTATORS",
				"addresses_to": "email",
				"address_to": "email"
			},
			"8d9edcef-6d3a-4fc7-89b2-0379b155cf6d": {
				"address_to": "something_different"
			}
		}
	},
	"exchange": {
		"online": {
			"default": {
				"address_from": "email",
				"subject": "RUN_ALL_ANNOTATORS",
				"addresses_to": "email"
			},
			"4b5f3b6b-2bca-4635-9180-aae2f8767d69": {
				"address_to": "phone_number"
			}
		},
		"2016": {
			"default": {
				"address_from": "email",
				"subject": "RUN_ALL_ANNOTATORS",
				"addresses_to": "email"
			},
			"4b5f3b6b-2bca-4635-9180-aae2f8767d69": {
				"address_to": "phone_number"
			}
		}
	},
	"sharepoint": {
		"online": {
			"default": {
				"address_to": "person_name",
				"phone": "phone_number"
			},
			"9d0f57b1-749a-4074-be55-954a4d3c5935": {
				"address_to": "address_full"
			}
		},
		"2016": {
			"default": {
				"something": "email"
			}
		}
	},
	"Database": {
		"nosubtype": {
			"default": {
				"notUsedForDatabase": "DoNotDelete"
			},
			"c5de1ba0-9fa3-4080-b5a7-d5e772bebb70view_name": {
				"address_full": "address_full",
				"age": "age",
				"account_no": "bank_account",
				"date_of_birth": "birth_date",
				"car_plate_number": "car_plate_number",
				"company_registration_number": "company_registration_number",
				"credit_card": "credit_card",
				"driving_licence_number": "driving_licence_number",
				"education": "education",
				"email": "email",
				"gender": "gender",
				"gps": "gps",
				"iban": "iban",
				"identity_card_number": "identity_card_number",
				"imei": "imei",
				"ip": "ip_address",
				"passport_number": "passport_number",
				"person_name": "person_name",
				"nino": "personal_identification_number",
				"personal_status": "personal_status",
				"phone": "phone_number",
				"tax_identification_number": "tax_identification_number",
				"defaultIECAction": "RUN_ALL_ANNOTATORS"
			}
		}
	}
}
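
Because a mistyped annotation name is easy to overlook, the configuration can be checked against the list of supported annotations above. The following is a minimal sketch, assuming the three-level layout (type, subtype, Crawler) shown in the example; the function name unsupported_values is hypothetical, and how GDPR Explorer itself treats unknown values is not stated in this article.

```python
# Annotation names listed in this article as supported out of the box.
SUPPORTED = {
    "address_full", "age", "bank_account", "birth_date", "car_plate_number",
    "company_registration_number", "credit_card", "driving_licence_number",
    "education", "email", "gender", "gps", "iban", "identity_card_number",
    "imei", "ip_address", "passport_number", "person_name",
    "personal_identification_number", "personal_status", "phone_number",
    "tax_identification_number",
}
PASS_THROUGH = "RUN_ALL_ANNOTATORS"


def unsupported_values(config):
    """Return (type, subtype, crawler, column, value) for every value that is
    neither a supported annotation nor RUN_ALL_ANNOTATORS."""
    problems = []
    for crawler_type, subtypes in config.items():
        for subtype, crawlers in subtypes.items():
            for crawler, columns in crawlers.items():
                for column, value in columns.items():
                    if value != PASS_THROUGH and value not in SUPPORTED:
                        problems.append((crawler_type, subtype, crawler, column, value))
    return problems


# Excerpt of the "mail" section from the example above.
sample = {
    "mail": {
        "imap": {
            "default": {"address_from": "email", "subject": "RUN_ALL_ANNOTATORS"},
            "8d9edcef-6d3a-4fc7-89b2-0379b155cf6d": {"address_to": "something_different"},
        }
    }
}

for problem in unsupported_values(sample):
    print(problem)
```

Run against the full example, the check would also report the placeholder "notUsedForDatabase": "DoNotDelete" in the Database default entry; its name suggests that it should be left in place, so treat that report as expected.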
