Registry Database Normalization
Address Normalization and Cleaning System
The system is designed to support the management of large data archives, typically (but not only) customer databases.
The address normalization system consists of a standardized procedure, which includes national-level address reference tables and data-cleaning rules, and which can be customized to meet specific customer needs.
The solution supports three processing modes:
- initial “mass” normalization;
- “incremental” normalization, i.e. applied to master data inserted or modified after the initial run;
- normalization “on demand”, i.e. applied to a sample of master data selected by the operator (e.g. master data inserted or modified in a defined time interval).
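The three modes differ only in which records are selected for processing. The following is a minimal sketch of that selection, not the product's implementation: the `MasterRecord` type, its `modified_at` and `normalized_at` fields, and the three function names are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class MasterRecord:
    """Hypothetical master-data row; field names are illustrative."""
    record_id: int
    address: str
    modified_at: datetime                     # last insert or update
    normalized_at: Optional[datetime] = None  # None = never normalized


def select_mass(records: List[MasterRecord]) -> List[MasterRecord]:
    """Initial "mass" run: every record in the archive."""
    return list(records)


def select_incremental(records: List[MasterRecord]) -> List[MasterRecord]:
    """Records never normalized, or changed since their last normalization."""
    return [r for r in records
            if r.normalized_at is None or r.modified_at > r.normalized_at]


def select_on_demand(records: List[MasterRecord],
                     start: datetime, end: datetime) -> List[MasterRecord]:
    """Records inserted or modified inside an operator-chosen interval."""
    return [r for r in records if start <= r.modified_at <= end]
```

In this sketch the incremental mode relies on a per-record timestamp of the last normalization; a real installation might track changes differently (for example with a change log).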
Address normalization takes place according to the following logic:
- the city is recognized and normalized;
- within the city, the street is recognized and normalized;
- if the street number and its suffix have been entered at the end of the street field, they are isolated and stored separately;
- the consistency between the address and the postal code is checked; the postal code can be corrected according to the priority rules, or added if missing.
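The logic above can be sketched as follows. This is an illustration under stated assumptions, not the actual engine: the `REFERENCE` table holds made-up sample rows rather than real reference data, matching is exact (a real system would also handle abbreviations and misspellings), and the single priority rule used here (the reference postal code always wins) is an assumption.

```python
import re
from typing import Dict, Optional, Tuple

# Illustrative sample rows only: city -> street -> postal code.
REFERENCE: Dict[str, Dict[str, str]] = {
    "ROMA": {"VIA APPIA NUOVA": "00183"},
    "MILANO": {"CORSO BUENOS AIRES": "20124"},
}

# Street number at the end of the field, with an optional suffix:
# "12", "12/A", "12 BIS".
_NUMBER_RE = re.compile(
    r"^(?P<street>.*?)[\s,]+(?P<number>\d+)\s*[/\-]?\s*(?P<suffix>[A-Z]{1,3})?$"
)


def split_street(field: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Separate street name, street number and number suffix."""
    text = " ".join(field.upper().split())
    m = _NUMBER_RE.match(text)
    if not m:
        return text, None, None
    return m.group("street"), m.group("number"), m.group("suffix")


def normalize_address(city: str, street_field: str,
                      postal_code: Optional[str] = None) -> dict:
    # 1. Recognize the city.
    city_n = " ".join(city.upper().split())
    streets = REFERENCE.get(city_n)
    if streets is None:
        return {"error": "city not recognized"}
    # 2. Recognize the street within that city, isolating the number.
    street, number, suffix = split_street(street_field)
    expected = streets.get(street)
    if expected is None:
        return {"error": "street not recognized"}
    # 3. Check the postal code (assumed rule: the reference value wins).
    if not postal_code:
        status = "postal code added"
    elif postal_code != expected:
        status = "postal code corrected"
    else:
        status = "ok"
    return {"city": city_n, "street": street, "number": number,
            "suffix": suffix, "postal_code": expected, "status": status}
```

For example, `normalize_address("Roma", "via appia nuova 12 bis")` returns the street `VIA APPIA NUOVA`, number `12`, suffix `BIS`, and the postal code taken from the sample table.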
The system uses the Istat street maps, in their latest available update, with national coverage.
Also, if a tax code is available in the registry, the system can check its formal correctness and its consistency with the other available information (name, surname, date and place of birth, sex), including, for foreigners, the country of birth, taken from the Belfiore code.
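A minimal sketch of such a check is given below, assuming the tax code is the 16-character Italian codice fiscale of a natural person. It verifies the format and the final check character, then extracts the fields that can be compared with the registry (birth date, sex, Belfiore code of the place of birth). The function names are illustrative; codes altered by omocodia (digits replaced by letters) and the name and surname consistency check are deliberately left out.

```python
import re

# Values of characters in odd positions (1st, 3rd, ...), for A..Z;
# the digits 0..9 share the values of the letters A..J.
_ODD_VALUES = [1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18,
               20, 11, 3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23]
_ODD = {chr(ord("A") + i): v for i, v in enumerate(_ODD_VALUES)}
_ODD.update({str(i): v for i, v in enumerate(_ODD_VALUES[:10])})

_MONTHS = "ABCDEHLMPRST"  # January .. December
_PATTERN = re.compile(r"^[A-Z]{6}\d{2}[ABCDEHLMPRST]\d{2}[A-Z]\d{3}[A-Z]$")


def check_char(first15: str) -> str:
    """Compute the 16th character from the first 15."""
    total = 0
    for i, ch in enumerate(first15):
        if i % 2 == 0:            # odd position, counting from 1
            total += _ODD[ch]
        elif ch.isdigit():        # even position, digit: its own value
            total += int(ch)
        else:                     # even position, letter: A=0 .. Z=25
            total += ord(ch) - ord("A")
    return chr(ord("A") + total % 26)


def is_formally_valid(code: str) -> bool:
    code = code.strip().upper()
    return bool(_PATTERN.match(code)) and check_char(code[:15]) == code[15]


def decode(code: str) -> dict:
    """Extract the fields to compare with the registry (valid codes only)."""
    code = code.strip().upper()
    day = int(code[9:11])
    sex = "F" if day > 40 else "M"   # women have 40 added to the day
    belfiore = code[11:15]
    return {
        "year_2digit": int(code[6:8]),
        "month": _MONTHS.index(code[8]) + 1,
        "day": day - 40 if sex == "F" else day,
        "sex": sex,
        "belfiore": belfiore,
        "born_abroad": belfiore.startswith("Z"),  # Z codes denote foreign countries
    }
```

The decoded values can then be compared field by field with the date of birth, sex and place of birth stored in the registry; only the last two digits of the birth year are encoded, so the century has to come from the registry itself.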
As an option, the system can also deduplicate the names present in the registry and georeference the addresses.
System operational flow
The operational flow of the normalization system consists of the following steps:
Extraction of the sample to work on from the customer database: the normalization engine works on a temporary database containing the master data to be normalized, extracted from the source database in a predefined format.
Sample normalization: using the logic described above, the normalization engine normalizes the addresses and generates a report that shows the original data alongside the normalized data. For each master data record, the report shows either the normalized values with a normalization “reliability” indicator, or the cause of the normalization error.
Post-normalization activities: at the end of processing, the user can check the outcome by analyzing the report, accept or reject the normalizations proposed by the system, and then start a procedure that applies the accepted normalizations to the original database.
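The report and the write-back step can be sketched as follows. The `ReportRow` structure, the 0-100 reliability scale and the function names are assumptions made for illustration; the source does not specify the report layout or how the indicator is expressed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass
class ReportRow:
    """One line of the normalization report (hypothetical layout)."""
    record_id: int
    original: str
    normalized: Optional[str]      # None when normalization failed
    reliability: Optional[int]     # assumed 0-100 score, None on error
    error: Optional[str] = None    # cause of the failure, if any


def suggest_accepted(report: Iterable[ReportRow], min_reliability: int) -> Set[int]:
    """Pre-select rows the user may want to accept; the user has the final say."""
    return {row.record_id for row in report
            if row.error is None
            and row.reliability is not None
            and row.reliability >= min_reliability}


def apply_accepted(source: Dict[int, str], report: Iterable[ReportRow],
                   accepted_ids: Set[int]) -> int:
    """Write accepted, successfully normalized rows back to the source data."""
    applied = 0
    for row in report:
        if (row.record_id in accepted_ids
                and row.error is None
                and row.normalized is not None):
            source[row.record_id] = row.normalized
            applied += 1
    return applied
```

Rows that failed normalization are never written back in this sketch, even if their identifier appears among the accepted ones, so the original value is preserved.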
Optionally, at the end of normalization, the names present in the registry can be deduplicated, based on identity or similarity rules applied to the identification fields (name, surname, address, date of birth).
The deduplication procedure acts on a set of user-selected names.
The operational flow of the procedure involves the following steps:
- extraction from the sample database of the master data to be processed;
- generation of a report indicating the names that, on the basis of established rules, appear multiple times;
- activation, subject to the user’s consent, of a specific procedure to handle duplicates.
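The duplicate report can be illustrated with the sketch below. The rules shown are assumptions, since the source does not define them: the date of birth must be identical, while name, surname and address are compared by similarity using `difflib.SequenceMatcher` from the Python standard library, with an arbitrary threshold of 0.9. The record layout (dictionaries with `id`, `name`, `surname`, `address`, `birth_date`) is also hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def _key(rec: Dict[str, str]) -> str:
    """Uppercase, whitespace-normalized text used for the similarity rule."""
    return " ".join(
        f"{rec['name']} {rec['surname']} {rec['address']}".upper().split()
    )


def find_duplicates(records: List[Dict[str, str]],
                    threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair judged to be a duplicate."""
    pairs = []
    for a, b in combinations(records, 2):
        # Identity rule: the date of birth must match exactly.
        if a["birth_date"] != b["birth_date"]:
            continue
        # Similarity rule on name, surname and address.
        score = SequenceMatcher(None, _key(a), _key(b)).ratio()
        if score >= threshold:
            pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs
```

Comparing every pair is quadratic in the number of names, which is acceptable for a user-selected sample; for large samples, grouping the records first (for example by date of birth) would reduce the number of comparisons.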
If the customer needs geomarketing activities, additional processing is available as an option:
- positioning of the address point (latitude and longitude coordinates);
- assignment of the Istat section to which the address belongs.
These services require the geographic positioning of an address including its street number, so a suitable georeferencing database must be used (license to be purchased).