Data integrity – A priority of data-driven organizations

In the InsideView Alignment Report 2020, more than 70% of revenue leaders rank data management as their highest priority. Although many organizations have implemented a system for data collection and analysis, their biggest concern remains maintaining the integrity of their data.

The term 'data integrity' is sometimes used to describe a process and sometimes a state of data. Either way, it refers to data being accurate, valid, and consistent across all data sources.

In layman's terms, data integrity refers to data that your team can trust, feel confident is secure, and use for whatever purpose they need, without worrying about data quality.

These aspects are extremely important, especially for data analysts who integrate and bring together data from multiple sources to derive useful insights and to retain customers.

Types of data integrity

Data integrity has various aspects, but at a high level, it can be divided into two types: physical and logical. Each of these types defines methods and constraints that enforce integrity in datasets.

Physical data integrity

Physical data integrity relates to protecting data against external or physical calamities, such as power outages, natural disasters, hackers, etc. These problems can make it impossible for users to access data in the database, and are usually caused by human error, storage erosion, security breaches, malware, etc.

Logical data integrity

Logical data integrity relates to how data is stored and modelled within the database, and to the logical constraints implemented to keep the data accurate, valid, and consistent across multiple sources.

Logical data integrity is further divided into four types:

Entity integrity 

Entity integrity means uniquely identifying each entity in your database. This helps to avoid duplicate records, since every new record must have a unique identifier. These identifiers (called primary keys in relational databases) cannot be null and are usually referenced in other datasets to prevent data duplication. For example, in a customer database, the SSN can be used as the unique identifier that ensures the entity integrity of the dataset.
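To make the primary-key rule concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` table, its columns, and the sample SSN values are hypothetical. `NOT NULL` is declared explicitly because SQLite, for legacy reasons, allows NULL in a non-integer `PRIMARY KEY` column unless told otherwise.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        ssn  TEXT PRIMARY KEY NOT NULL,  -- unique, non-null identifier
        name TEXT
    )
    """
)
conn.execute("INSERT INTO customers VALUES (?, ?)", ("000-00-0001", "Ann Lee"))

def try_insert(ssn, name):
    """Attempt an insert; return None on success or the constraint error message."""
    try:
        conn.execute("INSERT INTO customers VALUES (?, ?)", (ssn, name))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

# A second record with the same SSN is rejected as a duplicate entity.
duplicate_error = try_insert("000-00-0001", "Ann M. Lee")
# A record without an identifier is rejected as well.
null_error = try_insert(None, "No Id")
print(duplicate_error)
print(null_error)
```

Both rejected inserts leave the table with a single, uniquely identified customer.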

In the absence of uniquely identifying attributes, advanced data matching and fuzzy matching algorithms are required to match data accurately and find out which records belong to the same entity.
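As a rough illustration of fuzzy matching, the sketch below scores every pair of names with `difflib.SequenceMatcher` from Python's standard library and keeps pairs above a threshold. The similarity measure, the 0.85 threshold, and the `likely_duplicates` helper are assumptions chosen for the example. Dedicated record-matching tools typically use more specialized algorithms and compare several fields at once.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def likely_duplicates(records, threshold=0.85):
    """Return (record_a, record_b, score) pairs whose similarity meets the threshold."""
    pairs = []
    for a, b in combinations(records, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs

customers = ["Jonathan Smith", "jonathan  smyth", "Maria Garcia", "Mario Garcia", "Li Wei"]
for a, b, score in likely_duplicates(customers):
    print(f"{a!r} ~ {b!r} (confidence {score})")
```

With this threshold, "Maria Garcia" and "Mario Garcia" are flagged alongside the two Jonathan records. The scores are confidence levels for a reviewer to act on, not proof that two records describe the same entity.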

Referential integrity 

Referential integrity refers to the presence of foreign keys in a relational database. Foreign keys are created to refer to an existing entity in another table. Relating records this way avoids creating duplicate record entries and reuses information from an existing table in the schema. For example, an employee database can keep employee information in one table and job role information in another, with foreign keys relating each record in the employee table to a job role.

An important thing to note here is that a table has a unique, non-null list of primary keys, but multiple distinct records can share the same foreign key (as multiple people can have the same job role in the organization).
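The employee/job-role example can be sketched with SQLite through Python's `sqlite3` module. The table and column names are hypothetical. SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection. The sketch shows two employees sharing one foreign key value and an insert that points at a nonexistent role being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE job_roles (
        role_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        role_id INTEGER NOT NULL REFERENCES job_roles(role_id)
    );
    INSERT INTO job_roles VALUES (1, 'Data Analyst'), (2, 'Engineer');
    -- Several employees may share the same foreign key value.
    INSERT INTO employees VALUES (10, 'Ann', 1), (11, 'Ben', 1), (12, 'Cy', 2);
    """
)

try:
    # role_id 99 does not exist in job_roles, so this violates referential integrity.
    conn.execute("INSERT INTO employees VALUES (13, 'Dee', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

shared = conn.execute("SELECT COUNT(*) FROM employees WHERE role_id = 1").fetchone()[0]
print(orphan_rejected, shared)
```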

Domain integrity

Domain integrity means that correct (domain-specific) values are used in each column of the database. For example, in an employee database that stores address information, the Country column can have a list of possible values. Any value that does not fall in that list is incorrect and must be updated to a standard format, which can be done through address standardization.
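Here is a minimal sketch of that idea, assuming a hypothetical list of allowed countries and a small alias table. Values are standardized first, and a SQLite `CHECK` constraint then rejects anything still outside the domain. A real address-standardization step would use a far more complete reference list.

```python
import sqlite3

# Hypothetical domain of allowed values and known aliases for the Country column.
ALLOWED_COUNTRIES = ("United States", "United Kingdom", "Germany")
ALIASES = {"usa": "United States", "us": "United States", "u.s.": "United States",
           "uk": "United Kingdom", "deutschland": "Germany"}

def standardize_country(value: str) -> str:
    """Map known aliases to the canonical country name; leave others unchanged."""
    return ALIASES.get(value.strip().lower(), value.strip())

conn = sqlite3.connect(":memory:")
# Built from the constants above (no user input), so inlining them is safe here.
allowed_sql = ", ".join(f"'{c}'" for c in ALLOWED_COUNTRIES)
conn.execute(
    f"CREATE TABLE addresses (id INTEGER PRIMARY KEY, "
    f"country TEXT NOT NULL CHECK (country IN ({allowed_sql})))"
)

def insert_address(raw_country: str) -> bool:
    """Standardize, then insert; return False if the CHECK constraint rejects it."""
    try:
        conn.execute("INSERT INTO addresses (country) VALUES (?)",
                     (standardize_country(raw_country),))
        return True
    except sqlite3.IntegrityError:
        return False

results = {raw: insert_address(raw) for raw in ["USA", " uk ", "Germany", "Atlantis"]}
print(results)
```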

User-defined integrity

When users define their own custom rules or constraints on a column, it is termed user-defined integrity. For example, if a user defines that the lead source in a prospects database can only be Google Adwords, Website, or Cold Call, then any value outside of these three is invalid.
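A user-defined rule like this can also be enforced in application code before data reaches the database. The sketch below encodes the three lead sources from the example as a Python `Enum` and rejects anything else. The `LeadSource` and `validate_lead_source` names and the sample prospects are hypothetical.

```python
from enum import Enum

class LeadSource(Enum):
    """User-defined rule: the only allowed lead sources (values from the example)."""
    GOOGLE_ADWORDS = "Google Adwords"
    WEBSITE = "Website"
    COLD_CALL = "Cold Call"

def validate_lead_source(value: str) -> LeadSource:
    """Return the matching LeadSource or raise ValueError for anything else."""
    try:
        return LeadSource(value)
    except ValueError:
        allowed = ", ".join(s.value for s in LeadSource)
        raise ValueError(f"invalid lead source {value!r}; expected one of: {allowed}") from None

prospects = [{"name": "Acme", "lead_source": "Website"},
             {"name": "Globex", "lead_source": "Trade Show"}]
for p in prospects:
    try:
        validate_lead_source(p["lead_source"])
        print(p["name"], "ok")
    except ValueError as err:
        print(p["name"], "rejected:", err)
```

Keeping the allowed values in one type means the rule is declared once and reused wherever prospects are created or imported.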

Threats to data integrity

The terms 'data integrity' and 'data security' are often used interchangeably, but they are not the same. Data security measures are implemented to achieve data integrity, but security is only one way to achieve it. Maintaining data integrity is a complex task because many different things pose threats to it. A few of them are highlighted below:

Human error 

400 of every 10,000 entries in a database (4%) are inaccurate due to human error. This is a significant number, and human error can undermine unique identifiers, integrity constraints, and other validation checks that are already in place.

Inconsistencies in data formats

Without proper data formats and types defined, values within the same column are stored using different patterns and formats, which leads to inconsistencies in the database. To prevent such inconsistencies, it is important to define validation patterns and the correct data types.
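When a column already contains mixed formats, one remedy is to normalize every value to a single canonical pattern and flag whatever cannot be parsed. The sketch below does this for dates. The list of known formats is an assumption about what the column contains. Ambiguous patterns such as day/month versus month/day cannot be resolved automatically, which is why defining the format up front matters.

```python
from datetime import datetime

# Assumed set of formats found in the column; order matters for ambiguous inputs.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")

def to_iso_date(raw):
    """Try each known pattern; return an ISO-8601 date string, or None if none fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

column = ["2021-03-05", "05/03/2021", "Mar 5, 2021", "5th March"]
normalized = [to_iso_date(v) for v in column]
print(normalized)
```

Values that come back as `None` need manual review rather than a guess.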

Integration error 

Data integrity is often compromised when integrating data from multiple sources. The reason is the difference in data structures, validation checks, and integrity constraints across sources. One source may store Phone Number as a char data type with a maximum length of 15 characters, while another stores it as a number data type with a maximum length of 13 digits.
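The phone number example can be sketched as follows. Values from a text column and a numeric column are reduced to digit-only strings, and customers whose two representations no longer agree are reported. The sample data and helper names are hypothetical. The sketch surfaces the problem rather than fixing it: a numeric column silently drops leading zeros, so the original digits cannot be recovered from that source alone.

```python
import re

def canonical_phone(value):
    """Reduce a phone value from either source to a digits-only string."""
    if isinstance(value, int):
        return str(value)            # numeric storage: any leading zero is already gone
    return re.sub(r"\D", "", value)  # text storage: drop spaces, dashes, brackets

def mismatches(source_a, source_b):
    """Return keys whose canonical phone numbers differ between the two sources."""
    return sorted(
        key for key in source_a.keys() & source_b.keys()
        if canonical_phone(source_a[key]) != canonical_phone(source_b[key])
    )

source_a = {"cust-1": "020 7946 0018", "cust-2": "(555) 010-4477"}  # text column
source_b = {"cust-1": 2079460018, "cust-2": 5550104477}             # numeric column
print(mismatches(source_a, source_b))
```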

Internal privacy breaches

This usually happens when your data lands in the wrong hands: either an employee who misuses the company's data repository, or hackers trying to break through your firewall to get to the data. In either case, securing the data from such privacy breaches is a crucial task.

Signs of data integrity

To know whether your data has integrity, look for the following signs:

Accessibility

Is your data present in the right place, and is it accessible whenever it is needed? If there is no proper or easy access to your data, then your data may be at risk of losing its integrity. Fast, optimized retrieval of data from the database is a key sign that the data's integrity is being maintained.

Validity

Do the values of a column in your dataset have the same data type and format? Validity is easy to assess by counting how many values in your database do not conform to the appropriate validation checks. For example, a record's creation date may have the value '21.21.21': the day and year 21 make sense, but the month is invalid.
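A validity check like this is easy to automate. The sketch below counts creation dates that fail to parse as real calendar dates. It assumes a `dd.mm.yy` format, and the sample values are hypothetical. Besides the impossible month in '21.21.21', it also catches '31.04.21', a date that looks plausible but does not exist.

```python
from datetime import datetime

def is_valid_date(value, fmt="%d.%m.%y"):
    """True if value parses as a real calendar date in the given (assumed) format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False

created_on = ["03.11.21", "21.21.21", "31.04.21", "15.06.20"]
invalid = [v for v in created_on if not is_valid_date(v)]
print(f"{len(invalid)} of {len(created_on)} values fail validation: {invalid}")
```

Tracking this failure count over time gives a simple, measurable indicator of a column's validity.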

Completeness

Does your database contain a lot of null values? If your dataset has no record of certain values, it is better to store a generic non-null term (such as Not provided or N/A) than to leave the column values empty. This helps you tell whether values are missing or incomplete, or were deliberately left empty.
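The sketch below shows how that convention pays off when measuring completeness. It assumes that placeholders such as "Not provided" and "N/A" are written deliberately when data is captured. Any remaining `None` or empty string can then be counted as a genuinely missing value. The records and column names are hypothetical.

```python
from collections import Counter

# Assumed convention: "Not provided" / "N/A" are written deliberately at capture
# time, so any remaining None or "" is a genuinely missing value.
PLACEHOLDERS = {"Not provided", "N/A"}

def completeness_report(rows):
    """Per column: count of genuinely missing values and of explicit placeholders."""
    missing, explicit = Counter(), Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or value == "":
                missing[column] += 1
            elif value in PLACEHOLDERS:
                explicit[column] += 1
    columns = {c for row in rows for c in row}
    return {c: {"missing": missing[c], "explicit": explicit[c]} for c in sorted(columns)}

records = [
    {"name": "Ann", "phone": "555-0101", "fax": "N/A"},
    {"name": "Ben", "phone": None,       "fax": "Not provided"},
    {"name": "Cy",  "phone": "",         "fax": None},
]
for column, stats in completeness_report(records).items():
    print(column, stats)
```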

Uniqueness

Do your dataset records uniquely identify entities? You can check this by assessing whether every record in the database reflects a unique identity and no single entity's information is spread across multiple records. If your dataset contains duplicates, you will need to use data matching algorithms to identify which records belong to the same entity. If the duplicates are not exact, you may need a combination of fuzzy matching algorithms to compute match confidence levels and then make a decision.
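The simplest uniqueness check groups records by a normalized matching key and reports any key shared by more than one record. The sketch below assumes, hypothetically, that a trimmed, lower-cased email address identifies a customer. Non-exact duplicates, such as typos in the key itself, would additionally need fuzzy matching with confidence scores, which this sketch does not attempt.

```python
from collections import defaultdict

def normalized_key(record):
    """Hypothetical matching key: lower-cased email with surrounding spaces removed."""
    return record["email"].strip().lower()

def duplicate_groups(records):
    """Group record ids that share the same normalized key; keep only groups of 2+."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalized_key(rec)].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

customers = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": " Ann@Example.com "},
    {"id": 3, "email": "ben@example.com"},
]
print(duplicate_groups(customers))
```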

Scenarios – where data integrity becomes crucial

Now that we have covered the basics of data integrity, let's discuss real-world scenarios where data integrity plays a significant role.

Masking personally identifiable information

A common practice for hiding personally identifiable information is to mask the actual data with dummy data. This process is widely used in healthcare and government institutions to protect individual privacy. If data integrity is not maintained across the dataset, it can be very difficult to get the actual data back from the dummy data, because the masked values were generated from inaccurate originals.
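One way to make masking reversible is tokenization. Each real value is replaced by a random token, and a protected lookup table (a "vault") maps tokens back to values. The sketch below is a hypothetical, in-memory illustration of that approach, not a description of any particular product. It shows why integrity matters here: an inconsistently entered copy of the same name receives a different token, so the masked dataset treats one person as two, and unmasking returns the inaccurate original.

```python
import secrets

class TokenVault:
    """Hypothetical reversible masking: each real value gets a random token,
    and the vault keeps the mapping so authorized users can restore it."""

    def __init__(self):
        self._to_token = {}
        self._to_value = {}

    def mask(self, value: str) -> str:
        # Reuse the existing token so the same value is always masked identically.
        if value not in self._to_token:
            token = "TKN-" + secrets.token_hex(4)
            while token in self._to_value:  # avoid an (unlikely) token collision
                token = "TKN-" + secrets.token_hex(4)
            self._to_token[value] = token
            self._to_value[token] = value
        return self._to_token[value]

    def unmask(self, token: str) -> str:
        return self._to_value[token]

vault = TokenVault()
patients = ["Jane Doe", "Jane Doe", "jane doe "]  # last entry is an inconsistent copy
tokens = [vault.mask(p) for p in patients]
print(tokens[0] == tokens[1], tokens[0] == tokens[2])
print(repr(vault.unmask(tokens[2])))
```

In a real deployment, the vault itself would need strict access control, since anyone who holds it can reverse the masking.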

Ensuring compliance with data standards

Compliance standards, such as HIPAA and GDPR, emphasize the importance of data integrity. For example, GDPR Article 5(1)(d) states that personal data shall be:

“accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay”

This clearly shows how crucial it is to maintain data integrity in your database in order to comply with such standards.

Driving business intelligence

Reliable data insights are the biggest benefit of capturing data and maintaining it in a clean, standardized manner. Yet data analysts spend 80% of their time managing data and correcting data errors, and only 20% of their time actually analyzing it and deriving insights. Organizations increasingly use systems that process their data and provide descriptive insights, but still find it difficult to trust those insights. The reason goes back to how the data is captured, structured, and related across tables in the database.

Conclusion – data integrity produces reliable insights

In this article, we covered basic and advanced aspects of data integrity and described several scenarios where data integrity becomes crucial. Maintaining the integrity of your data may seem like a resource- and time-intensive initiative, but it saves you time in the long run, as your data-driven insights become more reliable, accurate, and actionable.


The post Data integrity – A priority of data-driven organizations appeared first on Datafloq.

