Data is a vital resource for any organization, and managing business data requires a careful and standardized process. In previous articles we discussed the life cycle of data and how it can help your company make business decisions. Today we propose to take another step into the world of data and understand what types of data companies like yours work with.
Database management problems are often related to rigid practices in the organization: difficulties in handling data that arise from the use of outdated, inefficient technologies that consume many organizational resources. This translates into a high dependency between the programs used and the data, little flexibility in administration, difficulty in sharing data between applications or users, data redundancy, and poor information security.
But even in technologically advanced companies, it is common to find the same limitation: staff do not understand the types of data they are working with and have difficulty transforming that data into knowledge relevant for decision making. And with the advancement of Big Data in companies, these problems represent a loss of value for customers, employees, and stakeholders.
Data in companies: different structures.
Every day, companies collect (and generate) a lot of data and information. With the advancement of technology, data that lacks a defined structure has become accessible and very useful for making business decisions; years ago, it was almost impossible to analyze this data in a standardized and quantitative way. Let’s look at the alternatives we face:
- Structured data. This is traditional data that can be stored in tables made up of rows and columns. Each value sits in a fixed field of a specific record or file. The most common examples are spreadsheets and traditional databases (for example, databases of students, employees, customers, finances, logistics…).
- Semi-structured data. This data does not follow a fixed and explicit schema. It is not limited to certain fields, but it does keep markers that separate items: tags and other markers identify some of its elements without imposing a rigid structure. XML and HTML documents and data obtained from sensors are examples. Some other not-so-traditional examples are the author of a Facebook post, the length of a song, the recipient of an email, and so on.
- Unstructured data. They are presented in formats that cannot be easily manipulated by relational databases. These are usually stored in data lakes, given their characteristics. Any type of unstructured text content represents a classic example (Word, PowerPoint, PDF files, etc.). Most multimedia documents (audio, voice, video, photographs) and the content of social media posts, emails, and so forth, also fall into this category.
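To make the three levels concrete, here is a minimal Python sketch using only the standard library. All sample values (the names, cities, and post texts) are invented for illustration and do not come from any real dataset.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same fixed columns (hypothetical customer table).
structured_csv = "id,name,city\n1,Paul,Buenos Aires\n2,Ana,Cordoba\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: tags mark the items, but elements can vary per record.
semi_structured_xml = """
<posts>
  <post author="Paul"><text>Great service!</text></post>
  <post author="Ana"><text>Long queue.</text><location>Gate 4</location></post>
</posts>
""".strip()
root = ET.fromstring(semi_structured_xml)
authors = [post.get("author") for post in root.findall("post")]
# Only some posts carry a <location> tag; missing ones come back as None.
locations = [post.findtext("location") for post in root.findall("post")]

# Unstructured: free text with no fields or markers to rely on.
unstructured_text = (
    "Paul wrote that the check-in was quick but the queue at security was long."
)
# Without a schema, even simple questions need text processing.
word_count = len(unstructured_text.split())

print(rows[0]["name"], authors, locations, word_count)
```

The structured rows can be queried by column name right away, the XML needs its tags navigated and tolerates missing elements, and the free text offers nothing beyond the raw words until some analysis is applied.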
How do I structure my data?
Beyond the level of structure discussed above, it is essential to your organization’s data management process that you standardize how data is treated and stored. For that, a fundamental concept is metadata: data about data. It sounds like a play on words, but it means information about where data is used and stored, the data sources, what changes are made to the data, and how one piece of data refers to other information. To structure a database we have to consider four essential components: the character, the field, the record, and the file. Together they describe how our data is configured:
- A character is the most basic element of logical data. Characters are the alphabetic, numeric, or other symbols that make up our data. For example, the name PAUL consists of four characters: P, A, U, L.
- The field is the grouping of characters that represents an attribute of some entity (for example, data obtained from a survey, from a customer data management system, or an ERP). Continuing with the previous example, the name PAUL would represent a complete field.
- The record is a grouping of fields. It represents a set of attributes that describe an entity. For example, in a survey, all responses from Paul (a participant) represent one record (also known in some cases as a “row”).
- Last but not least, a file is a group of related records. If we continue with Paul’s example, the survey data matrix is an example of a file (whether it is stored in Excel, SQL, CSV, or any other format). Files can be classified based on certain considerations. Let’s see some of them:
  - The application for which they are used (payroll, customer bases, inventories…).
  - The type of data they include (documents, images, multimedia…).
  - Their permanence (monthly files, annual sets…).
  - Their possibility of modification: updateable files (dynamic, modifiable) or historical files (used for consultation, not modifiable).
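The four components, together with a small piece of metadata, can be sketched in Python as follows. This is only an illustration: the survey columns, the values, and the keys of the `file_metadata` dictionary are hypothetical and were chosen to mirror Paul’s example and the classification criteria above.

```python
import csv
import io

# Hypothetical survey file: each line is a record, each column is a field.
survey_file = "participant,age,satisfaction\nPAUL,34,5\nANA,29,4\n"

# File -> records: a file is a group of related records.
records = list(csv.DictReader(io.StringIO(survey_file)))

# Record -> fields: all of Paul's answers form one record (one row).
paul_record = records[0]

# Field -> characters: the name is a single field made of characters.
name_field = paul_record["participant"]
characters = list(name_field)

# Metadata: data about the file itself (illustrative keys, not a standard).
file_metadata = {
    "application": "customer survey",
    "data_type": "tabular text (CSV)",
    "permanence": "monthly",
    "modifiable": False,  # a historical file, kept for consultation only
    "record_count": len(records),
    "fields": list(paul_record.keys()),
}

print(characters)
print(file_metadata)
```

Reading the sketch from bottom to top follows the same path as the list: characters build a field, fields build a record, records build a file, and the metadata describes the file as a whole.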
As you have seen, the world of data is exciting, and there are always more concepts and strategies to learn to take advantage of its value in your organization. To close this article with an example of the value of data for companies, we invite you to learn about a project we work on for one of our clients. The General Service Survey that we develop for Aeropuertos Argentinos applies the entire life cycle of data (from its creation to its use) and is fed with data of different levels of structure. It involves developing a platform to survey visitors and employees, together with the analysis and preparation of automated reports. Don’t miss this case study!