Since the last decade of the 20th century, Decision Support Systems (DSS) have grown in popularity and adoption across companies and organizations in many fields. Over the years, DSS have evolved into comprehensive Business Intelligence (BI) systems within broader data science strategies. From the origins of DSS to the growth of BI, managing, analyzing, and learning from business data throughout its life cycle has been key to successful strategies.
Data is a fundamental component of any Business Intelligence architecture. The incorporation of large volumes of data into companies' daily operations (the now-familiar Big Data) pushed Business Intelligence systems to evolve, focusing on data analysis and on extracting valuable information and knowledge for decision-making.
In today’s article we look at the life cycle of data in an organization: how data “lives” from the moment we collect it until we use it for decision-making. This data lifecycle management framework helps us understand the importance of data and of managing and administering it correctly at each stage of its life cycle.
The data lifecycle management process is closely tied to how information systems support strategic and business decision-making. Data enters the company from internal and external sources and passes through several stages and components until it reaches its final stage: data visualization and the presentation of results.
Step 1: Collection.
This is the stage where we gain access to the data, extracting and collecting it from various databases or sources internal and external to the organization. An interesting example is the Voice Assistant Platform that we developed for SoundHound, which consisted of a voice recognition product for the Spanish-speaking market. In that case, the data was collected by recognizing users’ speech.
There are different types of software for data collection management. One alternative is on-premises software, that is, software installed on local servers without requiring cloud access, such as the Oracle and SAP platforms. There are also alternatives offered as cloud services, such as Microsoft Azure, Amazon Web Services, Google’s BigQuery, and Salesforce (SaaS).
The data collection methods available on these platforms are the following:
- Batch: When the software periodically connects to the data source in search of changes or updates since the last connection.
- Streaming: When the software stays constantly connected to the data source, so that new information, changes, and updates are reflected the moment they occur.
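The difference between the two methods can be sketched in a few lines of Python. This is a minimal illustration using an in-memory list as a stand-in for a real data source; all names here are hypothetical, not the API of any of the platforms mentioned above.

```python
# An in-memory "source" standing in for a real database or API.
events = []

def batch_collect(checkpoint):
    """Batch: connect periodically and fetch everything new since the last run."""
    new_records = events[checkpoint:]
    return new_records, len(events)

def streaming_collect():
    """Streaming: stay connected and hand over each record as it arrives."""
    for record in events:
        yield record

# Records arrive at the source between connections
events.extend(["sale:1", "sale:2", "sale:3"])

# The first batch run picks up everything since checkpoint 0
records, checkpoint = batch_collect(0)
print(records)  # ['sale:1', 'sale:2', 'sale:3']

# A later run only fetches what changed after the saved checkpoint
events.append("sale:4")
records, checkpoint = batch_collect(checkpoint)
print(records)  # ['sale:4']

# A streaming consumer would instead see each record the moment it arrives
streamed = list(streaming_collect())
print(streamed)  # ['sale:1', 'sale:2', 'sale:3', 'sale:4']
```

Note how batch collection needs a checkpoint to know what has already been fetched, while streaming pushes each record as it happens.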
Step 2: Storage.
Storage consists of saving the collected data and keeping it protected until it needs to be analyzed. The basic elements of the storage layer are databases and files. We use the former with relational database management systems (such as MySQL and SQL Server), since they allow us to store structured data. The latter let us store unstructured data and are therefore used with non-relational database systems, also known as NoSQL.
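A small sketch can make the structured/unstructured contrast concrete. Here Python’s built-in sqlite3 stands in for a relational engine like MySQL or SQL Server, and a JSON document stands in for the kind of flexible record a NoSQL document store would hold; the table and data are invented for illustration.

```python
import json
import sqlite3

# Structured data fits a fixed relational schema (sqlite3 stands in
# here for MySQL or SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, amount REAL)")
conn.execute("INSERT INTO sales (product, amount) VALUES (?, ?)", ("notebook", 1200.0))
row = conn.execute("SELECT product, amount FROM sales").fetchone()
print(row)  # ('notebook', 1200.0)

# Unstructured or semi-structured data is kept as documents or files,
# the model NoSQL document stores are built around.
feedback = {"user": "ana", "comment": "great service", "tags": ["survey", "q3"]}
document = json.dumps(feedback)
print(document)
```

The relational row must match the declared columns, while the JSON document can gain or lose fields freely, which is exactly why each storage model suits a different kind of data.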
Step 3: Processing and Analysis.
Processing refers to the stage of data transformation, preparation, and enrichment, where the main objective is to clean and order the data, removing anything that could bias or hinder subsequent analysis.
When we move on to the analysis, it is time for action! It is at this stage that exploiting and analyzing the data lets us find solutions to problems the company faces.
The data is processed and analyzed using statistical techniques, specialized software, and programming languages.
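The two halves of this step can be shown in a few lines. This is a minimal sketch with hypothetical survey scores: processing cleans out the missing values, stray formatting, and an obvious outlier, and then a simple descriptive analysis runs on the clean data.

```python
from statistics import mean, median

# Hypothetical raw survey scores (0-10 scale) with typical problems:
# a missing value, inconsistent formatting, and an out-of-range outlier.
raw_scores = ["8", None, "9", " 7 ", "10", "999", "6"]

# Processing: clean and normalize, removing what could bias the analysis
cleaned = []
for value in raw_scores:
    if value is None:
        continue                  # drop missing values
    score = int(value.strip())    # normalize formatting
    if 0 <= score <= 10:          # discard out-of-range outliers
        cleaned.append(score)

# Analysis: simple descriptive statistics on the clean data
print(cleaned)          # [8, 9, 7, 10, 6]
print(mean(cleaned))    # 8
print(median(cleaned))  # 8
```

Had the outlier of 999 slipped through, the mean would have been wildly distorted, which is exactly the kind of bias the processing stage exists to prevent.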
Step 4: Visualization.
Through data visualization tools and clear communication, we give value to the data, making it understandable to those who need to base decisions on it. At this stage we build graphic visualizations of the key information to communicate solutions that have an impact. This last step represents the impact on the business: our goal is to work as a team with the business stakeholders to explain the solution.
For this, in addition to visualization tools such as Power BI and Tableau, or visualization libraries such as Plotly and Seaborn in Python, there are tailor-made developments that can add extra value to decision-making. This is the case of the General Service Survey that we developed at Huenei for YPF: a platform for conducting surveys in Aeropuertos Argentina that automatically summarizes, analyzes, and visualizes the information in an accessible, easy-to-understand way.
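The essence of this step, aggregating results and presenting them so a pattern is visible at a glance, can be sketched without any plotting library. A real dashboard would use Power BI, Tableau, Plotly, or Seaborn; this dependency-free text chart, with invented sales figures, just illustrates the idea.

```python
# Hypothetical aggregated results, one value per region
sales_by_region = {"North": 12, "South": 7, "East": 9, "West": 4}

def text_bar_chart(data):
    """Turn each value into a proportional bar next to its label."""
    return "\n".join(
        f"{label:<6}|{'#' * value} {value}" for label, value in data.items()
    )

chart = text_bar_chart(sales_by_region)
print(chart)
```

Even in this toy form, a stakeholder can spot the leading and lagging regions immediately, which is the whole point of the visualization stage.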
In the world of BI, Data Storytelling is growing in importance as a practice focused on building a narrative around data and its visualizations. Its purpose is to help convey the meaning of the findings to decision makers. This is a very interesting and exciting topic, and we’ll talk about it in depth in an upcoming article!