Ab Initio is one of the most popular tools used to extract, transform, and load data. This article looks at a few Ab Initio questions you can expect in your upcoming interview, to help you prepare well and increase your chances of landing the job. Take a look at the following:
1. Define Ab Initio
Ab initio is a Latin phrase meaning "from the beginning." In the data world, however, Ab Initio is a tool for extracting, transforming, and loading data. It also plays a great role in data analysis, data manipulation, and graphical user interface-based parallel processing. Ab Initio is therefore relied on heavily in data warehousing.
2. Can You Define Ab Initio’s Architecture
Ab Initio comprises four main parts: Conduct>It, the Co-Operating System, the Enterprise Meta Environment, and the Graphical Development Environment, popularly known as the GDE. Conduct>It offers sophisticated job automation and monitoring for all Ab Initio applications and programs, linking them to a standard execution environment. The Enterprise Meta Environment links operational metadata to business metadata, while the Graphical Development Environment is where graphs are built. All these components play a major role in Ab Initio.
3. Can You Explain What the Co-Operating System Does in Ab Initio
The Ab Initio Co-Operating System plays several essential roles, which explains why it is a core component of the tool. First, it manages and runs Ab Initio graphs, thus controlling the extraction, transformation, and loading processes. Secondly, it provides the Ab Initio extensions to the operating system. Thirdly, it handles metadata management and the interaction with the EME. Lastly, the Co-Operating System manages the monitoring and debugging of ETL processes.
4. Do You Know the Different Types of Parallelism Used in Ab Initio?
There are three types of parallelism used in Ab Initio. First, there is component parallelism, where several components of a graph execute at the same time, each on different data. Secondly, there is data parallelism, where the data is split into segments and the same operation runs on each segment simultaneously. Lastly, there is pipeline parallelism, where several connected components execute simultaneously on the same stream of data. The difference between pipeline and component parallelism is that in the former the components work on the same data.
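To make the idea of data parallelism concrete, here is a minimal Python sketch, not Ab Initio DML: the same transform runs on separate slices of the data in several worker processes at once, which is the effect the tool produces at the graph level.

```python
# Minimal illustration of data parallelism: one transform, many workers,
# each handling its own slice of the data. A conceptual stand-in only.
from multiprocessing import Pool

def transform(record):
    # Placeholder transform applied independently to every record.
    return record * 2

if __name__ == "__main__":
    data = list(range(100))
    with Pool(processes=4) as pool:            # four parallel "partitions"
        results = pool.map(transform, data)    # the work is split across workers
    print(results[:5])                         # [0, 2, 4, 6, 8]
```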
5. Can You Tell Us about Partition in Ab Initio?
Partitioning in Ab Initio refers to dividing a data set into multiple smaller groups to aid further processing. There are several types of partitioning in Ab Initio: partition by range, where data is divided among nodes based on a key and a set of partitioning ranges; partition by percentage, where data is distributed so that the output sizes correspond to fractions of 100; partition by key, where data is grouped according to a key value; partition by expression, where the division is guided by a DML expression; partition with load balance, which distributes data dynamically according to load; and partition by round-robin, where data is dealt out evenly in block-sized chunks.
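The following Python sketch is only a conceptual stand-in, not Ab Initio code; it shows how three of these schemes (by key, round-robin, and by range) decide which partition a record lands in.

```python
# Conceptual illustrations of partition by key, round-robin, and by range.
def partition_by_key(records, key, n):
    """Route each record to a partition based on a hash of its key field."""
    partitions = [[] for _ in range(n)]
    for rec in records:
        partitions[hash(rec[key]) % n].append(rec)
    return partitions

def partition_round_robin(records, n):
    """Deal records out evenly, one partition at a time."""
    partitions = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        partitions[i % n].append(rec)
    return partitions

def partition_by_range(records, key, boundaries):
    """Place each record in the first range whose upper bound covers its key."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for rec in records:
        for i, bound in enumerate(boundaries):
            if rec[key] <= bound:
                partitions[i].append(rec)
                break
        else:
            partitions[-1].append(rec)
    return partitions
```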
6. You Mentioned Extensions a While Back. Can You Tell Us Some of the File Extensions Used in Ab Initio?
There are six main file extensions used in Ab Initio. First, we have the .mp extension, which stores an Ab Initio graph or graph component. The .mpc extension denotes a custom component or program; .dat stands for a data file, which can be either a multifile or a serial file; .xfr is the transform function file; .mdc is a dataset or custom dataset component; and lastly, the .dml extension is the data manipulation language file, or record type definition. The Co-Operating System provides all of these.
7. How Does the .dbc file Help Users Connect to a Database?
The .dbc file provides the Graphical Development Environment with the information it needs to connect to a database. This includes the name and version number of the database, the name of the database instance or server to connect to, and the name of the computer hosting the database or, alternatively, the name of the computer running the database remote-access software.
8. Do You Know What De-Partitioning Means in Ab Initio?
De-partitioning is the opposite of partitioning. It is normally done to read data from several flows or operations and to rejoin data records from those flows into one. Ab Initio therefore has several de-partition components: gather, merge, interleave, and concatenation. The gather component collects records from several flows and combines them into a single flow.
In contrast, merge combines data records from different flow partitions that have already been sorted on the same key specifier, preserving that order. Interleave brings together blocks of data records from several flow partitions in round-robin fashion. Lastly, the concatenation component appends multiple data flows one after another.
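As a rough guide to how these four components differ, here is a small Python sketch; the function names mirror the components, but the code is only illustrative and is not Ab Initio's implementation.

```python
# Conceptual equivalents of the de-partition components described above.
import heapq
from itertools import chain, zip_longest

def gather(flows):
    """Collect records from several flows into one, in arbitrary order."""
    return [rec for flow in flows for rec in flow]

def merge(flows, key):
    """Combine flows already sorted on the same key, preserving sort order."""
    return list(heapq.merge(*flows, key=key))

def interleave(flows):
    """Take records from each flow in round-robin fashion."""
    return [rec for group in zip_longest(*flows) for rec in group if rec is not None]

def concatenate(flows):
    """Append each flow after the previous one, in succession."""
    return list(chain.from_iterable(flows))
```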
9. Tell Us More about the Sort Component in Ab Initio
The sort component plays an important role in Ab Initio. It re-orders data thanks to its key and max-core parameters. The key parameter determines the collation order of the data, while the max-core parameter controls how often the sort component dumps data from memory to disk.
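A rough way to picture the two parameters is the sketch below, written in plain Python rather than Ab Initio: records are ordered on a key, and once an in-memory run exceeds a budget (standing in for max-core) it is spilled, with the sorted runs merged at the end.

```python
# Illustration of key-based sorting with a memory budget that forces spills.
import heapq

def sort_with_spill(records, key, max_records):
    runs, current = [], []
    for rec in records:
        current.append(rec)
        if len(current) >= max_records:          # budget reached: spill a sorted run
            runs.append(sorted(current, key=key))
            current = []
    if current:
        runs.append(sorted(current, key=key))
    return list(heapq.merge(*runs, key=key))     # merge the sorted runs
```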
10. What Do the Dedup and Replicate Components Do in Ab Initio?
The Dedup component in Ab Initio removes duplicate records. Given sorted input, it separates out records that share the same key so that only one record per key remains in the output, for example in a final MFS file. The Replicate component is quite different: instead of removing records, it combines the data records arriving on its input into a single flow and then writes a copy of that flow to every output port. Both components play important roles in Ab Initio.
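The de-duplication idea can be pictured with a short Python sketch; it keeps the first record seen for each key, which is one of the options a dedup step typically offers, and it is an illustration rather than Ab Initio's component.

```python
# Keep the first record per key and drop the duplicates that follow.
def dedup_keep_first(records, key):
    seen, output = set(), []
    for rec in records:
        k = rec[key]
        if k not in seen:
            seen.add(k)
            output.append(rec)
    return output

rows = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
print(dedup_keep_first(rows, "id"))   # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'c'}]
```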
11. Do You Know How to Defragment a Data Table?
There are two common approaches to defragmenting a data table. One is to move the table, within the same or to another tablespace, and rebuild all its indexes, which reclaims the fragmented space in the table. The second is a reorg process: take a dump of the table, truncate it, and then import the dump back into the table. When done correctly, these two approaches make it relatively easy to defragment most tables.
12. Tell Us about the Order of Parameter Evaluation
There are seven steps to follow when evaluating parameters in Ab Initio, and they must be performed in order. First, the host setup script is run, and then all common parameters are evaluated. Next, the sandbox parameters are evaluated, followed by the project script, project-start.ksh. The fifth step is to evaluate the form parameters. Once that succeeds, the graph parameters are evaluated, and finally the graph itself is started. Following this order is essential for a successful evaluation.
13. How Can One Improve Graph Performance?
One of the upsides of graphs is that their performance can be improved in several ways, mainly by keeping the number of components in any given phase small. Other means include using only the required fields in join, reformat, and sort components; using phasing or flow buffers when merging or sorting joins; choosing an optimal max-core value for sort and join components; and minimizing sorted joins by replacing them with hash joins where possible. The number of sort components should also be kept down. Lastly, when both inputs are very large, a sorted join should be used.
14. Are You Aware of the Different Types of Data Processing Available?
There are several types of data processing. First, there is manual data processing, where data is processed without the use of a machine. It is prone to errors that a machine would prevent, which explains why it is rarely used nowadays and, where it is used, only for limited amounts of data. The second type is mechanical data processing, where mechanical devices do the work; it comes in handy when the data exists in different formats. Lastly, there is electronic data processing, which is faster, more accurate, and more reliable than the other options.
15. Why Are People Advised to Use Roll-Up When the Aggregate Component in Ab Initio Can Serve the Same Purpose?
Both the aggregate and roll-up components are used to summarize data. However, roll-up has a number of benefits that make it the better choice. It is generally more convenient to use and offers additional functionality, such as input and output filtering of records. The aggregate component cannot display intermediate results in main memory, whereas roll-up can. Lastly, a given summarization is easier to analyze when written as a roll-up.
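The core job both components perform, grouping records by a key and summarizing each group, looks roughly like the Python sketch below; the extra filtering and intermediate-result features of roll-up are not shown.

```python
# Group records by a key field and total a value field per group.
from collections import defaultdict

def rollup_sum(records, key, value):
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec[value]            # accumulate per group
    return [{key: k, "total": v} for k, v in totals.items()]

sales = [{"region": "E", "amt": 10}, {"region": "W", "amt": 5}, {"region": "E", "amt": 7}]
print(rollup_sum(sales, "region", "amt"))         # [{'region': 'E', 'total': 17.0}, ...]
```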
16. Is there a Relationship Between EME, GDE, and the Co-operating System?
Yes. EME stands for Enterprise Meta Environment, while GDE means Graphical Development Environment. The Co-operating system, for its part, supplies all the needed extensions in Ab Initio and is installed on a native operating system. The EME acts as Ab Initio's repository, much like the repository in Informatica. The GDE, being the end-user environment, allows developers to build graphs, which are saved to the EME or to a sandbox on the user's side.
17. Do You Know What Data Processing is? Should Businesses Depend on it?
Data processing is a form of data transformation. It converts raw data into useful data by doing away with the useless parts or inaccuracies, either electronically, manually, or mechanically. The method chosen normally varies depending on the size and format of the data. In data processing, several operations are conducted on the data, allowing users to convert and access it in the form of tables, graphs, charts, images, and vectors. Therefore, businesses should take up data processing, since they stand to gain a lot from it.
18. Do You Think Data Processing Is Important?
Yes. Data processing is important given that data is obtained from various sources and can therefore vary widely. It must be analyzed and cleansed before storage, which is exactly what data processing does. It saves a lot of time, which an organization can then channel into more important things, and it spares institutions and users the unreliability of working with raw, unprocessed data. So, to answer whether data processing is important, I would say yes.
19. Can You Tell Us about the Different Types of Joins in Ab Initio
There are several types of joins in Ab Initio, distinguished by how records on the input ports match on the key. The most common is the inner join, which calls the transform function only when every input port has a record whose key fields match. The full outer join also applies the transform function, but substitutes NULL for any record that is missing on one of the inputs. The other types are explicit joins and semi-joins, each with its own role. All of these are important in Ab Initio.
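A quick Python sketch of the two most common cases follows; it assumes unique keys on each input and is only meant to show how inner and full outer matching differ, not how Ab Initio's Join component is configured.

```python
# Inner join: emit a combined record only when the key exists on both sides.
def inner_join(left, right, key):
    index = {rec[key]: rec for rec in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

# Full outer join: emit something for every key, using None where a side is missing.
def full_outer_join(left, right, key):
    left_idx = {rec[key]: rec for rec in left}
    right_idx = {rec[key]: rec for rec in right}
    return [{"key": k, "left": left_idx.get(k), "right": right_idx.get(k)}
            for k in left_idx.keys() | right_idx.keys()]
```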
20. Define Dependency Analysis in Ab Initio
Dependency analysis is one of the most important processes in Ab Initio's ETL work. It lets the Enterprise Meta Environment, Ab Initio's source code control system, examine a project and trace how data is transferred and transformed from one component to another. This is normally done on a field-by-field basis, both within and between graphs. Therefore, as the name suggests, the process traces every field's dependencies in a given dataset.
21. Can You Define a Sandbox
A sandbox is a controlled environment in Ab Initio that facilitates development work. It can also be defined as a collection of graphs and related files stored in a single directory tree. These files and graphs are treated as a group for the purposes of version control, migration, and navigation. The sandbox is important because graphs should not be run in environments that are not safe or controlled. A sandbox can also be described as a user's work area in Ab Initio, and it can be created using different Ab Initio components.
22. Tell Us About Data Encoding
Data encoding is an important technique for keeping data safe. Like a code, it represents information in a form that the sender and the intended receiver can interpret, helping to protect data at the machine level. Data encoding also relies on encoding schemes for storing and retrieving useful information. All data is ultimately serialized by the computer, that is, converted into a stream of zeros and ones, before transmission over different mediums. All in all, this process transforms a given set of data or symbols into a specified format to keep it secure.
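To illustrate the serialize-then-encode idea in general terms (this is ordinary Python, not anything Ab Initio-specific), a record can be turned into bytes and then into a transmission-safe encoding, with the receiver reversing the steps.

```python
# Serialize a record to bytes, encode it for transport, then decode it back.
import base64
import json

record = {"id": 42, "name": "Ada"}
serialized = json.dumps(record).encode("utf-8")    # record -> bytes
encoded = base64.b64encode(serialized)             # bytes -> transmission-safe text

decoded = json.loads(base64.b64decode(encoded))    # receiver reverses both steps
assert decoded == record
```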
23. Can You Tell the Difference Between a Check Point and a Phase?
A checkpoint is a recovery point: if a graph fails in the middle of a run, the process does not have to start from scratch, because execution can resume from the last checkpoint. The data saved at the checkpoint is retained, and the remaining work is executed once the problem has been corrected. Phases, on the other hand, divide a graph into successive stages that are assigned memory and run in order; the intermediate files created between phases are removed once the graph completes.
24. Are You a Good Team Player?
Yes. I am positive that I am an excellent team player and an asset in the teams I find myself in. Over the years, I have motivated and rallied fellow team members towards meeting different goals and objectives. I believe that this is made possible by my people skills which help me interact well with others, put across information effectively, and respect as well as maintain boundaries. I understand just how important teamwork is in data manipulation and will be ready to do everything possible to be a quality team member. I am sure that I will perform exceedingly well if given a chance.
25. Take Us Through How One Can Add Default Rules in a Transformer
There are a number of steps to follow when adding default rules in a transformer. First, double-click the transform parameter, usually found on the Parameters tab of the component's Properties dialog. In the Transform Editor, open the Transform menu and select Add Default Rules from the drop-down list. You will then see the Match Names and Wildcard options, which you can choose between depending on your needs.
Conclusion
This marks the end of our discussion. 95% of these questions are technical in nature, and therefore, you have to prepare adequately for your interview. Ensure that you also work on your confidence and articulation before the interview. We wish you all the best in your upcoming interview.