Real Time Teradata Interview Questions with Answers PDF
1. What Is Basic Teradata Query Language?
1. BTEQ (Basic Teradata Query) allows us to write SQL statements along with BTEQ commands. We can use BTEQ for importing, exporting and reporting purposes.
2. BTEQ commands start with a dot (.) and can be terminated with a semicolon (;), but the semicolon is not mandatory for them.
3. BTEQ assumes anything written without a leading dot is a SQL statement, and a SQL statement requires a (;) to terminate it.
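As a sketch of the dot-command versus SQL distinction (the logon string, file name, and table are placeholders, not from a real system):

```sql
.LOGON tdpid/username,password          -- dot command: no semicolon needed
.EXPORT REPORT FILE = emp_report.txt    -- dot command: send output to a file

SELECT employee_id, last_name
FROM   employee_db.employee;            -- SQL statement: semicolon required

.EXPORT RESET
.LOGOFF
.QUIT
```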
2. What Is The Difference Between Fastload And Multiload?
FastLoad uses multiple sessions to quickly load large amounts of data into an empty table. MultiLoad is used for high-volume maintenance on tables and views; it works with non-empty tables as well. A maximum of 5 tables can be used in a MultiLoad job.
3. Which Is Faster Fastload Or Multiload?
FastLoad is faster, because it loads data in large blocks into an empty table; MultiLoad does more work per row to maintain existing tables.
4. Difference Between Inner Join And Outer Join?
An inner join gets data from both tables where the specified data exists in both tables. An outer join gets data from the source table at all times, and returns data from the outer joined table ONLY if it matches the criteria.
5. What Is Multi Insert?
Inserting data records into a table using multiple INSERT statements in a single multi-statement request. It is achieved by putting the semicolon in front of the keyword INSERT of the next statement, rather than terminating the first statement with a semicolon:
INSERT INTO Sales SELECT * FROM Customer
;INSERT INTO Loan SELECT * FROM Customer;
6. Is Multi Insert Ansi Standard?
No. The multi-statement request is a Teradata extension and is not part of the ANSI standard.
7. How Do You Create A Table With An Existing Structure Of Another Table With Data And With No Data?
Create table Customerdummy as Customer with data / with no data;
8. What Is The Opening Step In Basic Teradata Query Script?
.Logon tdipid/username, password.
9. Can You Fastexport A Field, Which Is Primary Key By Putting Equality On That Key?
10. Did You Write Stored Procedures In Teradata?
No, because they become a single amp operation and my company didn’t encourage that.
11. What Is The Use Of Having Indexes On A Table?
For faster record search.
12. Is It Necessary To Add A Quit Statement After A Bteq Query When I Am Calling It In A Unix Environment?
Not necessary but it is good to add a QUIT statement after a query.
13. There Is A Column With Date In It. If I Want To Get Just Month How It Can Be Done? Can I Use Sub String?
SUBSTRING is used with character fields, so it cannot be used here. To extract the month from a date column, use EXTRACT, e.g. SELECT EXTRACT(MONTH FROM column_name). The same works for year or day, or for hour or minute if it is a timestamp, e.g. SELECT EXTRACT(MINUTE FROM column_name).
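As a sketch (the orders table and its columns are hypothetical), EXTRACT can pull out each date or time part:

```sql
SELECT EXTRACT(MONTH  FROM order_date) AS order_month,   -- from a DATE column
       EXTRACT(YEAR   FROM order_date) AS order_year,
       EXTRACT(MINUTE FROM created_at) AS created_minute -- from a TIMESTAMP column
FROM   orders;
```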
14. What’s The Syntax Of Sub String?
SUBSTRING(string_expression FROM n1 [FOR n2]), or equivalently SUBSTR(string_expression, n1 [, n2]).
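For instance, both forms extract pieces of a character string:

```sql
SELECT SUBSTRING('Teradata' FROM 1 FOR 4);  -- 'Tera'
SELECT SUBSTR('Teradata', 5);               -- 'data' (position 5 to the end)
```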
15. While Creating Table My Dba Has Fallback Or No Fallback In His Ddl. What Is That?
FALLBACK requests that a second copy of each row inserted into a table be stored on another AMP in the same cluster. The fallback copy is used when the primary AMP goes down or its disk fails.
16. My Table Got Locked During Mload Due To A Failed Job. What Do I Do To Perform Other Operations On It?
Use RELEASE MLOAD. It removes the access locks from the target tables in Teradata. It must be entered from BTEQ and not from MultiLoad. To proceed, run RELEASE MLOAD <table_name>;
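A sketch from a BTEQ session (the table name is a placeholder); if the job failed in the application phase rather than the acquisition phase, the IN APPLY option is needed:

```sql
.LOGON tdpid/username,password
RELEASE MLOAD employee_db.employee_stg;          -- failure before the application phase
RELEASE MLOAD employee_db.employee_stg IN APPLY; -- failure during the application phase
.LOGOFF
```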
17. How To Find Duplicates In A Table?
Group by those fields: SELECT id, COUNT(*) FROM table_name GROUP BY id HAVING COUNT(*) > 1.
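A sketch against a hypothetical customer table:

```sql
SELECT   customer_id,
         COUNT(*) AS dup_count
FROM     customer
GROUP BY customer_id
HAVING   COUNT(*) > 1;   -- only ids appearing more than once
```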
18. How Do You Verify A Complicated Sql?
I use explain statement to check if the query is doing what I wanted it to do.
19. How Many Tables Can You Join In V2r5?
Up to 64 tables.
20. How Do You See A Ddl For An Existing Table?
By using show table command.
21. Which Is More Efficient Group By Or Distinct To Find Duplicates?
With more duplicates, GROUP BY is more efficient; if only a few duplicates exist, DISTINCT is more efficient.
22. Syntax For Case When Statement?
CASE value_expression_1 WHEN value_expression_n THEN scalar_expression_n [WHEN ... THEN ...] [ELSE scalar_expression_m] END
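Both forms of CASE, sketched against a hypothetical employee table:

```sql
-- Valued (simple) form
SELECT CASE dept_code
         WHEN 10 THEN 'Sales'
         WHEN 20 THEN 'Finance'
         ELSE 'Other'
       END AS dept_name
FROM   employee;

-- Searched form
SELECT CASE
         WHEN salary >= 100000 THEN 'High'
         WHEN salary >=  50000 THEN 'Medium'
         ELSE 'Low'
       END AS salary_band
FROM   employee;
```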
23. What’s The Difference Between Timestamp (0) And Timestamp (6)?
TIMESTAMP (0) is CHAR (19) and TIMESTAMP (6) is CHAR (26)
Everything is the same except that TIMESTAMP (6) has microseconds too.
24. How Do You Determine The Number Of Sessions?
Teradata performance and workload.
Client platform type, performance and workload.
Channel performance for channel attached systems.
Network topology and performance for network attached systems.
Volume of data to be processed by the application.
25. What Is Node? How Many Nodes And Amps Used In Your Previous Project?
A node is a server in a Teradata system that runs the database software, with its own CPUs, memory and disks. We used 318 nodes and each node had 2 to 4 AMPs.
26. What Is A Clique?
Clique is a group of disk arrays physically cabled to a group of nodes.
27. What Is The Purpose Of Indexes?
An index is a mechanism that can be used by the SQL query optimizer to make table access more performant. Indexes enhance data access by providing a more-or-less direct path to stored data and avoiding the necessity to perform full table scans to locate the small number of rows you typically want to retrieve or update.
28. What Is Primary Index And Secondary Index?
Primary index is the mechanism for assigning a data row to an AMP and a location on the AMP’s disks. Indexes are also used to access rows from a table without having to search the entire table. Secondary indexes enhance set selection by specifying access paths less frequently used than the primary index path. Secondary indexes are also used to facilitate aggregate operations.
If a secondary index covers a query, the Optimizer determines that it would be less costly to access its rows directly rather than using it to access the base table rows it points to. Sometimes multiple secondary indexes with low individual selectivity can be overlapped and bit-mapped to provide enhanced performance.
29. What Are The Things To Be Considered While Creating Secondary Index?
Creating a secondary index causes Teradata to build a sub-table to contain its index rows, thus adding another set of rows that requires updating each time a table row is inserted, deleted, or updated. Secondary index sub-tables are also duplicated whenever a table is defined with FALLBACK, so the maintenance overhead is effectively doubled.
30. What Is Collect Statistics?
Collects demographic data for one or more columns of a table, hash index, or join index, computes a statistical profile of the collected data, and stores the synopsis in the data dictionary. The Optimizer uses the synopsis data when it generates its table access and join plans.
31. Can We Collect Statistics On Multiple Columns?
Yes we can collect statistics on multiple columns.
32. Can We Collect Statistics On Table Level?
Yes, we can collect statistics at the table level. The syntax is COLLECT STATISTICS ON TAB_A; which re-collects all statistics previously defined on the table.
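A sketch (the employee table and its columns are hypothetical):

```sql
-- Column-level statistics
COLLECT STATISTICS ON employee COLUMN (dept_code);

-- Multi-column statistics
COLLECT STATISTICS ON employee COLUMN (dept_code, job_code);

-- Table level: re-collect all statistics previously defined on the table
COLLECT STATISTICS ON employee;
```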
33. What Is Inner Join And Outer Join?
An inner join gets data from both tables where the specified data exists in both tables. An outer join gets data from the source table at all times, and returns data from the outer joined table ONLY if it matches the criteria.
34. When Tpump Is Used Instead Of Multiload?
TPump provides an alternative to MultiLoad for the low-volume batch maintenance of large databases under control of a Teradata system. Instead of updating Teradata databases overnight, or in batches throughout the day, TPump updates information in real time, acquiring every bit of data from the client system with low processor utilization. It does this through a continuous feed of data into the data warehouse, rather than the traditional batch updates. Continuous updates result in more accurate, timely data. And, unlike most load utilities, TPump uses row hash locks rather than table-level locks. This allows you to run queries while TPump is running. This also means that TPump can be stopped instantaneously. As a result, businesses can make better decisions that are based on the most current data.
35. What Is Spool Space And When Running A Job If It Reaches The Maximum Spool Space How You Solve The Problem?
Spool space is used to hold intermediate rows during processing, and to hold the rows in the answer set of a transaction. Spool space reaches maximum when the query is not properly optimized. Use appropriate conditions in WHERE clause of the query to limit the answer set.
36. What Is Data Mart?
A data mart is a special-purpose subset of enterprise data used by a particular department, function or application. Data marts may have both summary and detail data; however, usually the data has been pre-aggregated or transformed in some way to better handle the particular type of requests of a specific user community. Data marts are categorized as independent, logical and dependent data marts.
37. Difference Between Star And Snowflake Schemas?
Star schema is De-normalized and snowflake schema is normalized.
38. Why Are Oltp Database Designs Not Generally A Good Idea For A Data Warehouse?
OLTP designs are for real-time transactional data; they are highly normalized and not pre-aggregated, so they are not good for decision support systems.
39. What Type Of Indexing Mechanism Do We Need To Use For A Typical Data Warehouse?
Primary Index mechanism is the ideal type of index for data warehouse.
40. What Is Real Time Data Warehousing?
Real-time data warehousing is a combination of two things: real-time activity and data warehousing.
Real-time activity is activity that is happening right now. The activity could be anything such as the sale of widgets. Once the activity is complete, there is data about it. Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes available instantly. In other words, real-time data warehousing is a framework for deriving information from data as the data becomes available.
41. What Is Ods?
An operational data store (ODS) is primarily a “dump” of relevant information from a very small number of systems (often just one) usually with little or no transformation. The benefits are an ad hoc query database, which does not affect the operation of systems required to run the business. ODS’s usually deal with data “raw” and “current” and can answer a limited set of queries as a result.
42. What Is Real Time And Near Real Time Data Warehousing?
The difference between real time and near real time can be summed up in one word: latency. Latency is the time lag between an activity completing and the completed activity data being available in the data warehouse. In real time, the latency is negligible, whereas in near real time the latency is a tangible time frame such as two hours.
43. What Are Normalization, First Normal Form, Second Normal Form And Third Normal Form?
Normalization is the process of efficiently organizing data in a database. The two goals of the normalization process are to eliminate redundant data (storing the same data in more than one table) and to ensure data dependencies make sense (only storing related data in a table).
First normal form:
Eliminate duplicate columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (primary key).
Second normal form:
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.
Third normal form:
Remove columns that are not dependent upon the primary key.
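As a sketch of the idea (hypothetical tables), moving customer attributes out of an orders table so every non-key column depends only on its own table's primary key:

```sql
-- Before normalization, customer_name and city were repeated on every
-- order row; they depend on customer_id, not on order_id, so they move out.
CREATE TABLE customer (
    customer_id   INTEGER NOT NULL PRIMARY KEY,
    customer_name VARCHAR(60),
    city          VARCHAR(30)
);

CREATE TABLE orders (
    order_id    INTEGER NOT NULL PRIMARY KEY,
    customer_id INTEGER NOT NULL,   -- foreign key back to customer
    order_date  DATE
);
```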
44. What Is Fact Table?
The centralized table in a star schema is called the FACT table, i.e. a table that contains facts and is connected to dimensions. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key made up of all of its foreign keys. A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables). In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called Factless Fact tables.
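A sketch of such a table in Teradata DDL (the sales fact and its dimension keys are hypothetical):

```sql
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL,   -- FK to date dimension
    product_key  INTEGER NOT NULL,   -- FK to product dimension
    store_key    INTEGER NOT NULL,   -- FK to store dimension
    units_sold   INTEGER,            -- fact (measure)
    sales_amount DECIMAL(12,2)       -- fact (measure)
) PRIMARY INDEX (date_key, product_key, store_key);
```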
45. What Is Etl?
Extract, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database. The first step in ETL process is mapping the data between source systems and target database (data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system. Note that ETT (extract, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.
46. What Is Er Diagram?
It is an entity-relationship diagram. It describes the relationships among the entities in the database model.
47. What Is Data Mining?
Analyzing of large volumes of relatively simple data to extract important trends and new, higher level information. For example, a data-mining program might analyze millions of product orders to determine trends among top-spending customers, such as their likelihood to purchase again, or their likelihood to switch to a different vendor.
48. What Is Star Schema?
Star Schema is a relational database schema for representing multi-dimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantages of star schema are slicing down, performance increase and easy understanding of data.
49. What Is A Level Of Granularity Of A Fact Table?
The components that make up the granularity of the fact table correspond directly with the dimensions of the data model. Thus, when you define the granularity of the fact table, you identify the dimensions of the data model. The granularity of the fact table also determines how much storage space the database requires. For example, consider the following possible granularities for a fact table:
• Product by day by region
• Product by month by region
The size of a database that has a granularity of product by day by region would be much greater than a database with a granularity of product by month by region because the database contains records for every transaction made each day as opposed to a monthly summation of the transactions. You must carefully determine the granularity of your fact table because too fine a granularity could result in an astronomically large database. Conversely, too coarse granularity could mean the data is not detailed enough for users to perform meaningful queries against the database.
50. What Is A Dimension Table?
Dimension table is one that describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables. In relational data modeling, for normalization purposes, country lookup, state lookup, county lookup, and city lookup are not merged into a single table. In dimensional data modeling (star schema), these tables would be merged into a single table called LOCATION DIMENSION for performance and data-slicing requirements. This location dimension helps compare the sales in one region with another. We may see good sales profit in one region and loss in another. If it is a loss, the reasons may be a new competitor in that area, or failure of our marketing strategy, etc.
51. What Are The Various Reporting Tools In The Market?
Crystal Reports, BusinessObjects, MicroStrategy, etc.
52. What Are The Various Etl Tools In The Market?
Ab Initio, Informatica, etc.
53. What Is A Three-tier Data Warehouse?
The three-tier differs from the two-tier architecture by strictly enforcing a logical separation of the graphical user interface, business logic, and data. The three-tier is widely used for data warehousing today. For organizations that require greater performance and scalability, the three-tier architecture may be more appropriate. In this architecture, data extracted from legacy systems is cleansed, transformed, and stored in high-speed database servers, which are used as the target database for front-end data access.
54. Differentiate Primary Key And Partition Key?
Primary Key is a combination of unique and not null. It can be a collection of key values, called a composite primary key. Partition Key is just a part of the Primary Key. There are several methods of partitioning such as Hash, DB2, and Random. While using Hash partitioning we specify the Partition Key.
55. Differentiate Database Data And Data Warehouse Data?
Data in a database is detailed or transactional, both readable and writable, and current.
Data in a data warehouse is detailed or summarized; it is a storage place for historical data.
56. What Is Oltp?
OLTP stands for Online Transaction Processing. OLTP uses normalized tables to quickly record large amounts of transactions while making sure that these updates of data occur in as few places as possible. Consequently OLTP databases are designed for recording the daily operations and transactions of a business. E.g. a timecard system that supports a large production environment must successfully record a large number of updates during critical periods like lunch hour, breaks, startup and close of work.
57. What Is Staging Area?
The data staging area is a system that stands between the legacy systems and the analytics system, usually a data warehouse and sometimes an ODS. The data staging area is considered the “back room” portion of the data warehouse environment. The data staging area is where the extract, transform and load (ETL) takes place and is out of bounds for end users. Some of the functions of the data staging area include:
Extracting data from multiple legacy systems.
Cleansing the data, usually with a specialized tool.
Integrating data from multiple legacy systems into a single data warehouse.
Transforming legacy system keys into data warehouse keys, usually surrogate keys.
Transforming disparate codes for gender, marital status, etc., into the data warehouse standard.
Transforming the heterogeneous legacy data structures to the data warehouse data structures.
Loading the various data warehouse tables via automated jobs in a particular sequence through the bulk loader provided with the data warehouse database or a third-party bulk loader.
59. What Is Subject Area?
Subject area means fundamental entities that make up the major components of the business, e.g. customer, product, employee.
60. What Is A Checkpoint?
Checkpoints are entries posted to a restart log table at regular intervals during the data transfer operation. If processing stops while a job is running, you can restart the job at the most recent checkpoint.
61. What Is Slowly Changing Dimension?
In a slowly changing dimension the attribute for a record varies over time. There are three ways to solve this problem.
• Type 1 – Replace an old record with a new record. No historical data available.
• Type 2 – Keep the old record and insert a new record. Historical data is available, but this is resource intensive.
• Type 3 – In the existing record, maintain extra columns for the new values.
62. Difference Between Multiload And Tpump?
TPump provides an alternative to MultiLoad for low-volume batch maintenance of large databases under control of a Teradata system. TPump updates information in real time, acquiring every bit of data from the client system with low processor utilization. It does this through a continuous feed of data into the data warehouse, rather than the traditional batch updates. Continuous updates result in more accurate, timely data. TPump uses row hash locks rather than table-level locks. This allows you to run queries while TPump is running.
63. Different Phases Of Multiload?
• Preliminary phase.
• DML phase.
• Acquisition phase.
• Application phase.
• End phase.
64. What Is Dimensional Modeling?
Dimensional data modeling comprises one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, organization, etc. Dimension tables store records related to that particular dimension, and no facts (measures) are stored in these tables.
65. How Will You Solve The Problem That Occurs During Update?
When there is an error during the update process, an entry is posted in the error log table. Query the log table and fix the error and restart the job.
66. Can You Connect Multiload From Ab Initio?
Yes we can connect.
67.What Interface Is Used To Connect To Windows Based Applications?
68. What Is Data Warehousing?
A data warehouse is a subject oriented, integrated, time variant, non-volatile collection of data in support of management’s decision-making process.
69. What Is Data Modeling?
A Data model is a conceptual representation of data structures (tables) required for a database and is very powerful in expressing and communicating the business requirements.
70. What Is Logical Data Model?
A Logical data model is the version of a data model that represents the business requirements (entire or part) of an organization and is developed before the physical data model. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by clearly thinking about the current and future business requirements. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.
71. Steps To Create A Data Model?
Get business requirements.
Create High Level Conceptual Data Model.
Create Logical Data Model.
Select target DBMS where data-modeling tool creates the physical schema.
Create standard abbreviation document according to business standard.
72. What Is The Maximum Number Of Dml Can Be Coded In A Multiload Script?
A maximum of 5 DML statements can be coded in a MultiLoad script.
73. Does Sdlc Changes When You Use Teradata Instead Of Oracle?
If Teradata is going to be only the database, it won’t change the system development life cycle (SDLC).
If you are going to use the Teradata utilities, then it will change the architecture or SDLC.
If your schema is going to be in 3NF, there won’t be a huge change.
74. How Many Codd’s Rules Are Satisfied By Teradata Database?
There are 12 Codd’s rules applied to the Teradata database.
75. How Teradata Makes Sure That There Are No Duplicate Rows Being Inserted When Its A Set Table?
Teradata redirects the newly inserted row to the target AMP on the basis of the row hash of its primary index. If it finds the same row hash value on that AMP (a hash synonym), it compares the whole row to find out whether it is a duplicate. If it is a duplicate, it silently skips it without throwing any error.
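The behavior is chosen at table creation; as a sketch (hypothetical tables), note that the silent discard applies to INSERT ... SELECT, while a singleton INSERT into a SET table may return a duplicate-row error instead:

```sql
-- SET table: duplicate rows are discarded
CREATE SET TABLE employee_set (
    emp_id   INTEGER,
    emp_name VARCHAR(30)
) PRIMARY INDEX (emp_id);

-- MULTISET table: duplicate rows are allowed
CREATE MULTISET TABLE employee_multi (
    emp_id   INTEGER,
    emp_name VARCHAR(30)
) PRIMARY INDEX (emp_id);
```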
76. What Is The Difference Between Global Temporary Tables And Volatile Temporary Tables?
Global Temporary tables (GTT) –
1. When they are created, the definition goes into the Data Dictionary.
2. When materialized, the data goes into temp space.
3. That is why the data is active only until the session ends, while the definition remains until it is dropped using a DROP TABLE statement.
If dropped from some other session, it should be DROP TABLE ALL;
4. You can collect statistics on a GTT.
Volatile Temporary tables (VTT) –
1. The table definition is stored in the system cache.
2. Data is stored in spool space.
3. That is why both the data and the table definition are active only until the session ends.
4. No collect statistics for VTT.
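A sketch of the two DDL forms (table and column names are hypothetical):

```sql
-- Global temporary table: definition persists in the Data Dictionary
CREATE GLOBAL TEMPORARY TABLE gtt_sales (
    sale_id INTEGER,
    amount  DECIMAL(10,2)
) ON COMMIT PRESERVE ROWS;

-- Volatile table: definition and data vanish when the session ends
CREATE VOLATILE TABLE vt_sales (
    sale_id INTEGER,
    amount  DECIMAL(10,2)
) ON COMMIT PRESERVE ROWS;
```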
77. What Is Filler Command In Teradata?
While using MultiLoad or FastLoad, if you don’t want to load a particular field in the data file to the target table, use the FILLER keyword on that field to skip it.
78. What Is The Command In Bteq To Check For Session Settings ?
The BTEQ .SHOW CONTROL command displays BTEQ settings.
79. How Do You Set The Session Mode Parameters In Bteq?
.SET SESSION TRANSACTION ANSI /* this is to set ANSI mode */
.SET SESSION TRANSACTION BTET /* this is to set Teradata transaction mode */
These commands must be entered before logging on to a session.
80. How Many Types Of Index Are Present In Teradata?
There are 5 different kinds of indexes present in Teradata:
1. Primary index.
a. Unique primary index.
b. Non-unique primary index.
2. Secondary index.
a. Unique secondary index.
b. Non-unique secondary index.
3. Partitioned primary index.
a. Case partitioning (ex. age, salary).
b. Range partitioning (ex. date).
4. Join index.
a. Single-table join index.
b. Multi-table join index.
c. Sparse join index (a constraint applied on the join index in a WHERE clause).
5. Hash index.
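A few of these sketched in DDL (the employee and sales tables, columns, and date range are hypothetical):

```sql
-- Unique primary index, defined with the table
CREATE TABLE employee (
    emp_id  INTEGER NOT NULL,
    dept_id INTEGER,
    hire_dt DATE
) UNIQUE PRIMARY INDEX (emp_id);

-- Non-unique secondary index
CREATE INDEX (dept_id) ON employee;

-- Partitioned primary index with range partitioning on a date
CREATE TABLE sales (
    sale_id INTEGER NOT NULL,
    sale_dt DATE
) PRIMARY INDEX (sale_id)
  PARTITION BY RANGE_N (sale_dt BETWEEN DATE '2020-01-01'
                        AND DATE '2020-12-31' EACH INTERVAL '1' MONTH);
```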
81. What Does Sleep Function Does In Fast Load?
The SLEEP command specifies the number of minutes to wait before retrying to logon and establish all sessions. The SLEEP command can be used with all load utilities, not only FastLoad. This situation can occur if all of the loader slots are used or if the number of requested sessions is not available. The default value is 6 minutes. If TENACITY is set to 2 hours and SLEEP to 10 minutes, the utility will retry the logon every 10 minutes for up to 2 hours.