BDT
The MapReduce model is to be applied to counting the occurrences of each word in a text. Which of the following are valid outputs for the map and reduce steps (x represents an integer value strictly greater than 1)?
mapping: < word, 1 > reducer: < word, x >
mapping: < word, x > reducer: < word, list(x) >
mapping: < word, NULL > reducer: < word, x >
mapping: < x, word > reducer: < word, x >
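As an illustration of the word-count question above, here is a minimal single-process sketch of the two steps. The function names `map_words` and `reduce_counts` are hypothetical, and the grouping that a real framework performs between the stages is folded into the reduce function.

```python
from collections import defaultdict

def map_words(text):
    """Map step: emit one <word, 1> pair per word occurrence."""
    return [(word, 1) for word in text.lower().split()]

def reduce_counts(pairs):
    """Reduce step: sum the values grouped under each key, giving <word, x>."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

pairs = map_words("to be or not to be")
print(pairs[:2])             # first two intermediate pairs
print(reduce_counts(pairs))  # final counts per word
```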
What is the hierarchical clustering configuration of the following points?
((A(CD))B)
((C(AB))D)
((A(BC))D)
To choose the number of clusters for K-means, the following average silhouette values are obtained for the clusters:
k=2: 0.2, 0.5
k=3: 0.1, 0.6, 0.5
k=4: 0.3, 0.1, 0.4, 0.4
The optimal number of clusters is:
3
infinite
4
2
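One way to compare the candidate values of k is to average the per-cluster silhouettes for each k and keep the largest. The sketch below assumes an unweighted mean, i.e. clusters of similar size; with very unequal clusters a size-weighted mean would be more appropriate.

```python
# Average silhouette per cluster, as listed in the question.
silhouettes = {2: [0.2, 0.5], 3: [0.1, 0.6, 0.5], 4: [0.3, 0.1, 0.4, 0.4]}

def mean(values):
    return sum(values) / len(values)

# Unweighted mean silhouette for each candidate k (assumption: similar cluster sizes).
scores = {k: mean(v) for k, v in silhouettes.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```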
The sequence 2 5 1 2 3 4 5 2 is supplied as a stream of integer values. The values processed by the stream processor are: 2 5 1 2 1 2 3 4 3 4 5 2
The sequence was processed using:
A tumbling window
A decaying window
A sliding window and a sliding window
A sliding window
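The processed values can be reproduced with a small windowing helper. The window size 4 and step 2 used here are inferred from the processed output, not stated in the question; `windows` is a hypothetical helper name.

```python
def windows(stream, size, step):
    """Return consecutive windows of `size` elements, advancing by `step`."""
    return [stream[start:start + size]
            for start in range(0, len(stream) - size + 1, step)]

stream = [2, 5, 1, 2, 3, 4, 5, 2]

# Step smaller than size: windows overlap (sliding).
sliding = windows(stream, size=4, step=2)
# Step equal to size: windows are disjoint (tumbling).
tumbling = windows(stream, size=4, step=4)

flattened = [value for window in sliding for value in window]
print(flattened)
```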
In the initial stages of running a MapReduce model, the coordinator node divides the input data into a number of disjoint subsets. It is recommended that this number of partitions be considerably larger than the number of workers assigned to a stage. Among the advantages of this approach are:
Reducing the complexity of the calculation stages characteristic of the fundamental stages of the model.
Reducing the complexity of algorithms involved in system load balancing stages.
Minimizing the times involved in the synchronization phases between the fundamental stages of the model.
Minimizing the times involved in error recovery scenarios.
Choose the correct statement about the weighted K-Nearest Neighbors algorithm:
The class of the new point is the weighted average of the distances between the point and its nearest k neighbors.
The greater the distance between a new point and one of its neighbors, the lower the probability that the point belongs to the neighbor's class.
The class of a new point depends on the classes of its neighbors in a region of radius equal to the weight of the point.
The greater the sum of the distances between a point and some of its neighbors, the greater the weight of the other neighbors.
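A minimal sketch of distance-weighted voting, assuming inverse-distance weights (one common choice; the question does not fix the weighting scheme). The name `weighted_knn` and the sample points are hypothetical.

```python
import math
from collections import defaultdict

def weighted_knn(train, query, k):
    """train is a list of (point, label); closer neighbours get larger votes."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        distance = math.dist(point, query)
        votes[label] += 1.0 / (distance + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get)

train = [((0, 0), "a"), ((4, 0), "b"), ((5, 0), "b")]
# Two of the three neighbours are "b", but the single close "a" outweighs them.
print(weighted_knn(train, (1, 0), k=3))
```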
For the determination of frequent itemsets based on a MapReduce application, the notion of a candidate itemset can be defined as:
Any subset, of any size, of a transaction analyzed in the mapping phase.
Any subset of a transaction analyzed in the mapping phase, composed exclusively of frequent items.
A superset of frequent itemsets included in transactions analyzed in the mapping phase.
Any subset composed of at least two elements of a transaction analyzed in the mapping phase.
Considering an adaptation of BFS for Dijkstra's algorithm, how many complete runs of MapReduce are required to completely determine all minimum-cost paths in a digraph?
The application is run until there are no more changes in the determined distances because the results become consistent when all possibilities have been analyzed.
<digraph diameter> runs are necessary because in this way we cover all possibilities.
The application is run until there are no more changes because Dijkstra is greedy.
The application can only be run once.
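The iterate-until-stable idea can be sketched in a single process as follows. Each call to `bfs_round` stands in for one complete MapReduce run; the function names and the sample graph are hypothetical, and every node is assumed to appear as a key of `graph`.

```python
INF = float("inf")

def bfs_round(graph, dist):
    """One round: reached nodes propose dist + weight to neighbours (map),
    then each node keeps its smallest proposal (reduce)."""
    proposals = {node: [d] for node, d in dist.items()}
    for node, edges in graph.items():
        if dist[node] < INF:
            for neighbour, weight in edges:
                proposals[neighbour].append(dist[node] + weight)
    return {node: min(values) for node, values in proposals.items()}

def shortest_paths(graph, source):
    """Repeat rounds until no distance changes; return distances and round count."""
    dist = {node: INF for node in graph}
    dist[source] = 0
    rounds = 0
    while True:
        new_dist = bfs_round(graph, dist)
        rounds += 1
        if new_dist == dist:
            return dist, rounds
        dist = new_dist

graph = {"s": [("a", 10), ("b", 1)], "b": [("a", 1)], "a": []}
print(shortest_paths(graph, "s"))
```

In this sample the direct edge s→a is found first and later replaced by the cheaper path through b, which is why the loop cannot stop as soon as every node has been reached.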
Choose the correct statement related to Expectation-Maximization clustering:
The expectation stage involves the calculation of the responsibility coefficients for all clusters.
Each point can belong to each cluster regardless of parameter values.
The maximization stage involves the weighted estimation of the parameters of the cluster distributions.
The smaller the standard deviations of the estimated cluster distributions, the higher the probability that the points belong to them.
Consider the following isolated transaction: {a, b, c} Considering a single run of a MapReduce solution for frequent itemsets, which of the following are possible intermediate results of the mapping step?
<{a,b,c},{a}> ; <{a,b,c},{a,b}> ; <{a,b,c},{a,b,c}> ; <{a,b,c},{a,c}>
<{a},1> ; <{a,b},1> ; <{a,b,c},1> ; <{a,c},1>
<{a},support+1> ; <{a,b},support+1> ; <{a,b,c},support+1> ; <{a,c},support+1>
<{a},NULL> ; <{a,b},NULL> ; <{a,b,c},NULL> ; <{a,c},NULL>
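For the isolated transaction above, a naive map step could emit every non-empty subset with a count of 1. This is a sketch only: `map_transaction` is a hypothetical name, and the pruning of candidates that practical solutions apply is omitted.

```python
from itertools import combinations

def map_transaction(transaction):
    """Emit <itemset, 1> for every non-empty subset of the transaction."""
    items = sorted(transaction)
    pairs = []
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            pairs.append((frozenset(subset), 1))
    return pairs

pairs = map_transaction({"a", "b", "c"})
for itemset, count in pairs:
    print(sorted(itemset), count)
```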
The sequential versions of the BFS/Dijkstra algorithms are based on centralized exploration queues. What can be said about these structures in the case of MapReduce?
They are not implemented because they are not compatible with the MapReduce model.
They are implemented at the coordinator node.
They are not implemented because they consume too much memory.
They are implemented at worker level.
Within the MapReduce model, tasks are reallocated to other workers:
Starting from the last known state of the task.
With resetting the task to its initial state.
By fully migrating the task starting from the current state.
By fully migrating the task starting from the last known state.
Consider the 2D point set shown. Knowing that vector A is the first principal component, what is the second?
E
C
B
D
Data processing within a stream processor is mainly done by:
Sampling queries
Continuous Queries
Continuous sampling
Ad hoc queries
Considering the classical MapReduce model, the characteristic splitting of the mapping step is performed:
Correlating the size of the input data with the number of computers available in the working cluster.
So that each worker receives the same amount of work.
Correlating the input data size with the effective number of workers available.
Without correlating the size of the input data with the actual number of workers available.
What additional information can be considered when the coordinator node assigns work tasks to worker nodes in a MapReduce solution?
Information related to the availability of workers and their loading.
Information related to the actual location of data and access times.
Information related to the size of the input data and the storage capacity of workers.
Information related to data size and worker computing power.
A random variable is normally distributed with mean 0 and standard deviation 1. The probability that the variable takes the value 2 is:
0.2
0.054
0.066
0.01
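Strictly speaking, a continuous variable takes any single value with probability zero, so the options above are best read as values of the probability density at 2. Under that reading, the standard normal density can be evaluated directly:

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

print(round(normal_pdf(2.0), 3))
```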
The following sequence: ABBAABAA is compressed using LZ compression. At some point during compression, the dictionary contains: [1 A] [2 B] [3 BA] What is the next item in the dictionary?
[4 BAA]
[4 AB]
[4 ABA]
[4 BA]
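The dictionary state in the question is consistent with an LZ78-style parse, in which the shortest phrase not yet in the dictionary is added at each step. Assuming that variant, the dictionary can be rebuilt as follows (`lz78_dictionary` is a hypothetical name):

```python
def lz78_dictionary(text):
    """Build the phrase dictionary: each new entry is the shortest unseen phrase."""
    dictionary = {}
    phrase = ""
    for symbol in text:
        phrase += symbol
        if phrase not in dictionary:
            dictionary[phrase] = len(dictionary) + 1
            phrase = ""
    return dictionary

print(lz78_dictionary("ABBAABAA"))
```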
What would be the total number of complete iterations of a MapReduce application required to implement BFS traversal of a digraph?
One complete MapReduce iteration.
Around the digraph diameter, for weighted digraphs.
Around the digraph diameter, for unweighted digraphs.
The digraph diameter, if this value were known.
A classification model affected by overfitting:
Has excessively large classification errors on the training data.
Has a linear decision boundary.
Has high accuracy in classifying test data.
Correctly classifies the training data but not the test data.
After determining the local support at the partition level, we obtain itemsets that do not satisfy the minimum support condition. In this case:
Itemsets are removed and all their supersets from the current partition are also removed.
Itemsets are removed and all their supersets are removed from all partitions.
The itemsets are retrieved in the next step to determine global support.
Itemsets are removed because they are not frequent.
In a hospital, the admission rate is estimated at 1 patient/hour. Using the Poisson distribution, the probability of observing 2 admissions in a given hour is:
0.03
0.36
0.01
0.18
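Reading the question as asking for the probability of exactly 2 admissions in an hour when the mean is 1 per hour, the Poisson probability mass function gives:

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) for a Poisson variable with the given mean rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

print(round(poisson_pmf(2, 1.0), 2))
```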
Consider the graph. What is the calculated distance from node (3) to node (5) through the intermediate node (6) at the end of the second iteration of the BFS algorithm implemented by MapReduce?
4
2
3
undefined/infinite
Which of the following is NOT a valid set of Huffman codes?
011, 010, 10, 11
0011, 1101, 101, 010
011, 10, 01, 0001
100, 01, 001, 1010
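A necessary condition for a set of Huffman codes is that no code is a prefix of another. The check below tests only that condition (it does not verify that the code tree is full); `is_prefix_free` is a hypothetical helper name.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    for i, first in enumerate(codes):
        for j, second in enumerate(codes):
            if i != j and second.startswith(first):
                return False
    return True

candidates = [
    ["011", "010", "10", "11"],
    ["0011", "1101", "101", "010"],
    ["011", "10", "01", "0001"],
    ["100", "01", "001", "1010"],
]
for codes in candidates:
    print(codes, is_prefix_free(codes))
```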
The image shows the decision boundary of a classifier that separates the data into two classes (+ and x). The classification error is:
0.2
1
The assignment of tasks for the reduction stage of the MapReduce model is done so that:
The next reduction task is assigned to the next free worker.
Tasks must be assigned to the same worker who processed the mapping stage.
Workers with excessively large task queues are avoided.
Tasks are assigned to the worker with the fewest tasks.
The ACID data representation model:
Is not suitable for MapReduce because it involves atomic processing of data.
Is suitable for MapReduce because it involves isolated data processing.
Is suitable for MapReduce because it involves atomic transactions.
Is not suitable for MapReduce because it can condition synchronizations between stages.
Which of the following statements is true, assuming that the coordinator node fails within a MapReduce solution?
The system cannot be restored and then the entire processing scenario is reset.
A new coordinator node is restarted, the system state being rebuilt based on the last known state information.
A new node with the role of coordinator is restarted, the system state being consistent only at the worker level.
A new node with the role of coordinator is restarted, the state of the system being reconstructed based on the information received from the nodes with the role of worker.
Choose the correct statement regarding stream processing:
The data is theoretically infinite in size.
Data processing in working memory takes a long time.
Working memory is generally large.
Data processing occurs after it is stored in working memory.
After determining the local support at the partition level, we obtain itemsets that do not satisfy the minimum support condition. In this case:
Itemsets are removed because they are not frequent.
Itemsets are removed and all their supersets from the current partition are also removed.
The itemsets are retrieved in the next step to determine global support.
Itemsets are removed and all their supersets are removed from all partitions.
Choose the correct statement about DBSCAN clustering (minP=3):
P1 is reachable from P4.
P3, P4, P5 are core points.
P1, P4, P6 are border points.
P1 is reachable from P6.
Consider a data set whose elements have two attributes, A and B. The covariance matrix of the data set is:
[ 3 −3
−2 4 ]
What can be said about the relationship between the two attributes?
The values are not correlated.
If A decreases, B also decreases.
If A increases, B also increases.
Nothing, because the covariance matrix is wrong.
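Because cov(A, B) = cov(B, A), a covariance matrix is always symmetric. A quick check of the matrix given in the question (`is_symmetric` is a hypothetical helper name):

```python
def is_symmetric(matrix):
    """Return True if matrix[i][j] == matrix[j][i] for all i, j."""
    size = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(size) for j in range(size))

covariance = [[3, -3],
              [-2, 4]]
print(is_symmetric(covariance))
```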
Choose the correct statement regarding stream processing:
Working memory is generally large.
Data processing takes place after it is stored in working memory.
Data processing in working memory takes a long time.
The data is theoretically infinite in size.
The ACID data representation model:
Is not suitable for MapReduce because it can condition synchronizations between stages.
Is suitable for MapReduce because it involves atomic transactions.
Is not suitable for MapReduce because it involves atomic processing of data.
Is suitable for MapReduce because it involves isolated data processing.
Considering the classical MapReduce model, the characteristic splitting of the mapping step is performed:
So that each worker receives the same amount of work.
Correlating the size of the input data with the number of computers available in the working cluster.
Without correlating the size of the input data with the actual number of workers available.
Correlating the size of the input data with the actual number of workers available.
Which of the following is NOT a valid set of Huffman codes?
100, 01, 001, 1010
011, 010, 10, 11
0011, 1101, 101, 010
011, 10, 01, 0001
Within the MapReduce model, tasks are reallocated to other workers:
By fully migrating the task from the current state.
Starting from the last known state of the task.
By fully migrating the task from the last known state.
With resetting the task to its initial state.
In the initial stages of running a MapReduce model, the coordinator node divides the input data into a number of disjoint subsets. It is recommended that this number of partitions be considerably larger than the number of workers assigned to a stage. Among the advantages are:
Reducing calculation complexity.
Reducing balancing algorithm complexity.
Minimizing synchronization times.
Minimizing recovery times in error scenarios.
For a variable x∈{0,1,2,3,4,5}, the empirically estimated probability density function has values: [0.1, 0.3, 0.2, 0.1, 0.1, 0] What is P(x<3) ?
1
0.6
0
0.4
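P(x<3) is the sum of the listed probabilities for x = 0, 1 and 2. A short check using the values from the question:

```python
values = [0, 1, 2, 3, 4, 5]
probabilities = [0.1, 0.3, 0.2, 0.1, 0.1, 0]

# Sum the probabilities of all outcomes strictly below 3.
p_less_than_3 = sum(p for x, p in zip(values, probabilities) if x < 3)
print(round(p_less_than_3, 2))
```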
Consider the following covariance matrix
[ 2 0
0 1 ]
Which of the following is a principal component of the data set?
[0 0]
[1 2]
[2 1]
Consider a partition of m transactions, each transaction having n items. The number of possible itemsets is on the order of:
n^m
m*2^n
n*m^2
s^(n*m)
Consider the following sequence of integers and a stream processor. The stream processor computes: s(i)=s(i−1)+x(i)−x(i−4) What type of window is used?
Tumbling window
Sliding window
Paging window
Sampling window
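The recurrence s(i)=s(i−1)+x(i)−x(i−4) adds the newest element and drops the one that is four positions older, so it maintains the sum of the last four values. A minimal sketch, using sample values that are an assumption (the question's own sequence is not shown):

```python
def sliding_sums(stream, size=4):
    """Running sum over the last `size` elements, updated incrementally."""
    sums = []
    current = 0
    for i, value in enumerate(stream):
        current += value                 # add x(i)
        if i >= size:
            current -= stream[i - size]  # drop x(i - size)
        sums.append(current)
    return sums

sample = [2, 5, 1, 2, 3, 4, 5, 2]  # hypothetical sample stream
print(sliding_sums(sample))
```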