A comprehensive overview on the foundations of formal concept analysis

Voluminous collections of data are now inevitable almost everywhere. Mathematical models that analyse the patterns and trends in such data are increasingly necessary for extracting and predicting useful information in any Knowledge Discovery from Data (KDD) process. Formal Concept Analysis (FCA) is an efficient mathematical model used in KDD; it is designed to portray the structure of the data in a context and to depict the underlying patterns and hierarchies in it. Owing to the rapid increase in applications of FCA in various fields, the number of research and review articles on FCA has risen considerably. This review differs from the existing ones in presenting a comprehensive survey of the fundamentals of FCA in a compact and crisp manner for the benefit of beginners, and in its focus on the scalability issues in FCA. Further, we present the generic anatomy of FCA, along with its origin and growth, at a primary level.


Introduction
The development of information technologies and networks produces huge collections of data every year from different trades. The data flow from various fields such as information technology, agriculture, medicine, finance, markets, social science, demography, etc. These data do not present information directly; the information is concealed within them. Extracting useful information from such huge data is known as knowledge discovery and is an important task in any knowledge-based system. According to Han and Kamber (2006), knowledge discovery aims to discover the rules and patterns that exist in the data, by which one can foretell the future trends of the system. So, the invention of methods and means to automatically analyse the patterns and trends of the data is an emerging necessity in order to extract and predict information useful to society (Malzahn, Ziebarth, & Hoppe, 2013; Mattingly, Rice, & Berge, 2012; Zushi, Miyazaki, & Norizuki, 2012). This is an important issue and apparently has high priority.
To this end, several researchers have proposed various models and techniques (Huang, Yang, Chen, & Wu, 2012). Among such models, mathematical models have contributed enormously to a precise understanding of KDD (Knowledge Discovery from Data). Some such models are: set theory, rough set theory, fuzzy set theory, probabilistic set theory, intuitionistic set theory, soft set theory, etc. Among these mathematical models falls the lattice-theory-based notion of Formal Concept Analysis (FCA) (Wille, 1982). FCA concentrates mainly on clustering certain objects and attributes into what are termed concepts, which carries out the functionality of cluster analysis from the knowledge discovery point of view. Under the partial order relation, the concepts can be presented in the form of a lattice, which supports functionalities such as presentation and prediction of information. The functionality of determining associations can be achieved by finding the implications of the given context using FCA. Thus, FCA-based techniques in the practice of knowledge discovery yield fruitful results for users.
The extraction of knowledge using FCA from any database has three dimensions, viz., conceptual clusters, lattices (graphical representation) and association rules. Concepts express the underlying relationships between objects and attributes in the context; a concept lattice portrays the context graphically; and the association rules discover the underlying associations among the attributes of the context.
Although FCA is an important formalism for knowledge representation, extraction and analysis, one of its major issues is scalability: large contexts yield large concept lattices. As the size of the concept lattice increases, the visualisation of concepts along with their hierarchy becomes complex and impractical. This complexity issue arises in FCA and in its extensions to various environments. According to Poelmans, Ignatov, Kuznetsov, and Dedene (2013), the scalability issue is the focus of 9% of the articles on FCA. We also point out the scalability issue in FCA and review it.
In view of the growing and applicative nature of FCA in various fields, we present its fundamental notions with examples in this article as follows. In section 2, the origin and growth of FCA are discussed. The terms and notions related to FCA are presented and illustrated in section 3. Section 4 deals with the scalability issues and the current trends on them. Finally, we conclude the article in section 5.

Origin and growth of FCA
The lattice-theory-based framework of Formal Concept Analysis (FCA) has emerged as a distinctive tool in the field of knowledge discovery. FCA has grown immensely since its introduction into the field of data analysis and knowledge representation a few decades ago. FCA is a mathematical theory for determining the concepts, and their hierarchies, that underlie any information system (Wille, 1982). The mathematical foundations of FCA were first laid by Birkhoff (1948), who first bridged partial orders and lattices. He also proved that any binary relation between a set of objects and a set of attributes can be depicted by means of a unique lattice, which provides an insight into the structure of the original relation. FCA emerged from the attempts of a group of researchers at Darmstadt University of Technology in Germany to develop applications of lattice theory. The research group was led by Professor Rudolf Wille, who became the founder of FCA by publishing his first article on FCA in 1982 (Wille, 1982), in which he discussed the approach of restructuring lattice theory using hierarchies of concepts.
The mathematical theory of FCA has been extended in various directions and combined with other knowledge representation schemes. Recently, Singh, Kumar, and Gani (2016) provided the necessary mathematical background for a few extensions of FCA to various environments such as FCA with granular computing (rough set theory), fuzzy set theory, interval-valued set theory, possibility theory, triadic concepts, factor concepts and the handling of incomplete data. Yao (2016) interpreted the notion of rough set (RS) definable concepts and thereafter derived the Boolean algebra for RS-definable concepts. An RS-definable concept is a pair of an extension and an intension, with the extension being a set of objects and the intension being a family of sets of attribute-value pairs.

Terms and notations in FCA
FCA is an art of describing the world in terms of objects and the attributes possessed by those objects. In FCA the adjective 'formal' is often used to emphasise that the notions are mathematisations of those of the human mind. The terms and notions used in this article are based on the textbook of Ganter and Wille (1999) and are also consistent with the notions dealt with by Davey and Priestley (2002).
FCA is the theory of formalisation of the idea of a concept. The notion of concept was suggested long ago by eminent philosophers such as Plato, Francis Bacon and John Stuart Mill in order to characterize formal logic systems. The notion of a concept from a context was first studied in (Arnauld & Nicole, 1981), and the term has been recognized in the German standard DIN 2330 (DIN, 1993). The philosophical notion of 'concept' can be described by its extension, that is, the set of all objects belonging to the concept, and its intension, the set of all attributes possessed by those objects in common.
For example, consider the object-attribute relation: 'All living beings need water to live.' This relation obviously forms a concept since it has an extent and the corresponding intent. The extent is the set of all living beings including mankind, animals, birds, etc., and the intent is the attribute 'water'. The relationship between a set of objects and a set of attributes is often represented by means of a formal context, which is formally defined below.

Formal context
A formal context is a triplet $K := (G, M, I)$ where $G$ denotes a set of formal objects, $M$ denotes a set of attributes and $I \subseteq G \times M$ is the incidence relation between the objects $G$ and the attributes $M$. The symbols $G$ and $M$ stand for the German words Gegenstände (objects) and Merkmale (attributes) respectively. For any two elements $g \in G$ and $m \in M$, the relation $(g, m) \in I$ is read as "object g has the attribute m" and is usually written as $gIm$.
A formal context is often represented using a cross-table in which the rows correspond to the objects in G while the columns correspond to the attributes in M. The presence/absence of the incidence relation between an object and an attribute is denoted by the presence/absence of a cross. Such contexts with yes-or-no attribute values are known as binary contexts or one-valued contexts (an attribute that is present takes only one possible value). In order to explain the further notions of FCA, it is convenient to consider a small formal context. To this end, we look at the context of Wolff (1994) on animals and their characteristics shown in Table 1.
In this example, the object set G consists of the animals Lion, Finch, Eagle, Hare and Ostrich, while the attribute set M includes the characteristics preying, flying, bird and mammal. A cross (×) at the intersection of an object row and an attribute column indicates that the object possesses that attribute. For example, the animal Lion has the attributes preying and mammal in the given context K.
The terms context, formal context and cross-table are synonymous and henceforth are used interchangeably throughout the article. Furthermore, since we only deal with the mathematisation of the notions context and concept throughout the article, we do not emphasise the prefix 'formal' with them. Before formally defining a formal concept in a context, we first need the concept-forming operators, also called the ↑ (up) and ↓ (down) operators or up-down operators, on a given context K. For a subset of objects $A \subseteq G$ and a subset of attributes $B \subseteq M$ they are defined by $A^{\uparrow} := \{m \in M \mid gIm \text{ for all } g \in A\}$ and $B^{\downarrow} := \{g \in G \mid gIm \text{ for all } m \in B\}$.
For example, in the cross-table shown in Table 1, $\{\text{Lion}\}^{\uparrow} = \{\text{preying}, \text{mammal}\}$ and $\{\text{bird}\}^{\downarrow} = \{\text{Finch}, \text{Eagle}, \text{Ostrich}\}$. Clearly, every context induces the concept-forming operators. We notice that the operator ↑ maps subsets of G to subsets of M and dually the operator ↓ maps subsets of M to subsets of G. For brevity, the concept-forming (up-down) operators $A^{\uparrow}$ and $B^{\downarrow}$ are denoted as $A'$ and $B'$.
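The two operators can be sketched in a few lines of Python. This is only an illustration: the cross-table below is reconstructed from the statements made about Table 1 in the text (the rows for Eagle and Hare are partly assumed), and the names `CONTEXT`, `up` and `down` are hypothetical helpers, not part of any FCA library.

```python
# Cross-table reconstructed from the text (assumption): object -> its attributes.
CONTEXT = {
    "Lion": {"preying", "mammal"},
    "Finch": {"flying", "bird"},
    "Eagle": {"preying", "flying", "bird"},
    "Hare": {"mammal"},
    "Ostrich": {"bird"},
}
ATTRIBUTES = {"preying", "flying", "bird", "mammal"}


def up(objects):
    """Up operator: the attributes shared by every object in `objects`."""
    common = set(ATTRIBUTES)
    for g in objects:
        common &= CONTEXT[g]
    return common


def down(attributes):
    """Down operator: the objects possessing every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, attrs in CONTEXT.items() if wanted <= attrs}


print(sorted(up({"Finch", "Ostrich"})))  # ['bird']
print(sorted(down({"bird"})))            # ['Eagle', 'Finch', 'Ostrich']
```

Note that the empty set of objects is mapped to all of M and the empty set of attributes to all of G, which follows directly from the "for all" in the definitions.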

Formal concepts
We next define the notion of a formal concept (Ganter & Wille, 1999). The formal concepts are the clusters of the given context formed as a result of attribute sharing. More formally, for any given context $K := (G, M, I)$, a formal concept is a pair $(A, B)$ where $A \subseteq G$ and $B \subseteq M$ such that $A' = B$ and $B' = A$. Plainly, in a given context, if A is a maximal set of objects sharing a maximal set of attributes B, then the ordered pair (A, B) is called a formal concept. The sets A and B are respectively known as the extent and the intent of the concept (A, B). The following proposition on extents and intents follows directly.
Proposition 1 (Ganter & Wille, 1999): Let $K := (G, M, I)$ be a context. For any subsets $A \subseteq G$ and $B \subseteq M$ the following are valid: $A \subseteq A''$ and $B \subseteq B''$. Consequently, $(A'', A')$ and $(B', B'')$ are valid concepts of the context K. Further, A is an extent if and only if $A = A''$ and dually B is an intent if and only if $B = B''$. Thus, combining the definition and properties of a concept, we can tell that for any concept (A, B), $(A, B) = (A'', A') = (B', B'')$.

Properties of concepts
It is noteworthy that a set S is said to be a maximal/minimal set with property P if there exists no proper superset/subset of S with property P. Furthermore, S is said to be a maximum/minimum set if its cardinality is maximum/minimum among such sets.
A rectangle in a context $K := (G, M, I)$ is a pair $(A, B)$ such that the Cartesian product $A \times B \subseteq I$, i.e., for every $x \in A$ and $y \in B$, $(x, y) \in I$. For any two rectangles $A_1 \times B_1$ and $A_2 \times B_2$ we write $A_1 \times B_1 \subseteq A_2 \times B_2$ if and only if $A_1 \subseteq A_2$ and $B_1 \subseteq B_2$. Any formal concept (A, B) can also be viewed as a maximal rectangle in the context. The formal concepts remain invariant under row or column permutations of the cross-table.

Computation of concepts
For a small context $K := (G, M, I)$ the formal concepts can be computed easily. Though there are several techniques to compute the formal concepts, the easiest way is to start with an object $g \in G$ and determine its attribute set $B \subseteq M$, the intent of the concept. Next, determine the set of all objects $A \subseteq G$ that possess all the attributes in B; this set forms the extent of the concept. Thus, the ordered pair (A, B) is the required concept. The dual approach of starting with an attribute $m \in M$ can also be adopted. More generally, for any subset of objects or attributes of a context the corresponding concept can be determined.
A concept $(\{g\}'', \{g\}')$ obtained by starting with an object $g \in G$ is called an object concept, denoted by $\gamma(g)$. Dually, a concept $(\{m\}', \{m\}'')$ obtained by starting with an attribute $m \in M$ is called an attribute concept, denoted by $\mu(m)$. Clearly, not all concepts of a context are object or attribute concepts. A concept may be an object concept, an attribute concept, both or neither.
We shall illustrate the above concept determination process through the context given in the cross-table (Wolff, 1994). Let us start with the object Lion; its intent set is B = {preying, mammal}. The extent set corresponding to B is A = {Lion} only. So, the pair ({Lion}, {preying, mammal}) is a concept of the given context. If we start with the objects Finch and Ostrich, we obtain the intent set B = {bird}, whose extent set is A = {Finch, Eagle, Ostrich}. So, the ordered pair ({Finch, Eagle, Ostrich}, {bird}) is another concept of the given context. Exploring the given context further yields 8 concepts in total. Concepts 2, 3, 4, 6, 7 are object concepts and concepts 4, 5, 6, 7 are attribute concepts. One can easily note that only certain subsets of objects, and not all of them, occur as extents of concepts, and the same is true of intents, although the up-down operators are defined for every such subset. For any given subset of objects/attributes the resulting concept is always unique. Moreover, if the extent A of a concept (A, B) is known, then its intent B can be uniquely determined and vice versa.
There are several algorithms to generate the formal concepts of a context. Some of the well-known algorithms serving this purpose are: Ganter, Bordat, Next neighbours, etc. Kumar and Singh (2014) have studied the performance of various concept generation algorithms.
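As a rough illustration of the determination process described above, the following Python sketch closes every subset of objects and collects the distinct (extent, intent) pairs. It is a brute-force enumeration, exponential in the number of objects, and is not one of the named algorithms; the cross-table is again reconstructed from the text (an assumption), and all function names are hypothetical.

```python
from itertools import combinations

# Cross-table reconstructed from the text (assumption): object -> its attributes.
CONTEXT = {
    "Lion": {"preying", "mammal"},
    "Finch": {"flying", "bird"},
    "Eagle": {"preying", "flying", "bird"},
    "Hare": {"mammal"},
    "Ostrich": {"bird"},
}
ATTRIBUTES = {"preying", "flying", "bird", "mammal"}


def up(objects):
    """Attributes shared by every object in `objects`."""
    common = set(ATTRIBUTES)
    for g in objects:
        common &= CONTEXT[g]
    return common


def down(attributes):
    """Objects possessing every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, attrs in CONTEXT.items() if wanted <= attrs}


def all_concepts():
    """Close every subset A of objects: (A'', A') is always a concept."""
    objects = sorted(CONTEXT)
    found = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = up(subset)
            extent = down(intent)
            found.add((frozenset(extent), frozenset(intent)))
    return found


concepts = all_concepts()
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
print(len(concepts))  # 8 for the reconstructed table
```

For contexts of realistic size this enumeration is impractical, which is one face of the scalability issue discussed in this article.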

Hierarchy of concepts
In order to discuss the properties of the set of all concepts, we need the fundamental notions associated with lattices. We refer the readers to Davey and Priestley's Introduction to Lattices and Order (Davey & Priestley, 2002) for an introductory knowledge of lattices and to George Grätzer's General Lattice Theory (Gratzer, 2003) for an encyclopaedic knowledge of lattices. In order to make the article self-contained we recall some of the basics of lattice theory.
Let P be any set in which two elements $x, y$ may be related by some relation R, denoted $xRy$. Then P is said to be a partially ordered set, or simply a poset, if the following properties hold:
i. Reflexivity: For every $x \in P$, $xRx$.
ii. Anti-symmetry: If $x, y \in P$ are such that $xRy$ and $yRx$, then $x = y$.
iii. Transitivity: If $x, y, z \in P$ are such that $xRy$ and $yRz$, then $xRz$.
The relation R by which a set P is a partially ordered set resembles the usual relation $\leq$ (less than or equal to) in view of the above three properties. Hence, conventionally, the symbol R is replaced by the symbol $\leq$. A set P with a partial order $\leq$ is denoted by $(P, \leq)$.
Let $(P, \leq)$ be a partially ordered set and let S be a subset of it ($S \subseteq P$). An upper bound of S is an element $x \in P$ such that $s \leq x$ for all $s \in S$. Dually, a lower bound of S is an element $y \in P$ such that $y \leq s$ for all $s \in S$. The least element among the upper bounds of S, if it exists, is called the supremum or least upper bound of S and is denoted by $\bigvee S$. Dually, the greatest element among the lower bounds of S is called the infimum or greatest lower bound of S and is denoted by $\bigwedge S$. If $S = \{x, y\}$, we write simply $x \vee y$ instead of $\bigvee S$ and $x \wedge y$ instead of $\bigwedge S$. The terms supremum and infimum are also referred to as join and meet respectively. Consider a partially ordered set $(P, \leq)$.
➢ If for any two elements $x, y \in P$, $x \vee y$ and $x \wedge y$ exist, then $(P, \leq)$ is called a lattice.
➢ If for any subset $S \subseteq P$, $\bigvee S$ and $\bigwedge S$ exist, then $(P, \leq)$ is called a complete lattice.
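To make the definitions of supremum and infimum concrete, here is a small Python sketch that computes them literally from the definitions. The poset chosen, the divisors of 60 ordered by divisibility, is an illustrative assumption; the names `P`, `leq`, `supremum` and `infimum` are hypothetical.

```python
# Poset for illustration (assumption): divisors of 60, ordered by divisibility.
P = [d for d in range(1, 61) if 60 % d == 0]


def leq(x, y):
    """x <= y in this poset means 'x divides y'."""
    return y % x == 0


def supremum(S):
    """Least element among the upper bounds of S, or None if none exists."""
    uppers = [u for u in P if all(leq(s, u) for s in S)]
    least = [u for u in uppers if all(leq(u, v) for v in uppers)]
    return least[0] if least else None


def infimum(S):
    """Greatest element among the lower bounds of S, or None if none exists."""
    lowers = [l for l in P if all(leq(l, s) for s in S)]
    greatest = [l for l in lowers if all(leq(k, l) for k in lowers)]
    return greatest[0] if greatest else None


print(supremum([4, 6]), infimum([4, 6]))  # 12 2
```

In this particular poset the join of two elements is their least common multiple and the meet is their greatest common divisor, and every subset has both, so the poset is a complete lattice.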
Turning back to the concepts of a context, any two concepts $(A_1, B_1)$ and $(A_2, B_2)$ of a context can be ordered by means of the subconcept-superconcept relation $\leq$, which is defined as follows: $(A_1, B_1) \leq (A_2, B_2)$ if and only if $A_1 \subseteq A_2$, which equivalently means that $B_2 \subseteq B_1$.
The ordering relation $\leq$ between concepts can be identified to be a partial order. In other words, the relation $\leq$ satisfies the three properties of reflexivity, anti-symmetry and transitivity. The partial order $\leq$ between concepts is also known as the hierarchical order. The definition of the object and attribute concepts leads to the following straightforward result.
Proposition 2 (Davey & Priestley, 2002; Lambrechts, 2012): Let $B(K)$ be the set of all concepts of a context $K := (G, M, I)$. Then $(B(K), \leq)$ is a partially ordered set. Moreover, every subset of concepts in $B(K)$ has a supremum as well as an infimum, and hence the poset $(B(K), \leq)$ forms a complete lattice. The complete lattice $(B(K), \leq)$ is known as the concept lattice of K.
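For reference, the infimum and supremum whose existence the proposition asserts can be written out explicitly. The formulas below are the standard ones from the basic theorem on concept lattices, stated for a family of concepts $(A_t, B_t)$, $t \in T$, with $'$ denoting the concept-forming operators; the worked instance uses the animal context as reconstructed from the text and is therefore an assumption about Table 1.

```latex
\bigwedge_{t \in T} (A_t, B_t)
  = \Bigl( \bigcap_{t \in T} A_t ,\; \bigl( \bigcup_{t \in T} B_t \bigr)'' \Bigr),
\qquad
\bigvee_{t \in T} (A_t, B_t)
  = \Bigl( \bigl( \bigcup_{t \in T} A_t \bigr)'' ,\; \bigcap_{t \in T} B_t \Bigr).

% Worked instance (reconstructed animal context):
(\{\text{Lion}\}, \{\text{preying}, \text{mammal}\})
  \vee (\{\text{Eagle}\}, \{\text{preying}, \text{flying}, \text{bird}\})
  = (\{\text{Lion}, \text{Eagle}\}'', \{\text{preying}\})
  = (\{\text{Lion}, \text{Eagle}\}, \{\text{preying}\}).
```

In words, the meet intersects the extents and closes the union of the intents, while the join intersects the intents and closes the union of the extents.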

Concept lattices
The symbol B for concept lattices is attributed to the mathematician Birkhoff, who initiated the theory of formal concepts by proving the existence of lattices for binary relations in his Lattice Theory (Birkhoff, 1948). A detailed study of concept lattices and their theoretical aspects can be found in (Sarmah, Hazarika, & Sinha, 2015).
One of the main reasons for considering FCA a powerful method of data analysis is its added feature of graphical visualisation of the context, which exposes the implicit relationships underlying the given context.

Graphical representation of concept lattices
Any lattice, and hence any concept lattice, can be viewed graphically using Hasse (line) diagrams (Davey & Priestley, 2002). The Hasse diagram of a lattice can be drawn as follows.
Represent the elements of a lattice $(P, \leq)$ by means of nodes/circles. Let $x, y \in P$ be any two elements. Then join the nodes corresponding to x and y if and only if $x < y$ and there exists no other element $z \in P$ such that $x < z < y$. Or simply, if x and y are an immediate predecessor and successor (sub-concept and super-concept) respectively, then join their nodes by a line. Another convention adopted in the drawing of Hasse diagrams is that if $x, y \in P$ are such that $x \leq y$, then the node corresponding to x is placed below that of y. It is interesting to note that the Hasse diagram of a lattice need not be unique, in the sense that there can be different drawings of the same lattice, since nodes can be placed as desired. However, any two Hasse diagrams of a lattice are always isomorphic graphs, that is, different drawings of the same graph. Having understood the Hasse diagrams of lattices, let us now illustrate the graphical representation of concept lattices.
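The joining rule above amounts to computing the covering relation of the order. A minimal Python sketch follows, assuming the order is supplied as a function `leq`; the function name `hasse_edges` and the sample poset (the divisors of 12 under divisibility) are illustrative choices, not taken from the article.

```python
def hasse_edges(elements, leq):
    """Pairs (x, y) with x < y and no z strictly between x and y."""
    def lt(a, b):
        return a != b and leq(a, b)

    return [
        (x, y)
        for x in elements
        for y in elements
        if lt(x, y) and not any(lt(x, z) and lt(z, y) for z in elements)
    ]


# Sample poset (assumption): divisors of 12, where x <= y means x divides y.
divisors = [1, 2, 3, 4, 6, 12]
edges = hasse_edges(divisors, lambda a, b: b % a == 0)
print(edges)
# [(1, 2), (1, 3), (2, 4), (2, 6), (3, 6), (4, 12), (6, 12)]
```

For a concept lattice, the elements would be the concepts and `leq` the inclusion of extents; the pair (1, 4), for example, is not an edge here because 2 lies strictly between 1 and 4.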
Hasse diagrams enable us to view every concept lattice graphically more easily than any other representation scheme. The only part that remains is the labelling of the concepts in a concept lattice. Every concept in a concept lattice corresponds to a node/circle in the Hasse diagram. Labelling every node with its full concept would be overkill. As an alternative, the 'Reduced Labelling' scheme is available, by which the concepts of a concept lattice are labelled as follows:

• A node corresponding to an object concept $\gamma(g)$ is labelled by the object $g \in G$.
• A node corresponding to an attribute concept $\mu(m)$ is labelled by the attribute $m \in M$.
• Object labels are written below the nodes while attribute labels are written above the nodes.
The remaining concepts can be retrieved using Proposition 2 stated earlier by reading off their extents and intents properly. In a concept lattice, one can determine the extent of a concept node by collecting all the object labels of the nodes that can be reached from that node along descending/downward paths, including the object label of the starting node if it has one. Similarly, starting from a concept node, the collection of attribute labels of the nodes which can be reached along ascending/upward paths, including the attribute label of the starting node if it is an attribute concept node, yields the intent of the starting concept node.
Since every lattice diagram here is a Hasse diagram, we need not emphasise the term 'Hasse' and henceforth omit it from the discussion. As we move through the nodes of a concept lattice from the bottom to the top, the object sets increase and the attribute sets decrease; moving from the top to the bottom, the object sets decrease and the attribute sets increase. Thus, the super-concepts inherit the objects of their sub-concepts while the sub-concepts inherit the attributes of their super-concepts. Briefly, as we traverse from bottom to top we reach more general concepts, and the reverse traversal reaches more specific concepts. The topmost concept, which contains all the objects, is called the unit concept, and dually the bottommost concept, which contains all the attributes, is called the empty concept. The set of concepts lying on the downward paths is known as the down-set or order ideal, and dually that on the upward paths is known as the up-set or order filter. The concept lattice reflects the relationship of generalization and specialization among concepts. It is thereby more intuitive and effective for knowledge representation and knowledge discovery.
Using the principles of the partial order of concepts and Hasse diagram we are now able to draw the concept lattice of the context given in Table 1 as shown in Fig. 1.

Fig. 1. Concept lattice for the formal context of Table 1
The context of Table 1, as explained earlier, has eight concepts; each of them is represented by a node in the concept lattice. Any two immediate predecessors/successors are joined directly, which on the whole yields the concept lattice as desired. The concepts corresponding to the nodes can be identified as interpreted earlier; for example, the node with the object label FINCH corresponds to the 6th concept ({Finch, Eagle}, {flying, bird}). [Recall that the extent is the collection of objects from downward paths while the intent is that of attributes from upward paths.]

Many valued contexts and their scaling processes
Having understood the fundamental notions of FCA, we will now try to explain the FCA structures for varieties of information contexts.In general, FCA is not compatible with all types of information contexts.In such circumstances, the information context is modified using appropriate principles so that it becomes compatible to be processed by FCA.
Usually, attributes are considered to be one-valued, viz., 'yes'. But in several contexts, attributes take many values. For example, attributes such as weight, colour, grade, etc., may be characterized as low, medium, high, etc. Such contexts are known as many-valued contexts. In such cases, the usual context representation scheme is not suitable for analysis using FCA, and the context is modified into a one-valued context using methods of 'conceptual scaling' (Ganter & Wille, 1989; Davey & Priestley, 2002). The modified one-valued context is known as the derived context.
In the process of scaling, a many-valued context is first transformed into a one-valued or binary context using conceptual scaling techniques. However, this transformation is carried out by the users. Hence, the conceptual scaling of a many-valued (MV) context is not uniquely determined.
The literature lists several research articles centred on MV contexts. Messai, Devignes, Napoli, and Smaïl-Tabbone (2008) have studied MV contexts for the first time. They observed that MV contexts yield multi-level concept lattices that have higher precision levels. In the retrieval of valid information from complex queries, the use of MV context methods brings out fruitful results. Before we proceed, let us formally define an MV context. A many-valued (MV) context $(G, M, W, I)$ comprises sets of objects G, attributes M and attribute values W, together with a ternary relation I between G, M and W.

Stated otherwise, $I \subseteq G \times M \times W$ such that $(g, m, w) \in I$ and $(g, m, v) \in I$ imply $w = v$. The notation $(g, m, w) \in I$ means that 'for the object g, the attribute m possesses the value w'. If W contains n elements, then the quadruple $(G, M, W, I)$ is called an n-valued context. Every MV attribute is a partial map $m : G \to W$ such that $m(g) = w$ whenever $(g, m, w) \in I$. For any attribute m, its domain is defined as $\mathrm{dom}(m) := \{g \in G \mid (g, m, w) \in I \text{ for some } w \in W\}$. If $\mathrm{dom}(m) = G$, then the attribute m is said to be complete.
Concept lattices cannot be determined directly for many-valued contexts. In this case, one has to convert the context into a binary-valued context, a process termed conceptual scaling (Ganter & Wille, 1989). Such a modified context is known as the derived context. Normally, a conceptual scale is employed on a single attribute m, and in this case the scale forms a basis for the formal context. The standard scaling method, namely plain scaling, creates the derived context from a scaled MV context $((G, M, W, I), (S_m \mid m \in M))$, which is an ordered pair that consists of a many-valued context and a family of scales $S_m$, one for each attribute m. We will now require an example of an MV context to interpret the forthcoming notions clearly. Let us consider a simple context of Platonic solids given by (Hitzler & Scharfe, 2016) shown in Table 2.

Conceptual scaling
We will now discuss the process of conceptual scaling. Every attribute of an MV context is first interpreted using a context. This context is known as a conceptual scale. Theoretically, a scale for an attribute can be defined as follows.
The 'scale' of an attribute m in a many-valued context is a one-valued context $S_m := (G_m, M_m, I_m)$ where $m(G) \subseteq G_m$. The objects and attributes of a scale are respectively known as scale values and scale attributes. A scale for an attribute is a context which serves in the transformation of a many-valued context into a binary context. For example, in the context of Platonic solids we can classify the attribute facets into simple, medium and complex using the scale shown in Table 3. Conceptual scales interpret the columns of an MV context. Conventionally, the contexts which are binary and clear in structure are called scales, even though every context can be regarded as a scale. The simplest of all conceptual scales are the nominal scales, in which every attribute is subdivided by each of its values. Using the nominal scale in the context given in Table 2, the attributes corners, edges and facets are respectively subdivided into 5, 3 and 5 columns in the derived context. The derived context obtained from the nominal scale is shown in Table 4, which is followed by its concept lattice in Fig. 2.
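Nominal scaling can be sketched in Python as follows. The many-valued table below uses the well-known corner, edge and facet counts of the five Platonic solids, on the assumption that Table 2 holds these values; the names `MV_CONTEXT` and `nominal_scale`, and the `attribute=value` naming of the derived columns, are hypothetical.

```python
# Many-valued context (assumed to match Table 2): counts for the Platonic solids.
MV_CONTEXT = {
    "tetrahedron":  {"corners": 4,  "edges": 6,  "facets": 4},
    "cube":         {"corners": 8,  "edges": 12, "facets": 6},
    "octahedron":   {"corners": 6,  "edges": 12, "facets": 8},
    "dodecahedron": {"corners": 20, "edges": 30, "facets": 12},
    "icosahedron":  {"corners": 12, "edges": 30, "facets": 20},
}


def nominal_scale(mv_context):
    """Derive a one-valued context: one binary attribute per (attribute, value)."""
    return {
        obj: {f"{attr}={value}" for attr, value in row.items()}
        for obj, row in mv_context.items()
    }


derived = nominal_scale(MV_CONTEXT)
columns = sorted({a for attrs in derived.values() for a in attrs})
print(len(columns))             # 13 derived columns: 5 + 3 + 5
print(sorted(derived["cube"]))  # ['corners=8', 'edges=12', 'facets=6']
```

The column count agrees with the subdivision into 5, 3 and 5 columns stated in the text, which is what suggests the assumed values are the intended ones.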

Table 4. Formal context derived from Table 2 (column groups: Corners, Edges, Facets)

The other class of conceptual scales is the ordinal scales, of which there are several varieties. To mention a few, we will glance at some basic ordinal scales, viz., one-dimensional ordinal scale, interordinal scale, biordinal scale and dichotomic scale.
In a one-dimensional ordinal scale, the values of every attribute are ordered such that some attribute values subsume others, because the former are greater or lesser than the latter. As a result, the extents form a chain in the hierarchy. For example, the attribute values may be arranged in the order {good, better, best}. Table 5 is an example of one-dimensional ordinal scaling of the example context of Table 2, and it is followed by its concept lattice in Fig. 3.
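A one-dimensional ordinal scale can be sketched in the same spirit. The example below scales a single attribute with threshold attributes of the form "value <= t"; whether Table 5 uses this direction of the order is not known, so the choice, like the edge counts and the names `EDGES` and `ordinal_scale`, is an assumption made for illustration.

```python
# Edge counts of the Platonic solids (assumed to match the 'edges' column of Table 2).
EDGES = {
    "tetrahedron": 6,
    "cube": 12,
    "octahedron": 12,
    "dodecahedron": 30,
    "icosahedron": 30,
}


def ordinal_scale(values_by_object, name):
    """Give each object the scale attributes '<name><=t' for every threshold t >= its value."""
    thresholds = sorted(set(values_by_object.values()))
    return {
        obj: {f"{name}<={t}" for t in thresholds if value <= t}
        for obj, value in values_by_object.items()
    }


scaled = ordinal_scale(EDGES, "edges")
extents = [
    {obj for obj, attrs in scaled.items() if f"edges<={t}" in attrs}
    for t in (6, 12, 30)
]
for extent in extents:
    print(sorted(extent))
```

The three printed extents are nested, each containing the previous one, which is the chain of extents that characterises a one-dimensional ordinal scale.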

Fig. 2. Concept lattice for the formal context of Table 4

Table 5. Formal context derived from Table 2 (column groups: Corners, Edges, Facets)

'Interordinal scales' are used in the representation of contexts with mixed attribute values. For example, contexts such as the answers to a questionnaire contain bipolar attributes which are mixed, and they can be efficiently scaled using interordinal scales. For instance, the attribute values $\{\leq 1, \leq 2, \leq 3, \geq 1, \geq 2, \geq 3\}$ yield extents which fall on intervals of attribute values. Another example, for the application of biordinal scales, is a marking scheme having the values {poor, middle class, rich, very rich}, in which the attribute 'rich' can belong to both the attributes 'middle class' and 'very rich'.
The 'dichotomic scale' context of binary attributes contains values of the kind {yes, no}, as shown in Table 6.
Having understood various real-life contexts and their scales, one may now be able to construct concept lattices for any given context. Apart from the benefit of understanding contexts through concepts and their graphical view as line diagrams, FCA also empowers users to explore the hidden rule patterns present in a formal context, and we present some fundamental aspects of this subsequently.

Attribute implications
The quest to understand dependencies between attributes leads to the study of attribute exploration in contexts. Attribute logic comprises the rules that hold between sets of attributes in a context. Attribute implications portray the data dependencies. For example, the following are some attribute implications.

• Every number divisible by 2 and 5 is also divisible by 10.
• Every patient with the symptoms headache and fever also has the symptom of vomiting.
From the attribute hierarchy of concept lattices, we infer that in any intent, the attributes always occur along with those above them. This mathematical property of lattices paves the way to another broad area of knowledge discovery in FCA, viz., 'Attribute Exploration'. Let us explore some of its associated basics.
In mathematical logic, an implication $X \to Y$ is a logical statement that relates a set of formulas X to another set of formulas Y such that Y is a logical consequence of X. The implication $X \to Y$ literally means 'if X then Y' and hence can be thought of as an 'if-then' dependency between attributes. From this angle, implication formulas are also known as 'functional / attribute dependency (AD) formulas' or association rules. In view of this, the study of rules or implications, viz., Attribute Exploration in FCA, is also referred to as 'association rule mining'.
In the treatment of formal contexts, for any two attribute subsets $X, Y \subseteq M$ of a context (G, M, I), an implication of the form $X \to Y$ means that every object possessing all attributes in X also possesses all attributes in Y. The attribute sets X and Y are respectively referred to as the 'premise'/'antecedent' and the 'conclusion'/'consequent'. Some contexts contain a huge set of objects versus a relatively small set of attributes, and hence deriving all the concepts would be overkill. In such cases, the concept lattices can be conveniently inferred from attribute logic. Sometimes, attribute exploration is the only alternative to concept exploration as a knowledge discovery technique that can handle several complexities of FCA. For example, a context may be huge or even infinite in size. Sometimes, contexts may contain 'unknown objects'. Therefore, it may not be possible to explore the entire set of formal concepts, and thereby the corresponding concept lattice cannot be obtained in its entirety. In such cases, the use of AD formulas helps us to determine the 'typical' set of objects or attributes (with common properties) of the context. Some authors have derived the typical set of objects from such contexts by the use of 'domain expert / background knowledge' (Belohlavek & Vychodil, 2009; Belohlavek & Macko, 2011; Dias & Vieira, 2010; Burmeister, 2003; Ganter, 1999; Groh & Eklund, 1999; Sumangali & Kumar, 2014). We next illustrate attribute exploration in FCA.
Let us consider the simple context K = (D60, D60, /) shown in Table 7, where D60 is the set of divisors of the number 60 and / is the division relation, i.e. an object (number) has an attribute (divisor) when the latter divides the former.
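The divisor context can also be generated programmatically. The following is a minimal sketch in Python, assuming that an object g has the attribute m exactly when m divides g; the names `context` and `cross_table` are our own and serve only as an illustration.

```python
# Build the divisor context K = (D60, D60, /) as a dict: object -> set of attributes.
# Assumption: object g has attribute m exactly when m divides g.
N = 60
divisors = [d for d in range(1, N + 1) if N % d == 0]
context = {g: {m for m in divisors if g % m == 0} for g in divisors}


def cross_table(ctx, attrs):
    """Return the context as text rows, marking each incidence with 'x'."""
    lines = ["D60 " + " ".join(f"{m:>2}" for m in attrs)]
    for g in attrs:
        marks = ["x" if m in ctx[g] else "." for m in attrs]
        lines.append(f"{g:>3} " + " ".join(f"{c:>2}" for c in marks))
    return "\n".join(lines)


print(cross_table(context, divisors))
```

Each printed row lists one object (a divisor of 60) and marks the attributes (divisors) it possesses.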

Table 7
Formal context of divisors of number 60

D60   1   2   3   4   5   6   10   12   15   20   30   60

By observing the above context, one can easily infer the existence of the implication $\{1, 2, 3\} \to \{6\}$, since all the objects (numbers) having the attributes (divisors) $\{1, 2, 3\}$ also have the attribute (divisor) $\{6\}$. In this case, the converse of the implication, viz. $\{6\} \to \{2, 3\}$, also holds. Note that since the divisor 1 is present with all objects, it is a redundant attribute and hence can be ignored. Not all converse implications are valid. For example, the implication $\{15\} \to \{5\}$ holds, but its converse $\{5\} \to \{15\}$ does not, since the object 5 has the attribute 5 but not the attribute 15. Perhaps all the implications of the divisors context presented below are easy to understand because of the division relation, which is familiar to us. In general, however, it is not always possible to examine the validity of implication formulas directly by observing the context. To this end, the following proposition helps us to verify the validity of implications.
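Proposition. An implication $X \to Y$ holds in a context (G, M, I) if and only if $Y \subseteq X''$. Furthermore, it then also holds in the set of all intents of the formal concepts B(G, M, I).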
In the context shown in Table 7, consider the possibility of an implication; let us validate it using the above proposition.
Hence the implication holds. Similarly, one can verify the validity of the following implications. The use of the DG basis (discussed subsequently) yields the set of implications for the divisors context of 60 shown in Table 8.
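The validation step can be sketched in code using the standard closure criterion: an implication $X \to Y$ is valid in a context exactly when Y is contained in the closure $X''$. The Python sketch below applies it to the divisor context of 60; the helper names (`extent`, `intent`, `holds`) and the tested implications are our own illustrative choices, and we assume that an object g has attribute m when m divides g.

```python
# Divisor context of 60 (assumption: object g has attribute m when m divides g).
DIVS = [d for d in range(1, 61) if 60 % d == 0]
CONTEXT = {g: {m for m in DIVS if g % m == 0} for g in DIVS}


def extent(attrs, ctx=CONTEXT):
    """X': all objects that possess every attribute in attrs."""
    return {g for g, row in ctx.items() if set(attrs) <= row}


def intent(objs, ctx=CONTEXT):
    """A': all attributes shared by every object in objs."""
    all_attrs = set().union(*ctx.values())
    return {m for m in all_attrs if all(m in ctx[g] for g in objs)}


def holds(premise, conclusion, ctx=CONTEXT):
    """X -> Y is valid iff Y is contained in the closure X''."""
    return set(conclusion) <= intent(extent(premise, ctx), ctx)


print(sorted(intent(extent({2, 3}))))  # closure of {2, 3}: [1, 2, 3, 6]
print(holds({2, 3}, {6}))              # True
print(holds({6}, {2, 3}))              # True (the converse also holds)
print(holds({2}, {4}))                 # False (object 2 lacks attribute 4)
```

The test only needs the closure of the premise, so it avoids checking every object against every attribute of the conclusion separately.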
From the perspective of data mining, a formal context (G, M, I) is replaced by (T, I, R), whose symbols stand for Transactions (Objects), Itemsets (Attributes), and Relations (Incidence Relation) respectively. Any subset of k attributes is called a 'k-itemset'. An 'intent' is referred to as a 'closed itemset'. Detailed discussions on the discovery of association rules in data mining can be found in (Agrawal, Imielinski, & Swami, 1993; Agrawal & Srikant, 1994). The following measures are often used in the mining of association rules.

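The support of an itemset $X \subseteq I$ is the fraction of transactions that contain every item of X:

$$\mathrm{supp}(X) = \frac{|\{t \in T : (t, x) \in R \text{ for all } x \in X\}|}{|T|}$$

An itemset is said to be a frequent itemset if its support is greater than or equal to some user-specified threshold value. For any implication/rule $X \to Y$, its degree of association is measured using the support and confidence measures, which are defined as follows:

Support: $\mathrm{supp}(X \to Y) = \mathrm{supp}(X \cup Y)$
Confidence: $\mathrm{conf}(X \to Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$

Exact and approximate rules are described by bases whose confidence levels are 100% and < 100% respectively (Stumme, 2002; Zhang & Wu, 2011). Implications obey the Armstrong rules, namely reflexivity, augmentation and transitivity. A DG basis is a minimal subset of implications/rules from which all implications can be derived with the Armstrong rules. The main advantage of the DG basis of attribute implications is that it contains the minimum possible number of implications among all bases of implications that hold in the context. In this article, we deal with the implications derived from the DG basis.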
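To make the two measures concrete, the following minimal Python sketch computes support and confidence on a small transaction set. The data and the function names are hypothetical and not taken from the works cited above; support is taken as the fraction of transactions containing an itemset, and the confidence of a rule as the support of premise and conclusion together divided by the support of the premise.

```python
# Hypothetical toy data: transaction id -> set of items (not from the article).
transactions = {
    "t1": {"bread", "milk"},
    "t2": {"bread", "butter", "milk"},
    "t3": {"bread", "butter"},
    "t4": {"milk"},
}


def support(itemset, data=transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for items in data.values() if itemset <= items)
    return hits / len(data)


def rule_support(x, y, data=transactions):
    """Support of the rule X -> Y, i.e. the support of X together with Y."""
    return support(set(x) | set(y), data)


def confidence(x, y, data=transactions):
    """Confidence of X -> Y; assumes the premise X occurs at least once."""
    return rule_support(x, y, data) / support(x, data)


def is_frequent(itemset, minsupp, data=transactions):
    """An itemset is frequent when its support reaches the threshold."""
    return support(itemset, data) >= minsupp


print(support({"bread"}))                 # 0.75
print(confidence({"bread"}, {"milk"}))    # about 0.67: an approximate rule
print(confidence({"butter"}, {"bread"}))  # 1.0: an exact rule (implication)
```

In this toy data, the rule from butter to bread has confidence 100% and is therefore an implication, whereas the rule from bread to milk is only an approximate rule.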
Though the determination of all the implications of a context may seem to be an easy task, it is not so in general, owing to the huge size of the context and sometimes of the set of implications as well. To this end, for any formal context, its concept lattice and the set of implications can be produced by the use of software tools. One such software tool, developed by Dr. Serhiy Yevtushenko, is given in (Yevtushenko, 2000).
In the next section we discuss the scalability issues in FCA and briefly review some of the articles on this topic.

Scalability issue in FCA and its improvements
Though FCA is considered an important formalism to represent, extract and analyse any information system, it faces a few problems which are to be addressed. Contexts are in general huge and complicated, and contain much redundant knowledge. So, a main problem identified in practical applications of FCA is that the computational cost of processing the information system with FCA is high and the lattice structure is difficult to visualise and perceive. This complexity issue arises due to the scalability of FCA.
The number of formal concepts can grow exponentially with the size of the context, and determining this number is #P-complete (Kuznetsov, 2001). In addition, the number of implications can grow exponentially as the number of attributes in a formal context increases, and the corresponding counting problem is #P-hard (Kuznetsov, 2004). Handling large contexts was discussed as an open problem at ICFCA 2006 (International Conference on FCA). After this conference, several researchers concentrated on the scalability issues in FCA.
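The exponential growth can be observed on a well-known worst case, the contranominal scale, in which each of the n objects has every attribute except its own; its concept lattice has $2^n$ concepts. The sketch below is a brute-force illustration in Python with function names of our own choosing; it is meant only for very small contexts and is not one of the algorithms from the cited works.

```python
from itertools import combinations


def concepts(ctx, attrs):
    """Enumerate all (extent, intent) pairs by closing every attribute subset.

    Brute force over all subsets: only meant for very small contexts.
    """
    found = set()
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            ext = frozenset(g for g, row in ctx.items() if set(subset) <= row)
            itt = frozenset(m for m in attrs if all(m in ctx[g] for g in ext))
            found.add((ext, itt))
    return found


def contranominal(n):
    """Context on {0, ..., n-1} where object g has every attribute except g."""
    return {g: {m for m in range(n) if m != g} for g in range(n)}


for n in range(1, 7):
    # The concept count doubles with every additional object/attribute pair.
    print(n, len(concepts(contranominal(n), list(range(n)))))
```

A context with only n objects and n attributes can thus already produce $2^n$ concepts, which is why reduction techniques become necessary for large contexts.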
The literature describes a variety of approaches to control the complexity and size of contexts, concepts, concept lattices and rules. Popular research methods for improving the scalability of FCA often involve: conceptual scaling for many-valued contexts, matrix decompositions, iceberg concept lattices, clustering approaches, computing granular concepts, concept similarity indices, objective functions, attribute reduction, other filtration strategies, etc.
Recently, Dias and Vieira (2015) have classified concept lattice reduction techniques into three classes. In the first class of reduction techniques, the redundant information is removed from the context and thereby a minimal concept lattice is obtained. This class of techniques is useful when the context has much redundant knowledge. The second class of reduction methods is the simplification of contexts/concept lattices. This class of techniques is useful to identify the most important aspects of a context/concept lattice. Finally, the third class of reduction techniques is the selection of formal concepts and objects/attributes. When the context possesses some standard applicable principles, this class of reduction techniques is more useful for obtaining meaningful information.
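A simple instance of the first class is context clarification, in which objects with identical rows and attributes with identical columns are merged; this removes redundancy without changing the structure of the concept lattice. The following Python sketch illustrates the idea on a hypothetical toy context; the function name `clarify` and the data are our own.

```python
def clarify(ctx):
    """Merge objects with identical rows and attributes with identical columns.

    ctx maps each object to its set of attributes; one representative is kept
    for every group of duplicates.
    """
    # Keep the first object seen for each distinct row.
    rows = {}
    for g, attrs in ctx.items():
        rows.setdefault(frozenset(attrs), g)
    # Keep the first attribute (in sorted order) for each distinct column.
    all_attrs = set().union(*ctx.values())
    cols = {}
    for m in sorted(all_attrs):
        column = frozenset(g for row, g in rows.items() if m in row)
        cols.setdefault(column, m)
    kept = set(cols.values())
    return {g: set(row) & kept for row, g in rows.items()}


# Hypothetical context: g1 and g2 share a row; c and d share a column.
toy = {"g1": {"a", "b"}, "g2": {"a", "b"}, "g3": {"b", "c", "d"}}
print(clarify(toy))
```

Here the object g2 and the attribute d are dropped as duplicates, leaving a smaller context with the same lattice structure.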
We next summarize, in Table 9, some of the improvements reported in the literature on the scalability issues under the three stated categories, and briefly describe the contribution of each work.

Table 9
Some important contributions on FCA scalability issues

Paper
Redundant Information Removal/Context Pre-processing

Ganter & Wille (1999): Authors obtained the clarified context by removing reducible objects and attributes, and the resulting concept lattice preserves the isomorphism with the original one.
Wu, Leung, & Mi (2009): Granular structure of concept lattices with application in knowledge reduction in formal concept analysis is examined in this paper. Information granules and their properties in a formal context are first discussed. Concepts of a granular consistent set and a granular reduct in the formal context are then introduced.
Wei & Qi (2010): The relation between the reduction methods using concept lattices and rough sets was discussed based on classical formal context. The method unravels the relation between these two theories.
Pei & Mi (2011): Authors have reduced the attributes in a decision formal context based on a homomorphism consistent set from the concept lattice.
Medina (2012): Attribute reduction in three frameworks, namely formal, object-oriented, and property-oriented concept lattices, was studied in this article. Irrespective of the framework, it has been found that the attributes can be classified into three levels of necessity and in any level the attribute reducts are identical.
Li, Mei, Kumar, & Zhang (2013): The authors have proposed a framework for knowledge reduction from a decision formal context using the idea of rule acquisition to discover a new set of non-redundant decision rules.
Li, Mei, & Lv (2013): This article concentrates on some of the issues in incomplete decision contexts, such as approximate concept construction, rule acquisition and knowledge reduction. A method is proposed to build an approximate concept lattice with an incomplete context. The notion of an approximate decision rule is defined, and a method is developed to extract non-redundant approximate decision rules from an incomplete decision context. These rules are again reduced by constructing a discernibility matrix and its associated Boolean function.
Li & Wang (2016): This paper deals with knowledge discovery in incomplete contexts. It concentrates on two issues, namely concept determination with three-way decisions and attribute reduction with incomplete contexts. The notions of acceptance, rejection and non-commitment are used in the formulation of three-way decisions.
Xu & Li (2016): The important task of granular computing (GrC) is to represent, construct, and process information granules. The authors propose a novel GrC method using the FCA description of information granules. This method organizes arbitrary fuzzy information granules to become necessary and sufficient fuzzy information granules. The method is presented along with an algorithm.
Qian, Wei, & Qi (2017): In this paper, a three-way concept lattice of a given formal context is proposed. Type-I and Type-II combinatorial contexts are constructed with original and complementary formal contexts. From these two contexts, three-way concepts are constructed by two-way operators. The relationships between three-way concept lattices and classical concept lattices are then obtained.

Authors have studied the behaviour of concept lattices which are reduced using SVD and NMF matrix decomposition techniques. They also have focused on rule reduction after the context compression.

Dias & Vieira (2010): Junction based on objects similarity (JBOS) uses background knowledge in order to replace similar objects by representative elements using a certain degree of similarity.
Kumar & Srinivas (2010); Kumar (2012): Reduced the size of the concept lattices using fuzzy k-means clustering (FKM). The context matrix is reduced, and quotient lattices are obtained using equivalence relations derived by means of FKM clustering, in which each record can belong to more than one cluster and a set of membership levels is associated with each element. The same technique has been adopted in association rule mining of concept lattices in (Kumar, 2012) from the healthcare item set.
Kauer & Krupka (2014): The reduction of incidence relations from the formal context also controls the complexity of the concept lattice.
Kumar, Dias, & Vieira (2015): Compressed the original context based on non-negative matrix factorization (NMF). The context matrix is decomposed using NMF and the formal context is obtained using a threshold value. The non-negativity constraint suits the context better, as attribute values are always non-negative; NMF permits only additive combinations, not subtractive combinations, of the original vectors.
Li, Shao, & Wu (2017): The authors recently introduced the three-way decision theory, viz. acceptance, rejection and non-commitment, in FCA. An axiomatic approach is proposed to generalize three-way concept learning through granular computing.

The authors have studied concept lattices under fuzzy environments. They analysed the fuzziness in a many-valued context, which is transformed into a fuzzy formal context and fuzzy formal concepts.
They have reduced the number of fuzzy formal concepts by simplifying the corresponding fuzzy concept lattice structure. An algorithm is also presented for the method.
They also have introduced the notion of a bipolar fuzzy setting in FCA. They have devised a method for investigating the bipolar fuzzy formal concepts. They also produced a lattice representation using a bipolar fuzzy graph.

Selection/Concept filtration

Stumme (2002): Large databases can be analysed using iceberg lattices, introduced by Stumme (2002), which use the 'support' measure. The main drawback of this approach is that the iceberg concept lattice only represents the uppermost part of the concept lattice. As such, it may not extract all the concepts of a large context.
Belohlávek, Sklenar, & Zacpal (2004): Authors proposed a method that reduces the number of concepts using certain constraints, which are derived from attribute dependency formulas (ADF) that are additionally input along with the formal context. The set of concepts that are compatible with the given set of ADFs is retained as the set of important concepts.

Authors reduced the dimensionality of the concept lattice using the equivalence classes of objects in the process of information retrieval, in which the matrix reduction technique was adopted.
In these works, the selection of formal concepts is based on the notion of distance or similarity. The concepts of equivalence classes and similarity of objects or attributes are used in the process of selecting important concepts.

Belohlavek & Macko (2011): In this article, a weight is assigned to each attribute to express its relevance, and the formal concepts considered relevant are then selected. To facilitate the application of weights, equal weights are assigned to attributes derived from multivalued attributes. The importance of a formal concept is measured by the sum of the weights of the attributes in its intent divided by the cardinality of its intent.
Li, Li, & He (2014): Compressed a concept lattice arising from incomplete contexts using k-medoids clustering. In this process, accuracy and similarity measures of approximate concepts are obtained, k-medoids clustering is then performed, and the concept lattice is compressed.
Singh, Cherukuri, & Li (2015); Sumangali, Kumar, & Li (2017): A few studies have recently utilized the notion of entropy-based FCA. Singh et al. (2015) have concentrated on decreasing the number of formal concepts in FCA with fuzzy attributes using entropy. Further, the number of fuzzy formal concepts is reduced at a chosen granulation of the entropy-based attribute intent weight.
Singh & Kumar (2016): Recently, the authors have concentrated on reducing a concept lattice using different subsets of attributes as information granules.
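As an illustration of the selection-based category, the following Python sketch keeps only those intents (closed itemsets) whose support reaches a threshold, which is the idea behind iceberg concept lattices. It is a brute-force sketch on the divisor context of 60, assuming that an object g has attribute m when m divides g; the function name `iceberg` and the threshold value are our own choices, and the code is not the algorithm of Stumme (2002).

```python
from itertools import combinations

# Divisor context of 60 (assumption: object g has attribute m when m divides g).
DIVS = [d for d in range(1, 61) if 60 % d == 0]
CONTEXT = {g: {m for m in DIVS if g % m == 0} for g in DIVS}


def iceberg(ctx, attrs, minsupp):
    """Map each frequent intent to its support (brute force over subsets)."""
    result = {}
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            ext = [g for g, row in ctx.items() if set(subset) <= row]
            supp = len(ext) / len(ctx)
            if supp < minsupp:
                continue  # infrequent: not part of the iceberg
            closed = frozenset(m for m in attrs if all(m in ctx[g] for g in ext))
            result[closed] = supp
    return result


for closed, supp in sorted(iceberg(CONTEXT, DIVS, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(closed), round(supp, 2))
```

With a threshold of 0.5, only 4 of the 12 concepts of this context are retained, which shows both the reduction achieved and the drawback noted above: the lower part of the lattice is discarded.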

Conclusion
In this paper we have presented an overview of the foundations of FCA and its historical growth to meet the needs of beginners in FCA. The terms and notions relevant to FCA are recalled and illustrated by means of examples. FCA extracts knowledge from any data par excellence in three dimensions, viz. conceptual clusters, lattices (graphical representation) and association rules. The main advantages of FCA are its simplicity, its diagrammatic representation, and its hierarchical overview of the underlying patterns and rules of the formal context. The common issue arising in FCA is scalability, owing to the huge size of contexts. We have reviewed some of the recent works on scalability.

Fig. 3. Concept lattice for the formal context of Table 5

Table 2
Many-valued context of platonian bodies

Table 3
Another type of lattices, viz. alpha concept lattices, was introduced by the authors. Some class restraints are constructed in a formal context with attributes. The resulting concept lattice is known as an alpha concept lattice (Pernelle et al., 2002). An unrestricted lattice results in an iceberg concept lattice, i.e. one having only frequent formal concepts. Soldano et al. (2010) have discussed the construction of alpha lattices. The extent of a term in alpha lattices is restricted according to constraints based on an a priori categorization of instances in classes, and on a degree α, which results in a smaller lattice.

Authors determine the attribute dependency (AD) formulas from the background knowledge. Those concepts which do not obey these AD formulas are removed.