Matrix Factorization Algorithms #157

Open
wants to merge 46 commits into
base: dev
46 commits
a2e642c
algos
FWurmbach Jan 25, 2024
c0a5147
removed unused/duplicate functions
FWurmbach Feb 8, 2024
fbee8a5
Started Reimplementing ASSO
JannikNordmeyer Jun 30, 2024
8f8591f
Progressed ASSO Implementation.
JannikNordmeyer Jul 9, 2024
b6df6e0
Implemented Grecond Algorithm.
JannikNordmeyer Jul 12, 2024
5b48e69
Started Implementing GreEss Algorithm.
JannikNordmeyer Jul 13, 2024
e878689
Implemented Tiling Algorithm
JannikNordmeyer Jul 26, 2024
a2b0c08
Implemented Method for Factor Context multiplication.
JannikNordmeyer Jul 26, 2024
27f63f6
Continued Implementing GreEss Algorithm.
JannikNordmeyer Jul 27, 2024
d6edeb9
Continued Implementing GreEss Algorithm.
JannikNordmeyer Jul 28, 2024
fb137ca
Fixed Error in essential-context Function.
JannikNordmeyer Jul 28, 2024
a67cc13
Implemented GreEss Algorithm.
JannikNordmeyer Jul 28, 2024
635dabb
Started Implementing PaNDa Algorithm.
JannikNordmeyer Aug 7, 2024
a9a9700
Implemented PaNDa algorithm.
JannikNordmeyer Aug 8, 2024
7aedb2e
Started Implementing topFiberM algorithm.
JannikNordmeyer Aug 9, 2024
7ffd080
Continued Implementing topFiberM Algorithm.
JannikNordmeyer Aug 10, 2024
7783ebe
Finished Subroutine for topFiberM Algorithm.
JannikNordmeyer Aug 11, 2024
2180916
Preliminary topFiberM Implementation.
JannikNordmeyer Aug 11, 2024
72c69b4
Minor Alteration to topFiberM Algorithm.
JannikNordmeyer Aug 11, 2024
899c677
Implemented topFiberM algorithm.
JannikNordmeyer Aug 17, 2024
8095004
Updated Implementation of ASSO Algorithm.
JannikNordmeyer Aug 17, 2024
cc91443
Implemented HYPER Algorithm.
JannikNordmeyer Aug 18, 2024
20b4b40
Minor Update to HYPER Algorithm.
JannikNordmeyer Aug 19, 2024
958b62c
Introduced Factorization Record.
JannikNordmeyer Aug 24, 2024
08563ea
Improved Implementation of PaNDa Algorithm.
JannikNordmeyer Aug 24, 2024
d7a22ee
Converted Return Value of ASSO Algorithm.
JannikNordmeyer Aug 25, 2024
fa4b3ac
Added Test for Matrix Factorizations.
JannikNordmeyer Aug 25, 2024
1917210
Removed Obsolete Implementations of Matrix Factorizations.
JannikNordmeyer Aug 27, 2024
de10e17
Removed Obsolete Functions.
JannikNordmeyer Aug 27, 2024
ae185c6
Merge branch 'tomhanika:dev' into masterarbeit/factorization
JannikNordmeyer Aug 27, 2024
bb41570
Removed Outdated Documentation.
JannikNordmeyer Aug 27, 2024
766e345
Merge branch 'masterarbeit/factorization' of https://github.com/Janni…
JannikNordmeyer Aug 27, 2024
6320817
Added Documentation File for Boolean Matrix Decompositions.
JannikNordmeyer Sep 1, 2024
57ade36
Completed Documentation for Boolean Matrix Factorization.
JannikNordmeyer Sep 1, 2024
66ae18c
Added Documentation to README.
JannikNordmeyer Sep 1, 2024
4b3a922
Adapted workflow for update-deps-lock.yaml when release
tomhanika Sep 4, 2024
8d96096
Added Matrix Factorizations to Analysis Namespace.
JannikNordmeyer Sep 5, 2024
4d0318a
Fixed Error in analysis Namespace.
JannikNordmeyer Sep 11, 2024
331f903
Relocated Context-Boolean-Matrix-Product Method.
JannikNordmeyer Sep 11, 2024
bdaaba8
Outsourced Functions into new Matrix Namespace.
JannikNordmeyer Sep 12, 2024
8fa76ff
Relocated ArgMax Function.
JannikNordmeyer Sep 12, 2024
f84e102
Improved Documentation for Boolean Matrix Factorizations.
JannikNordmeyer Sep 13, 2024
34f8414
Outsourced Miscellaneous Functions.
JannikNordmeyer Sep 13, 2024
ca1828d
Improved Documentation for Boolean Matrix Factorization.
JannikNordmeyer Sep 13, 2024
e535750
Merge branch 'tomhanika:master' into masterarbeit/factorization
JannikNordmeyer Sep 14, 2024
76fff22
Fixed Naming Error.
JannikNordmeyer Nov 23, 2024
1 change: 1 addition & 0 deletions .github/workflows/update-deps-lock.yaml
@@ -6,6 +6,7 @@ on:

jobs:
update-lock:
if: startsWith(github.ref, 'refs/heads/')
runs-on: ubuntu-latest

steps:
1 change: 1 addition & 0 deletions README.md
@@ -39,6 +39,7 @@ much more.
3. [triadic-exploration](doc/Triadic-Exploration.org)
4. [protoconcepts](doc/Protoconcepts.org)
5. [Incomplete Contexts](doc/IncompleteContexts.org)
6. [Factorization of Formal Contexts](doc/MatrixFactorization.org)
7. [API documentation](doc/API.md)
8. [Development](doc/Development.org)

91 changes: 91 additions & 0 deletions doc/MatrixFactorization.org
@@ -0,0 +1,91 @@
#+property: header-args :wrap src text
#+property: header-args:text :eval never

* Boolean Matrix Factorization

The goal of Boolean matrix factorization is to decompose a binary matrix, in this case a formal context, into two smaller contexts such that the Boolean matrix product of these contexts is equal, or at least similar, to the original context.
The first context relates the objects to so-called *factors*, and the second relates these factors to the attributes. The number of factors should be as low as possible.
Since this problem is NP-hard, a variety of approximation algorithms have been proposed. These algorithms offer different tradeoffs between accuracy and the number of factors.
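
As a small illustration of the Boolean matrix product, the following self-contained sketch multiplies an object-factor matrix with a factor-attribute matrix. The helper ~bool-product~ and the example matrices are made up for this illustration and are not part of ~conexp-clj~.

#+begin_src clojure
;; Hypothetical helper, for illustration only.
(defn bool-product
  "Boolean product of two 0/1 matrices given as vectors of row vectors.
  Entry (i, j) is 1 if row i of A and column j of B share a 1 at some position."
  [A B]
  (let [cols (apply mapv vector B)]          ; columns of B as vectors
    (mapv (fn [row]
            (mapv (fn [col]
                    (if (some pos? (map * row col)) 1 0))
                  cols))
          A)))

(def object-factor    [[1 0] [0 1] [1 1]])   ; 3 objects, 2 factors
(def factor-attribute [[1 1 0] [0 1 1]])     ; 2 factors, 3 attributes

(bool-product object-factor factor-attribute)
;; => [[1 1 0] [0 1 1] [1 1 1]]
#+end_src

Here a 3x3 matrix is described by two factors: the third object has both factors and therefore all three attributes.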


~conexp-clj~ offers several algorithms for Boolean matrix factorization. They are available via the ~matrix-factorizations~ namespace:

#+begin_src clojure
(use 'conexp.fca.matrix-factorizations)
(def water-ctx (read-context "testing-data/Living-Beings-and-Water.ctx"))
#+end_src

The following algorithms are available:

The ~hyper~-algorithm computes an errorless decomposition but is not optimal with respect to the number of factors. In addition to the context, it takes the minimum support for all candidates considered by the algorithm.
Decreasing this value may yield a decomposition with fewer factors, but increases the running time.
A detailed description of the algorithm can be found here:
https://doi.org/10.1007/s10618-010-0203-9

#+begin_src clojure
(hyper water-ctx 0.7)
#+end_src

The ~topFiberM~-algorithm takes as arguments the context to be decomposed, the maximum number of factors, the precision threshold and the search limit. The search limit is the maximum number of iterations of the algorithm.
If the search limit exceeds the specified number of factors, already selected factors may be replaced by better ones. A detailed explanation can be found here:
https://doi.org/10.48550/arXiv.1903.10326

#+begin_src clojure
(topFiberM water-ctx 3 0.7 5)
#+end_src

The ~PaNDa~-algorithm computes a Boolean matrix factorization based on the specified context and the number of factors:
https://doi.org/10.1137/1.9781611972801.15

#+begin_src clojure
(PaNDa water-ctx 5)
#+end_src

The ~tiling~-algorithm is loosely based on the following paper:
https://doi.org/10.1007/978-3-540-30214-8_22

It also computes a factorization based on the context and the number of factors:

#+begin_src clojure
(tiling water-ctx 5)
#+end_src

~grecond~ computes a Boolean matrix factorization that is optimal with respect to coverage:
https://doi.org/10.1016/j.jcss.2009.05.002

#+begin_src clojure
(grecond water-ctx)
#+end_src

The ~GreEss~-algorithm computes a factorization that is accurate in terms of coverage with a specified permitted error. The algorithm only commits undercoverage errors. A detailed description can be found here:
https://doi.org/10.1016/j.jcss.2015.06.002

#+begin_src clojure
(GreEss water-ctx 3)
#+end_src
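
To make the two kinds of coverage error concrete, the following self-contained sketch counts them for a pair of 0/1 matrices. An undercoverage error is a 1 of the original that the reconstruction misses, an overcoverage error is a 1 in the reconstruction where the original has a 0. The function ~coverage-errors~ is a hypothetical helper for this illustration and is not part of ~conexp-clj~.

#+begin_src clojure
;; Hypothetical helper, for illustration only.
(defn coverage-errors
  "Counts the entries in which two 0/1 matrices of equal dimension differ,
  split into undercovered (1 -> 0) and overcovered (0 -> 1) entries."
  [original reconstruction]
  (let [pairs (map vector (flatten original) (flatten reconstruction))]
    {:undercovered (count (filter #(= % [1 0]) pairs))
     :overcovered  (count (filter #(= % [0 1]) pairs))}))

(coverage-errors [[1 1 0] [0 1 1]]
                 [[1 0 0] [0 1 1]])
;; => {:undercovered 1, :overcovered 0}
#+end_src

A factorization that only commits undercoverage errors always yields ~:overcovered 0~ in such a comparison.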

The ~ASSO~-algorithm requires the following arguments:
- the context to be decomposed
- the number of factors
- a threshold value that controls the precision of the result
- a weight that controls how much correctly covered entries are rewarded
- a weight that controls how much false positives are penalized

A detailed description of the algorithm can be found here:
https://doi.org/10.1109/TKDE.2008.53

#+begin_src clojure
(ASSO water-ctx 5 0.7 1 1)
#+end_src
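
The role of the two weights can be sketched as a score that rewards correctly covered entries and penalizes false positives. The function ~weighted-cover~ and its parameter names are assumptions made for this illustration. The sketch only shows how such weights can enter a score and is not the implementation used in this namespace.

#+begin_src clojure
;; Hypothetical helper, for illustration only.
(defn weighted-cover
  "Scores a 0/1 reconstruction against the original matrix:
  reward * (1s covered correctly) - penalty * (0s covered by mistake)."
  [original reconstruction reward penalty]
  (let [pairs     (map vector (flatten original) (flatten reconstruction))
        hits      (count (filter #(= % [1 1]) pairs))
        false-pos (count (filter #(= % [0 1]) pairs))]
    (- (* reward hits) (* penalty false-pos))))

;; 3 correctly covered entries, 1 false positive
(weighted-cover [[1 1 0] [0 1 1]]
                [[1 1 1] [0 0 1]]
                1 1)
;; => 2
#+end_src

Raising the penalty relative to the reward makes overcovering candidates less attractive; with weights ~1~ and ~2~ the same example scores ~1~.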


All of the above algorithms return a factorization object that contains the two contexts resulting from the decomposition. Calling ~context~ on this object returns the Boolean matrix product of those two contexts:

#+begin_src clojure
(def factorization (hyper water-ctx 0.7))
(context factorization)
#+end_src


3 changes: 2 additions & 1 deletion src/main/clojure/conexp/analysis.clj
@@ -24,7 +24,8 @@
[protoconcepts :refer :all]
[random-contexts :refer :all]
[simplicial-complexes :refer :all]
[triadic-exploration :refer :all]]
[triadic-exploration :refer :all]
[matrix-factorizations :refer :all]]
[conexp.math
[algebra :refer :all]
[markov :refer :all]
5 changes: 5 additions & 0 deletions src/main/clojure/conexp/base.clj
@@ -913,4 +913,9 @@ metadata (as provided by def) merged into the metadata of the original."
reducer-fn (fn [c d] (union (make-set c) (make-set d)))]
(fn [A] (reduce reducer-fn #{} (map m A)))))
;;;

(defn argmax
  "Returns the element of *coll* for which *function* returns the highest value."
  [function coll]
  ;; The docstring must precede the parameter vector to be attached to the var.
  (apply max-key function coll))
nil
21 changes: 21 additions & 0 deletions src/main/clojure/conexp/fca/contexts.clj
@@ -276,6 +276,16 @@
(let [objs (aprime ctx #{m})]
[objs (oprime ctx objs)]))

(defn object-concepts
  "Returns the set of all object concepts of the specified context."
  [ctx]
  (set (for [obj (objects ctx)] (object-concept ctx obj))))

(defn attribute-concepts
  "Returns the set of all attribute concepts of the specified context."
  [ctx]
  (set (for [attr (attributes ctx)] (attribute-concept ctx attr))))

(defn clarify-objects
"Clarifies objects in context ctx."
[ctx]
@@ -765,6 +775,17 @@
(I_2 [g_2,m_2])))]
(make-context-nc new-objs new-atts new-inz)))

(defn context-boolean-matrix-product
  "Computes a context in the form of the Boolean matrix product of both contexts.
  The contexts need to have appropriate dimensions, and the set of attributes of *ctx1*
  must be equal to the set of objects of *ctx2*."
  [ctx1 ctx2]
  (make-context (objects ctx1)
                (attributes ctx2)
                ;; g is incident with m iff they share at least one factor
                (fn [g m] (not= (intersection (object-derivation ctx1 #{g})
                                              (attribute-derivation ctx2 #{m}))
                                #{}))))

;;; Neighbours in the concept lattice with Lindig's Algorithm

(defn direct-upper-concepts
2 changes: 1 addition & 1 deletion src/main/clojure/conexp/fca/implications.clj
@@ -628,7 +628,7 @@
(/ (absolute-support ctx [(union premise conclusion) #{}])
(absolute-support ctx [conclusion #{}]))))

(defn- frequent-itemsets
(defn frequent-itemsets
"Returns all frequent itemsets of context, given minsupp as minimal support."
;; UNTESTED!
[context minsupp]
100 changes: 100 additions & 0 deletions src/main/clojure/conexp/fca/matrices.clj
@@ -0,0 +1,100 @@
;; Copyright ⓒ the conexp-clj developers; all rights reserved.
;; The use and distribution terms for this software are covered by the
;; Eclipse Public License 1.0 (http://opensource.org/licenses/eclipse-1.0.php)
;; which can be found in the file LICENSE at the root of this distribution.
;; By using this software in any fashion, you are agreeing to be bound by
;; the terms of this license.
;; You must not remove this notice, or any other, from this software.

(ns conexp.fca.matrices
  "Provides basic operations on matrices represented as vectors of row vectors."
  (:require [conexp.base :refer :all]))


(defn transpose
  "Returns a transposed matrix."
  [M]
  (into [] (apply map vector M)))

(defn matrix-row
  "Returns the indicated row of the matrix."
  [M index]
  (M index))

(defn add-row
  "Returns the matrix with the new row added."
  [M row]
  (if (and (not= (count M) 0)
           (not= (count row) (count (first M))))
    (throw (Exception. "Row does not have the correct length."))
    (conj M row)))

(defn set-row
  "Replaces the row of *M* at index *pos* with *row*."
  [M pos row]
  (assoc M pos row))

(defn row-number
  "Returns the number of rows of the matrix."
  [M]
  (count M))

(defn matrix-column
  "Returns the indicated column of the matrix."
  [M index]
  (into [] (for [row M] (row index))))

(defn add-column
  "Returns the matrix with the new column added."
  [M col]
  (if (= M [])
    (transpose [col])
    (if (not= (count col) (count M))
      (throw (Exception. "Column does not have the correct length."))
      (transpose (add-row (transpose M) col)))))

(defn set-column
  "Replaces the column of *M* at index *pos* with *col*."
  [M pos col]
  ;; mapv keeps the result a vector, so it can still be indexed like a matrix
  (mapv #(assoc %1 pos %2) M col))

(defn col-number
  "Returns the number of columns of the matrix."
  [M]
  (count (first M)))

(defn scalar-product
  "Computes the scalar/dot product of two vectors."
  [V1 V2]
  (reduce + (map * V1 V2)))

(defn boolean-matrix-product
  "Computes the product of two matrices with addition being interpreted as Boolean OR."
  [M1 M2]
  ;; Built column by column, then transposed back into rows.
  (transpose (for [c (range (col-number M2))]
               (for [r (range (row-number M1))]
                 (min (scalar-product (matrix-column M2 c)
                                      (matrix-row M1 r))
                      1)))))

(defn matrix-boolean-difference
  "Computes a new matrix from two Boolean matrices of the same dimension by subtracting
  each of their entries pairwise, with 0 - 1 = 0."
  [M1 M2]
  (into [] (for [r (range (row-number M1))]
             (into [] (for [c (range (col-number M1))]
                        (max (- ((M1 r) c) ((M2 r) c))
                             0))))))

(defn matrix-entrywise-product
  "Computes a new matrix from two matrices of the same dimension by multiplying
  each of their entries pairwise."
  [M1 M2]
  (into [] (for [r (range (row-number M1))]
             (into [] (for [c (range (col-number M1))]
                        (* ((M1 r) c) ((M2 r) c)))))))

(defn outer-prod
  "Computes the outer product of two vectors."
  [v1 v2]
  (into [] (for [x v1]
             (into [] (for [y v2] (* x y))))))