How do I extract the decision rules from a scikit-learn decision tree?


muds 2023. 6. 10. 09:48

Can I extract the underlying decision rules (or 'decision paths') from a trained decision tree as a textual list?

Something like:

if A>0.4 then if B<0.2 then if C>0.8 then class='X'

I believe this answer is more correct than the other answers here:

from sklearn.tree import _tree

def tree_to_code(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    print("def tree({}):".format(", ".join(feature_names)))

    def recurse(node, depth):
        indent = "  " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print("{}if {} <= {}:".format(indent, name, threshold))
            recurse(tree_.children_left[node], depth + 1)
            print("{}else:  # if {} > {}".format(indent, name, threshold))
            recurse(tree_.children_right[node], depth + 1)
        else:
            print("{}return {}".format(indent, tree_.value[node]))

    recurse(0, 1)

This prints out a valid Python function. Here is an example output for a tree that is trying to return its input, a number between 0 and 10.

def tree(f0):
  if f0 <= 6.0:
    if f0 <= 1.5:
      return [[ 0.]]
    else:  # if f0 > 1.5
      if f0 <= 4.5:
        if f0 <= 3.5:
          return [[ 3.]]
        else:  # if f0 > 3.5
          return [[ 4.]]
      else:  # if f0 > 4.5
        return [[ 5.]]
  else:  # if f0 > 6.0
    if f0 <= 8.5:
      if f0 <= 7.5:
        return [[ 7.]]
      else:  # if f0 > 7.5
        return [[ 8.]]
    else:  # if f0 > 8.5
      return [[ 9.]]
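Because the printed text is ordinary Python source, it can be captured in a string and compiled into a callable. The following is only a minimal sketch: the name generated_source and the abridged three-leaf tree are illustrative assumptions, not part of the answer above, and exec should only ever be applied to text you generated yourself.

```python
# Abridged version of the printed output, stored as a string (hypothetical name).
generated_source = """
def tree(f0):
  if f0 <= 6.0:
    if f0 <= 1.5:
      return [[ 0.]]
    else:  # if f0 > 1.5
      return [[ 3.]]
  else:  # if f0 > 6.0
    return [[ 9.]]
"""

namespace = {}
exec(generated_source, namespace)  # defines tree() inside the namespace dict
predict = namespace["tree"]

print(predict(1.0))  # follows the f0 <= 1.5 branch
print(predict(7.0))  # follows the f0 > 6.0 branch
```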

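Here are some stumbling blocks that I see in other answers:

Using tree_.threshold == -2 to decide whether a node is a leaf isn't a good idea. What if it's a real decision node with a threshold of -2? Instead, you should look at tree.feature or tree.children_*.
The line features = [feature_names[i] for i in tree_.feature] fails with my version of sklearn, because some values of tree.tree_.feature are -2 (specifically for leaf nodes).
There is no need to have multiple if statements in the recursive function; just one is fine.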
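The first point can be illustrated without a fitted model. The sketch below assumes plain Python lists that mirror the children_left, children_right and threshold arrays of tree_, and uses -1 for TREE_LEAF (the value sklearn.tree._tree.TREE_LEAF holds). The three-node tree and the helper names are made up for illustration.

```python
TREE_LEAF = -1  # same value as sklearn.tree._tree.TREE_LEAF

# Parallel arrays for a three-node tree whose root really splits at -2.0.
children_left = [1, -1, -1]
children_right = [2, -1, -1]
threshold = [-2.0, -2.0, -2.0]  # leaves carry -2.0 only as a placeholder


def is_leaf_by_threshold(node):
    # Fragile: cannot tell a placeholder from a genuine threshold of -2.
    return threshold[node] == -2


def is_leaf_by_children(node):
    # Robust: a leaf is a node without children.
    return children_left[node] == TREE_LEAF


print([is_leaf_by_threshold(n) for n in range(3)])  # [True, True, True]
print([is_leaf_by_children(n) for n in range(3)])   # [False, True, True]
```

The threshold test wrongly reports the root as a leaf, while the child-based test does not.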

I created my own function to extract the rules from the decision trees created by sklearn:

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# dummy data:
df = pd.DataFrame({'col1':[0,1,2,3],'col2':[3,4,5,6],'dv':[0,1,0,1]})

# create decision tree
dt = DecisionTreeClassifier(max_depth=5, min_samples_leaf=1)
dt.fit(df.iloc[:, :2], df.dv)  # .ix was removed from pandas; .iloc selects the first two columns

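This function first starts with the leaf nodes (identified by -1 in the child arrays) and then recursively finds the parents. I call this a node's 'lineage'. Along the way, I grab the values I need to create if/then/else SAS logic: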

def get_lineage(tree, feature_names):
     left      = tree.tree_.children_left
     right     = tree.tree_.children_right
     threshold = tree.tree_.threshold
     features  = [feature_names[i] for i in tree.tree_.feature]

     # get ids of child nodes
     idx = np.argwhere(left == -1)[:,0]     

     def recurse(left, right, child, lineage=None):          
          if lineage is None:
               lineage = [child]
          if child in left:
               parent = np.where(left == child)[0].item()
               split = 'l'
          else:
               parent = np.where(right == child)[0].item()
               split = 'r'

          lineage.append((parent, split, threshold[parent], features[parent]))

          if parent == 0:
               lineage.reverse()
               return lineage
          else:
               return recurse(left, right, parent, lineage)

     for child in idx:
          for node in recurse(left, right, child):
               print(node)

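The sets of tuples below contain everything I need to create SAS if/then/else statements. I do not like using do blocks in SAS, which is why I create logic describing a node's entire path. The single integer after the tuples is the ID of the terminal node in a path. All of the preceding tuples combine to create that node.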

In [1]: get_lineage(dt, df.columns)
(0, 'l', 0.5, 'col1')
1
(0, 'r', 0.5, 'col1')
(2, 'l', 4.5, 'col2')
3
(0, 'r', 0.5, 'col1')
(2, 'r', 4.5, 'col2')
(4, 'l', 2.5, 'col1')
5
(0, 'r', 0.5, 'col1')
(2, 'r', 4.5, 'col2')
(4, 'r', 2.5, 'col1')
6
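As a rough sketch of how one such lineage could be turned into a single rule string, the helper below joins the tuples into conditions. The function name lineage_to_rule and the output wording are assumptions chosen for illustration; the input layout (tuples followed by the leaf ID) matches the output above.

```python
def lineage_to_rule(lineage):
    """Turn [(parent, split, threshold, feature), ..., leaf_id] into text."""
    *steps, leaf_id = lineage
    conditions = []
    for _parent, split, threshold, feature in steps:
        # 'l' means the path went left (<= threshold), 'r' means right (>).
        operator = "<=" if split == "l" else ">"
        conditions.append(f"{feature} {operator} {threshold}")
    return "if " + " and ".join(conditions) + f" then node = {leaf_id}"


# Second path from the output above.
print(lineage_to_rule([(0, 'r', 0.5, 'col1'), (2, 'l', 4.5, 'col2'), 3]))
```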

[Figure: GraphViz output of the example tree]

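Scikit-learn introduced a new method called export_text in version 0.21 (May 2019) to extract the rules from a tree. The documentation is here. It is no longer necessary to create a custom function.

Once you have fit your model, you only need two lines of code. First, import export_text: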

from sklearn.tree import export_text

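Second, create an object that will contain your rules. To make the rules more readable, use the feature_names argument and pass a list of your feature names. For example, if your model is called model and your features are the columns of a dataframe called X_train, you could create an object called tree_rules: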

tree_rules = export_text(model, feature_names=list(X_train.columns))

Then just print or save tree_rules. Your output will look like this:

|--- Age <= 0.63
|   |--- EstimatedSalary <= 0.61
|   |   |--- Age <= -0.16
|   |   |   |--- class: 0
|   |   |--- Age >  -0.16
|   |   |   |--- EstimatedSalary <= -0.06
|   |   |   |   |--- class: 0
|   |   |   |--- EstimatedSalary >  -0.06
|   |   |   |   |--- EstimatedSalary <= 0.40
|   |   |   |   |   |--- EstimatedSalary <= 0.03
|   |   |   |   |   |   |--- class: 1

I modified the code submitted by Zelazny7 to print some pseudocode:

def get_code(tree, feature_names):
        left      = tree.tree_.children_left
        right     = tree.tree_.children_right
        threshold = tree.tree_.threshold
        features  = [feature_names[i] for i in tree.tree_.feature]
        value = tree.tree_.value

        def recurse(left, right, threshold, features, node):
                if (threshold[node] != -2):
                        print("if ( " + features[node] + " <= " + str(threshold[node]) + " ) {")
                        if left[node] != -1:
                                recurse (left, right, threshold, features,left[node])
                        print("} else {")
                        if right[node] != -1:
                                recurse (left, right, threshold, features,right[node])
                        print("}")
                else:
                        print("return " + str(value[node]))

        recurse(left, right, threshold, features, 0)

If you call get_code(dt, df.columns) on the same example, you will obtain:

if ( col1 <= 0.5 ) {
return [[ 1.  0.]]
} else {
if ( col2 <= 4.5 ) {
return [[ 0.  1.]]
} else {
if ( col1 <= 2.5 ) {
return [[ 1.  0.]]
} else {
return [[ 0.  1.]]
}
}
}

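There is a new method, decision_path, in the 0.18.0 release. The developers provide an extensive (well-documented) walkthrough.

The first section of code in the walkthrough, which prints the tree structure, seems to be fine. However, I modified the code in the second section to interrogate one sample. My changes are denoted with # <--

Edit: The changes marked with # <-- in the code below have since been applied in the walkthrough link, after the errors were pointed out in pull requests #8653 and #10951. It is much easier to follow along now.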

sample_id = 0
node_index = node_indicator.indices[node_indicator.indptr[sample_id]:
                                    node_indicator.indptr[sample_id + 1]]

print('Rules used to predict sample %s: ' % sample_id)
for node_id in node_index:

    if leave_id[sample_id] == node_id:  # <-- changed != to ==
        #continue # <-- comment out
        print("leaf node {} reached, no decision here".format(leave_id[sample_id])) # <--

    else: # < -- added else to iterate through decision nodes
        if (X_test[sample_id, feature[node_id]] <= threshold[node_id]):
            threshold_sign = "<="
        else:
            threshold_sign = ">"

        print("decision id node %s : (X[%s, %s] (= %s) %s %s)"
              % (node_id,
                 sample_id,
                 feature[node_id],
                 X_test[sample_id, feature[node_id]], # <-- changed i to sample_id
                 threshold_sign,
                 threshold[node_id]))

Rules used to predict sample 0: 
decision id node 0 : (X[0, 3] (= 2.4) > 0.800000011921)
decision id node 2 : (X[0, 2] (= 5.1) > 4.94999980927)
leaf node 4 reached, no decision here

Change sample_id to see the decision paths for other samples. I have not asked the developers about these changes; they just seemed more intuitive when working through the example.
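To see what such a decision path amounts to without the walkthrough's variables, the sketch below walks one sample down the tree by hand. It assumes plain Python lists that mirror the children_left, children_right, feature and threshold arrays of tree_. The function name explain_sample and the five-node example tree (modelled loosely on the output above, with rounded thresholds) are made up for illustration, and this is not the decision_path API itself.

```python
TREE_LEAF = -1  # marker scikit-learn uses for "no child"


def explain_sample(x, children_left, children_right, feature, threshold):
    """List the tests a single sample x passes through, root to leaf."""
    node, steps = 0, []
    while children_left[node] != TREE_LEAF:
        f, t = feature[node], threshold[node]
        if x[f] <= t:
            steps.append(f"node {node}: X[{f}] (= {x[f]}) <= {t}")
            node = children_left[node]
        else:
            steps.append(f"node {node}: X[{f}] (= {x[f]}) > {t}")
            node = children_right[node]
    steps.append(f"leaf node {node} reached")
    return steps


# Hypothetical five-node tree stored as parallel arrays.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
feature = [3, -2, 2, -2, -2]
threshold = [0.8, -2.0, 4.95, -2.0, -2.0]

for line in explain_sample([5.9, 3.0, 5.1, 2.4],
                           children_left, children_right, feature, threshold):
    print(line)
```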

from io import StringIO
from sklearn import tree

out = StringIO()
# export_graphviz writes the DOT text into the buffer passed as out_file
tree.export_graphviz(clf, out_file=out)
print(out.getvalue())

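You can see a digraph tree. Then clf.tree_.feature and clf.tree_.value are, respectively, the array of each node's splitting feature and the array of node values. You can find more details in this github source.

I needed a more human-friendly format for the rules of a decision tree. I am building an open-source AutoML Python package, and MLJAR users often want to see the exact rules from the tree.

That is why I implemented a function based on paulkernfeld's answer.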

def get_rules(tree, feature_names, class_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]

    paths = []
    path = []
    
    def recurse(node, path, paths):
        
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            p1, p2 = list(path), list(path)
            p1 += [f"({name} <= {np.round(threshold, 3)})"]
            recurse(tree_.children_left[node], p1, paths)
            p2 += [f"({name} > {np.round(threshold, 3)})"]
            recurse(tree_.children_right[node], p2, paths)
        else:
            path += [(tree_.value[node], tree_.n_node_samples[node])]
            paths += [path]
            
    recurse(0, path, paths)

    # sort by samples count
    samples_count = [p[-1][1] for p in paths]
    ii = list(np.argsort(samples_count))
    paths = [paths[i] for i in reversed(ii)]
    
    rules = []
    for path in paths:
        rule = "if "
        
        for p in path[:-1]:
            if rule != "if ":
                rule += " and "
            rule += str(p)
        rule += " then "
        if class_names is None:
            rule += "response: "+str(np.round(path[-1][0][0][0],3))
        else:
            classes = path[-1][0][0]
            l = np.argmax(classes)
            rule += f"class: {class_names[l]} (proba: {np.round(100.0*classes[l]/np.sum(classes),2)}%)"
        rule += f" | based on {path[-1][1]:,} samples"
        rules += [rule]
        
    return rules

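The rules are sorted by the number of training samples assigned to each rule. For classification tasks, each rule reports the predicted class name and the probability of the prediction. For regression tasks, only the predicted value is printed.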
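The ordering and the reported probability can be reproduced on their own. The sketch below assumes hand-written per-class counts for two leaves; summarize_leaf, the leaves list and the class names are hypothetical and only mirror the arithmetic used in get_rules.

```python
def summarize_leaf(class_counts, class_names):
    """Return (predicted class name, share of that class in percent)."""
    best = max(range(len(class_counts)), key=lambda i: class_counts[i])
    proba = round(100.0 * class_counts[best] / sum(class_counts), 2)
    return class_names[best], proba


# Hypothetical leaves: per-class counts plus the number of training samples.
leaves = [
    {"counts": [1, 4], "n_samples": 5},
    {"counts": [30, 10], "n_samples": 40},
]
class_names = ["no", "yes"]

# Largest leaf first, matching the sort order of get_rules.
ordered = sorted(leaves, key=lambda leaf: leaf["n_samples"], reverse=True)
for leaf in ordered:
    name, proba = summarize_leaf(leaf["counts"], class_names)
    print(f"class: {name} (proba: {proba}%) | based on {leaf['n_samples']:,} samples")
```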

Example

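import numpy as np  # get_rules relies on np.round, np.argsort, np.argmax
from sklearn import datasets
from sklearn.tree import DecisionTreeRegressor
from sklearn import tree
from sklearn.tree import _tree

# Prepare the data
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release
boston = datasets.load_boston()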
X = boston.data
y = boston.target

# Fit the regressor, set max_depth = 3
regr = DecisionTreeRegressor(max_depth=3, random_state=1234)
model = regr.fit(X, y)

# Print rules
rules = get_rules(regr, boston.feature_names, None)
for r in rules:
    print(r)

The printed rules:

if (RM <= 6.941) and (LSTAT <= 14.4) and (DIS > 1.385) then response: 22.905 | based on 250 samples
if (RM <= 6.941) and (LSTAT > 14.4) and (CRIM <= 6.992) then response: 17.138 | based on 101 samples
if (RM <= 6.941) and (LSTAT > 14.4) and (CRIM > 6.992) then response: 11.978 | based on 74 samples
if (RM > 6.941) and (RM <= 7.437) and (NOX <= 0.659) then response: 33.349 | based on 43 samples
if (RM > 6.941) and (RM > 7.437) and (PTRATIO <= 19.65) then response: 45.897 | based on 29 samples
if (RM <= 6.941) and (LSTAT <= 14.4) and (DIS <= 1.385) then response: 45.58 | based on 5 samples
if (RM > 6.941) and (RM <= 7.437) and (NOX > 0.659) then response: 14.4 | based on 3 samples
if (RM > 6.941) and (RM > 7.437) and (PTRATIO > 19.65) then response: 21.9 | based on 1 samples

I have summarized the ways to extract rules from a decision tree in my article: Extract Rules from Decision Tree in 3 Ways with Scikit-Learn and Python.

Now you can use export_text.

from sklearn.tree import export_text

r = export_text(loan_tree, feature_names=(list(X_train.columns)))
print(r)

A complete example from sklearn:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_text
iris = load_iris()
X = iris['data']
y = iris['target']
decision_tree = DecisionTreeClassifier(random_state=0, max_depth=2)
decision_tree = decision_tree.fit(X, y)
r = export_text(decision_tree, feature_names=iris['feature_names'])
print(r)

This is the code you need.

I modified the top-voted code so that it indents correctly in a Jupyter notebook under Python 3:

import numpy as np
from sklearn.tree import _tree

def tree_to_code(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [feature_names[i] 
                    if i != _tree.TREE_UNDEFINED else "undefined!" 
                    for i in tree_.feature]
    print("def tree({}):".format(", ".join(feature_names)))

    def recurse(node, depth):
        indent = "    " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print("{}if {} <= {}:".format(indent, name, threshold))
            recurse(tree_.children_left[node], depth + 1)
            print("{}else:  # if {} > {}".format(indent, name, threshold))
            recurse(tree_.children_right[node], depth + 1)
        else:
            print("{}return {}".format(indent, np.argmax(tree_.value[node])))

    recurse(0, 1)

Everyone was so helpful that I will just add a modification to Zelazny7's and Danielle's beautiful solutions. This one is for Python 2.7, with tabs to make it more readable:

def get_code(tree, feature_names, tabdepth=0):
    left      = tree.tree_.children_left
    right     = tree.tree_.children_right
    threshold = tree.tree_.threshold
    features  = [feature_names[i] for i in tree.tree_.feature]
    value = tree.tree_.value

    def recurse(left, right, threshold, features, node, tabdepth=0):
            if (threshold[node] != -2):
                    print '\t' * tabdepth,
                    print "if ( " + features[node] + " <= " + str(threshold[node]) + " ) {"
                    if left[node] != -1:
                            recurse (left, right, threshold, features,left[node], tabdepth+1)
                    print '\t' * tabdepth,
                    print "} else {"
                    if right[node] != -1:
                            recurse (left, right, threshold, features,right[node], tabdepth+1)
                    print '\t' * tabdepth,
                    print "}"
            else:
                    print '\t' * tabdepth,
                    print "return " + str(value[node])

    recurse(left, right, threshold, features, 0)

I have been going through this, but I needed the rules to be written in this format:

if A>0.4 then if B<0.2 then if C>0.8 then class='X'

So I adapted the answer of @paulkernfeld (thanks), which you can customize to your needs:

def tree_to_code(tree, feature_names, Y):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    pathto=dict()

    global k
    k = 0
    def recurse(node, depth, parent):
        global k
        indent = "  " * depth

        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            s= "{} <= {} ".format( name, threshold, node )
            if node == 0:
                pathto[node]=s
            else:
                pathto[node]=pathto[parent]+' & ' +s

            recurse(tree_.children_left[node], depth + 1, node)
            s="{} > {}".format( name, threshold)
            if node == 0:
                pathto[node]=s
            else:
                pathto[node]=pathto[parent]+' & ' +s
            recurse(tree_.children_right[node], depth + 1, node)
        else:
            k=k+1
            print(k,')',pathto[parent], tree_.value[node])
    recurse(0, 1, 0)

The code below is my approach, under Anaconda Python 2.7 plus the package "pydot-ng", for making a PDF file with the decision rules. I hope it is helpful.

from sklearn import tree

clf = tree.DecisionTreeClassifier(max_leaf_nodes=n)
clf_ = clf.fit(X, data_y)

feature_names = X.columns
class_name = clf_.classes_.astype(int).astype(str)

def output_pdf(clf_, name):
    from sklearn import tree
    from sklearn.externals.six import StringIO
    import pydot_ng as pydot
    dot_data = StringIO()
    tree.export_graphviz(clf_, out_file=dot_data,
                         feature_names=feature_names,
                         class_names=class_name,
                         filled=True, rounded=True,
                         special_characters=True,
                          node_ids=1,)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("%s.pdf"%name)

output_pdf(clf_, name='filename%s'%n)

[Figure: the resulting tree graph]

Here is a way to translate the whole tree into a single Python expression using the SKompiler library:

from skompiler import skompile
skompile(dtree.predict).to('python/code')

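This builds on @paulkernfeld's answer. If you have a dataframe X with your features and a target dataframe y with your responses, and you want to find out which y values end up in which node (and plot them accordingly), you can do the following: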

    def tree_to_code(tree, feature_names):
        from sklearn.tree import _tree
        codelines = []
        codelines.append('def get_cat(X_tmp):\n')
        codelines.append('   catout = []\n')
        codelines.append('   for codelines in range(0,X_tmp.shape[0]):\n')
        codelines.append('      Xin = X_tmp.iloc[codelines]\n')
        tree_ = tree.tree_
        feature_name = [
            feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
            for i in tree_.feature
        ]
        #print "def tree({}):".format(", ".join(feature_names))

        def recurse(node, depth):
            indent = "      " * depth
            if tree_.feature[node] != _tree.TREE_UNDEFINED:
                name = feature_name[node]
                threshold = tree_.threshold[node]
                codelines.append ('{}if Xin["{}"] <= {}:\n'.format(indent, name, threshold))
                recurse(tree_.children_left[node], depth + 1)
                codelines.append( '{}else:  # if Xin["{}"] > {}\n'.format(indent, name, threshold))
                recurse(tree_.children_right[node], depth + 1)
            else:
                codelines.append( '{}mycat = {}\n'.format(indent, node))

        recurse(0, 1)
        codelines.append('      catout.append(mycat)\n')
        codelines.append('   return pd.DataFrame(catout,index=X_tmp.index,columns=["category"])\n')
        codelines.append('node_ids = get_cat(X)\n')
        return codelines
    mycode = tree_to_code(clf,X.columns.values)

    # now execute the function and obtain the dataframe with all nodes
    exec(''.join(mycode))
    node_ids = [int(x[0]) for x in node_ids.values]
    node_ids2 = pd.DataFrame(node_ids)

    print('make plot')
    import matplotlib.cm as cm
    colors = cm.rainbow(np.linspace(0, 1, 1+max( list(set(node_ids)))))
    #plt.figure(figsize=cm2inch(24, 21))
    for i in list(set(node_ids)):
        plt.plot(y[node_ids2.values==i],'o',color=colors[i], label=str(i))  
    mytitle = ['y colored by node']
    plt.title(mytitle ,fontsize=14)
    plt.xlabel('my xlabel')
    plt.ylabel(tagname)
    plt.xticks(rotation=70)       
    plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1.00), shadow=True, ncol=9)
    plt.tight_layout()
    plt.show()
    plt.close()

Not the most elegant version, but it does the job.

Here is a function that prints the rules of a scikit-learn decision tree under Python 3, with offsets for the conditional blocks to make the structure more readable:

def print_decision_tree(tree, feature_names=None, offset_unit='    '):
    '''Plots textual representation of rules of a decision tree
    tree: scikit-learn representation of tree
    feature_names: list of feature names. They are set to f1,f2,f3,... if not specified
    offset_unit: a string of offset of the conditional block'''

    left      = tree.tree_.children_left
    right     = tree.tree_.children_right
    threshold = tree.tree_.threshold
    value = tree.tree_.value
    if feature_names is None:
        features  = ['f%d'%i for i in tree.tree_.feature]
    else:
        features  = [feature_names[i] for i in tree.tree_.feature]        

    def recurse(left, right, threshold, features, node, depth=0):
            offset = offset_unit*depth
            if (threshold[node] != -2):
                    print(offset+"if ( " + features[node] + " <= " + str(threshold[node]) + " ) {")
                    if left[node] != -1:
                            recurse (left, right, threshold, features,left[node],depth+1)
                    print(offset+"} else {")
                    if right[node] != -1:
                            recurse (left, right, threshold, features,right[node],depth+1)
                    print(offset+"}")
            else:
                    print(offset+"return " + str(value[node]))

    recurse(left, right, threshold, features, 0,0)

You can also make it more informative by indicating which class a leaf belongs to, or even by mentioning its output value.

def print_decision_tree(tree, feature_names, offset_unit='    '):
    left      = tree.tree_.children_left
    right     = tree.tree_.children_right
    threshold = tree.tree_.threshold
    value = tree.tree_.value
    if feature_names is None:
        features  = ['f%d'%i for i in tree.tree_.feature]
    else:
        features  = [feature_names[i] for i in tree.tree_.feature]

    def recurse(left, right, threshold, features, node, depth=0):
        offset = offset_unit*depth
        if (threshold[node] != -2):
            print(offset+"if ( " + features[node] + " <= " + str(threshold[node]) + " ) {")
            if left[node] != -1:
                recurse(left, right, threshold, features, left[node], depth+1)
            print(offset+"} else {")
            if right[node] != -1:
                recurse(left, right, threshold, features, right[node], depth+1)
            print(offset+"}")
        else:
            # value[node] has shape (1, n_classes); for a binary target the
            # first entry belongs to the "NO" class and the second to "YES"
            val_no, val_yes = value[node][0]

            # print the majority class in bold, then reset the terminal style
            if val_yes > val_no:
                print(offset, '\033[1m', "YES")
                print('\033[0m')
            elif val_no > val_yes:
                print(offset, '\033[1m', "NO")
                print('\033[0m')
            else:
                print(offset, '\033[1m', "Tie")
                print('\033[0m')

    recurse(left, right, threshold, features, 0, 0)

Here is a way to extract the decision rules in a form that can be used directly in SQL, so that the data can be grouped by node. (Based on the approaches of previous posters.)

The result is a series of CASE clauses that can be copied into an SQL statement, for example:

SELECT COALESCE(CASE WHEN <conditions> THEN <NodeA>, CASE WHEN <conditions> THEN <NodeB>, ....) NodeName, * FROM <table or view>

import numpy as np

import pickle
feature_names=.............
features  = [feature_names[i] for i in range(len(feature_names))]
clf= pickle.loads(trained_model)
impurity=clf.tree_.impurity
importances = clf.feature_importances_
SqlOut=""

#global Conts
global ContsNode
global Path
#Conts=[]#
ContsNode=[]
Path=[]
global Results
Results=[]

def print_decision_tree(tree, feature_names, offset_unit='    '):
    left      = tree.tree_.children_left
    right     = tree.tree_.children_right
    threshold = tree.tree_.threshold
    value = tree.tree_.value

    if feature_names is None:
        features  = ['f%d'%i for i in tree.tree_.feature]
    else:
        features  = [feature_names[i] for i in tree.tree_.feature]        

    def recurse(left, right, threshold, features, node, depth=0,ParentNode=0,IsElse=0):
        global Conts
        global ContsNode
        global Path
        global Results
        global LeftParents
        LeftParents=[]
        global RightParents
        RightParents=[]
        for i in range(len(left)): # This is just to tell you how to create a list.
            LeftParents.append(-1)
            RightParents.append(-1)
            ContsNode.append("")
            Path.append("")


        for i in range(len(left)): # i is node
            if (left[i]==-1 and right[i]==-1):      
                if LeftParents[i]>=0:
                    if Path[LeftParents[i]]>" ":
                        Path[i]=Path[LeftParents[i]]+" AND " +ContsNode[LeftParents[i]]                                 
                    else:
                        Path[i]=ContsNode[LeftParents[i]]                                   
                if RightParents[i]>=0:
                    if Path[RightParents[i]]>" ":
                        Path[i]=Path[RightParents[i]]+" AND not " +ContsNode[RightParents[i]]                                   
                    else:
                        Path[i]=" not " +ContsNode[RightParents[i]]                     
                Results.append(" case when  " +Path[i]+"  then '" +"{:4d}".format(i)+ " "+"{:2.2f}".format(impurity[i])+" "+Path[i][0:180]+"'")

            else:       
                if LeftParents[i]>=0:
                    if Path[LeftParents[i]]>" ":
                        Path[i]=Path[LeftParents[i]]+" AND " +ContsNode[LeftParents[i]]                                 
                    else:
                        Path[i]=ContsNode[LeftParents[i]]                                   
                if RightParents[i]>=0:
                    if Path[RightParents[i]]>" ":
                        Path[i]=Path[RightParents[i]]+" AND not " +ContsNode[RightParents[i]]                                   
                    else:
                        Path[i]=" not "+ContsNode[RightParents[i]]                      
                if (left[i]!=-1):
                    LeftParents[left[i]]=i
                if (right[i]!=-1):
                    RightParents[right[i]]=i
                ContsNode[i]=   "( "+ features[i] + " <= " + str(threshold[i])   + " ) "

    recurse(left, right, threshold, features, 0,0,0,0)
print_decision_tree(clf,features)
SqlOut=""
for i in range(len(Results)): 
    SqlOut=SqlOut+Results[i]+ " end,"+chr(13)+chr(10)

I modified Zelazny7's code to produce SQL from the decision tree:

# SQL from decision tree

def get_lineage(tree, feature_names):
     left      = tree.tree_.children_left
     right     = tree.tree_.children_right
     threshold = tree.tree_.threshold
     features  = [feature_names[i] for i in tree.tree_.feature]
     le='<='               
     g ='>'
     # get ids of child nodes
     idx = np.argwhere(left == -1)[:,0]     

     def recurse(left, right, child, lineage=None):          
          if lineage is None:
               lineage = [child]
          if child in left:
               parent = np.where(left == child)[0].item()
               split = 'l'
          else:
               parent = np.where(right == child)[0].item()
               split = 'r'
          lineage.append((parent, split, threshold[parent], features[parent]))
          if parent == 0:
               lineage.reverse()
               return lineage
          else:
               return recurse(left, right, parent, lineage)
     print('case ')
     for j,child in enumerate(idx):
        clause=' when '
        for node in recurse(left, right, child):
            # the last lineage entry is the bare leaf id, not a tuple; skip it
            if not isinstance(node, tuple):
                continue
            i=node
            if i[1]=='l':  sign=le
            else: sign=g
            clause=clause+i[3]+sign+str(i[2])+' and '
        clause=clause[:-4]+' then '+str(j)
        print(clause)
     print('else 99 end as clusters')
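If the SQL is needed as a string rather than as printed lines, the same idea can be sketched as a function that returns the whole CASE expression. The sketch assumes lineages in the tuple layout produced by recurse above. The function name lineages_to_sql_case, its parameters and the sample input are illustrative assumptions.

```python
def lineages_to_sql_case(lineages, else_value=99, alias="clusters"):
    """Build one SQL CASE expression from a list of leaf lineages."""
    lines = ["case"]
    for label, lineage in enumerate(lineages):
        *steps, _leaf_id = lineage  # the last entry is the leaf id
        conditions = [
            f"{feature} {'<=' if split == 'l' else '>'} {threshold}"
            for _parent, split, threshold, feature in steps
        ]
        lines.append(f"  when {' and '.join(conditions)} then {label}")
    lines.append(f"  else {else_value} end as {alias}")
    return "\n".join(lines)


# Hand-written lineages for two leaves of a small tree.
example = [
    [(0, 'l', 0.5, 'col1'), 1],
    [(0, 'r', 0.5, 'col1'), (2, 'l', 4.5, 'col2'), 3],
]
print(lineages_to_sql_case(example))
```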

Writing the rules to a .txt file

from sklearn.tree import export_text

r = export_text(clf, feature_names=feature_names)

# the with-block closes the file, so the rules are flushed to disk
with open("Rules_set.txt", "w") as f:
    f.write(r)

[Figure: the resulting text file]

Reading the rules from the file

with open("Rules_set.txt", "r") as file1:
    data = file1.readlines()
    
dic = {}
first = None

for line in data:
    if( 'class' in line):
        #print(line.index('class'))
        rule = ' and '.join(list(dic.values()))
        rule = rule + ' ' + line[line.index('class'):]
        print(rule.strip())
        
    else:
        for char in line:
            if char.isalpha():
                index = line.index(char)
                if first == None:
                    first = index
                if first == index:
                    dic = {}
                dic[index] = f'({line[index:].strip()})'
                break

[Figure: the extracted rules]

Apparently, a long time ago somebody already tried to add the following function to the official scikit-learn tree export functions (which basically only support export_graphviz):

def export_dict(tree, feature_names=None, max_depth=None) :
    """Export a decision tree in dict format.

Here is his full commit:

https://github.com/scikit-learn/scikit-learn/blob/79bdc8f711d0af225ed6be9fdb708cea9f98a910/sklearn/tree/export.py

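I am not exactly sure what happened to this comment. But you could also try to use that function.

I think this warrants a serious documentation request to the good people of scikit-learn: please properly document the sklearn.tree.Tree API, which is the underlying tree structure that DecisionTreeClassifier exposes as its attribute tree_.

Just use the function from sklearn.tree like this: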

from sklearn.tree import export_graphviz

export_graphviz(tree,
                out_file="tree.dot",
                feature_names=tree.columns)  # or just ["petal length", "petal width"]

Then look in your project folder for the file tree.dot, copy ALL of its content, and paste it at http://www.webgraphviz.com/ to generate your graph.

Thanks to @paulkerfeld for the wonderful solution. On top of his solution, for everyone who wants a serialized version of a tree, just use tree.threshold, tree.children_left, tree.children_right, tree.feature and tree.value. Since the leaves have no splits, and hence no feature names and no children, their placeholders in tree.feature and tree.children_*** are _tree.TREE_UNDEFINED and _tree.TREE_LEAF. Every split is assigned a unique index by depth first search.
Notice that tree.value is of shape [n, 1, 1].
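A rough sketch of such a serialization: the function below turns the parallel arrays into a nested dictionary that can be dumped to JSON. It assumes the arrays have already been converted to plain Python lists (for example with .tolist()). The name tree_to_dict, the dictionary keys and the three-node regression tree are made up for illustration.

```python
import json

TREE_LEAF = -1  # same value as sklearn.tree._tree.TREE_LEAF


def tree_to_dict(children_left, children_right, feature, threshold, value, node=0):
    """Recursively convert the parallel tree arrays into a nested dict."""
    if children_left[node] == TREE_LEAF:
        return {"node": node, "value": value[node]}
    return {
        "node": node,
        "feature": feature[node],
        "threshold": threshold[node],
        "left": tree_to_dict(children_left, children_right, feature,
                             threshold, value, children_left[node]),
        "right": tree_to_dict(children_left, children_right, feature,
                              threshold, value, children_right[node]),
    }


# Hypothetical regression tree: one split on feature 0 at 0.5, two leaves.
children_left = [1, -1, -1]
children_right = [2, -1, -1]
feature = [0, -2, -2]
threshold = [0.5, -2.0, -2.0]
value = [[[3.0]], [[1.0]], [[5.0]]]  # shape [n, 1, 1]

serialized = tree_to_dict(children_left, children_right, feature, threshold, value)
print(json.dumps(serialized, indent=2))
```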

Here is a function that generates Python code from a decision tree by converting the output of export_text:

import string
from sklearn.tree import export_text

def export_py_code(tree, feature_names, max_depth=100, spacing=4):
    if spacing < 2:
        raise ValueError('spacing must be > 1')

    # Clean up feature names (for correctness)
    nums = string.digits
    alnums = string.ascii_letters + nums
    clean = lambda s: ''.join(c if c in alnums else '_' for c in s)
    features = [clean(x) for x in feature_names]
    features = ['_'+x if x[0] in nums else x for x in features if x]
    if len(set(features)) != len(feature_names):
        raise ValueError('invalid feature names')

    # First: export tree to text
    res = export_text(tree, feature_names=features, 
                        max_depth=max_depth,
                        decimals=6,
                        spacing=spacing-1)

    # Second: generate Python code from the text
    skip, dash = ' '*spacing, '-'*(spacing-1)
    code = 'def decision_tree({}):\n'.format(', '.join(features))
    for line in repr(tree).split('\n'):
        code += skip + "# " + line + '\n'
    for line in res.split('\n'):
        line = line.rstrip().replace('|',' ')
        if '<' in line or '>' in line:
            line, val = line.rsplit(maxsplit=1)
            line = line.replace(' ' + dash, 'if')
            line = '{} {:g}:'.format(line, float(val))
        else:
            line = line.replace(' {} class:'.format(dash), 'return')
        code += skip + line + '\n'

    return code

Sample usage:

res = export_py_code(tree, feature_names=names, spacing=4)
print (res)

Sample output:

def decision_tree(f1, f2, f3):
    # DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
    #                        max_features=None, max_leaf_nodes=None,
    #                        min_impurity_decrease=0.0, min_impurity_split=None,
    #                        min_samples_leaf=1, min_samples_split=2,
    #                        min_weight_fraction_leaf=0.0, presort=False,
    #                        random_state=42, splitter='best')
    if f1 <= 12.5:
        if f2 <= 17.5:
            if f1 <= 10.5:
                return 2
            if f1 > 10.5:
                return 3
        if f2 > 17.5:
            if f2 <= 22.5:
                return 1
            if f2 > 22.5:
                return 1
    if f1 > 12.5:
        if f1 <= 17.5:
            if f3 <= 23.5:
                return 2
            if f3 > 23.5:
                return 3
        if f1 > 17.5:
            if f1 <= 25:
                return 1
            if f1 > 25:
                return 2

는 의예제다사생성됩다니용으로 생성됩니다.names = ['f'+str(j+1) for j in range(NUM_FEATURES)].

한 가지 편리한 기능은 공간을 줄이면서 더 작은 파일 크기를 생성할 수 있다는 것입니다. 설하기정으로 설정하세요.spacing=2.

This answer gives you a readable and efficient representation: https://stackoverflow.com/a/65939892/3746632

The output looks like this. X is a 1d vector holding the features of a single instance.

from numba import jit,njit
@njit
def predict(X):
    ret = 0
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                pass
        else:  # if w_mexico > 0.5
            ret += 1
    else:  # if w_pizza > 0.5
        pass
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                pass
        else:  # if w_mexico > 0.5
            pass
    else:  # if w_pizza > 0.5
        ret += 1
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                ret += 1
        else:  # if w_mexico > 0.5
            ret += 1
    else:  # if w_pizza > 0.5
        pass
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                ret += 1
        else:  # if w_mexico > 0.5
            pass
    else:  # if w_pizza > 0.5
        ret += 1
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                pass
        else:  # if w_mexico > 0.5
            pass
    else:  # if w_pizza > 0.5
        pass
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                pass
        else:  # if w_mexico > 0.5
            ret += 1
    else:  # if w_pizza > 0.5
        ret += 1
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                pass
        else:  # if w_mexico > 0.5
            pass
    else:  # if w_pizza > 0.5
        ret += 1
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                pass
        else:  # if w_mexico > 0.5
            pass
    else:  # if w_pizza > 0.5
        pass
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                pass
        else:  # if w_mexico > 0.5
            pass
    else:  # if w_pizza > 0.5
        pass
    if X[0] <= 0.5: # if w_pizza <= 0.5
        if X[1] <= 0.5: # if w_mexico <= 0.5
            if X[2] <= 0.5: # if w_reusable <= 0.5
                ret += 1
            else:  # if w_reusable > 0.5
                pass
        else:  # if w_mexico > 0.5
            pass
    else:  # if w_pizza > 0.5
        pass
    return ret/10
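Output of this shape (one if/else block per tree, with the votes averaged at the end) could be produced by a generator along the following lines. This is a sketch under assumptions, not the code from the linked answer: forest_to_code is a hypothetical helper, it assumes a binary RandomForestClassifier where class index 1 is the positive class, and it leaves out the numba decorator.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import _tree


def forest_to_code(forest, feature_names):
    """Hypothetical helper: emit Python source that counts per-tree votes."""
    lines = ["def predict(X):", "    ret = 0"]

    def recurse(tree_, node, depth):
        indent = "    " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            # Internal node: emit the split and recurse into both children.
            i = int(tree_.feature[node])
            thr = float(tree_.threshold[node])
            name = feature_names[i]
            lines.append(f"{indent}if X[{i}] <= {thr!r}:  # if {name} <= {thr!r}")
            recurse(tree_, tree_.children_left[node], depth + 1)
            lines.append(f"{indent}else:  # if {name} > {thr!r}")
            recurse(tree_, tree_.children_right[node], depth + 1)
        else:
            # Leaf: this tree votes 1 only if class index 1 is the majority.
            majority = int(np.argmax(tree_.value[node][0]))
            lines.append(f"{indent}ret += 1" if majority == 1 else f"{indent}pass")

    for est in forest.estimators_:
        recurse(est.tree_, 0, 1)
    lines.append(f"    return ret / {len(forest.estimators_)}")
    return "\n".join(lines)


# Illustrative binary data; the feature names mirror the output above.
rng = np.random.RandomState(0)
X_train = rng.randint(0, 2, size=(200, 3))
y_train = ((X_train[:, 0] == 0) & (X_train[:, 1] == 1)).astype(int)
forest = RandomForestClassifier(n_estimators=10, max_depth=3,
                                random_state=0).fit(X_train, y_train)

source = forest_to_code(forest, ["w_pizza", "w_mexico", "w_reusable"])
namespace = {}
exec(source, namespace)  # turn the generated text into a callable
predict = namespace["predict"]
print(source)
```

Note that the returned value is the fraction of trees voting for class 1, which can differ from RandomForestClassifier.predict_proba, since the latter averages per-tree leaf probabilities rather than hard votes.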

I found the method used here to be quite good: https://mljar.com/blog/extract-rules-decision-tree/. It generates a human-readable rule set directly, which also makes it possible to filter the rules.
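The idea of one readable rule per leaf, which can then be filtered, might look roughly like this. It is a minimal sketch and not the code from the linked article: get_rules is a hypothetical helper, and the iris data and the sample-count filter are chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, _tree


def get_rules(clf, feature_names, class_names):
    """Hypothetical helper: return (n_samples, rule_text) for every leaf."""
    tree_ = clf.tree_
    rules = []

    def recurse(node, conditions):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            # Internal node: extend the path with the condition for each branch.
            name = feature_names[tree_.feature[node]]
            thr = round(float(tree_.threshold[node]), 3)
            recurse(tree_.children_left[node], conditions + [f"({name} <= {thr})"])
            recurse(tree_.children_right[node], conditions + [f"({name} > {thr})"])
        else:
            # Leaf: the accumulated path conditions form one complete rule.
            label = class_names[int(tree_.value[node][0].argmax())]
            n = int(tree_.n_node_samples[node])
            path = " and ".join(conditions) or "True"
            rules.append((n, f"if {path} then class: {label} | samples: {n}"))

    recurse(0, [])
    rules.sort(key=lambda r: r[0], reverse=True)  # most-populated leaves first
    return rules


iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
rules = get_rules(clf, iris.feature_names, iris.target_names)

# Filtering example: keep only rules backed by at least 40 training samples.
frequent = [text for n, text in rules if n >= 40]
for text in frequent:
    print(text)
```

Because each rule carries its own sample count, filtering is just a list comprehension over the result.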

Reference URL: https://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree
