
CN120198232B - Intelligent generation method for due diligence reports on non-performing financial assets - Google Patents

Intelligent generation method for due diligence reports on non-performing financial assets

Info

Publication number
CN120198232B
CN120198232B (application CN202510682535.XA)
Authority
CN
China
Prior art keywords
risk
model
data
report
financial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510682535.XA
Other languages
Chinese (zh)
Other versions
CN120198232A (en)
Inventor
汝晴
李凯绅
沈威
张菊
高杰
张冠
胡康帅
温荣华
杨娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Baichang Technology Group Co ltd
Original Assignee
Shanghai Baichang Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Baichang Technology Group Co ltd filed Critical Shanghai Baichang Technology Group Co ltd
Priority to CN202510682535.XA priority Critical patent/CN120198232B/en
Publication of CN120198232A publication Critical patent/CN120198232A/en
Application granted granted Critical
Publication of CN120198232B publication Critical patent/CN120198232B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06Asset management; Financial planning or analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24564Applying rules; Deductive queries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/177Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • G06N5/025Extracting rules from data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/041Abduction
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/18Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Accounting & Taxation (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)

Abstract


The present invention provides a method for intelligently generating due diligence reports on non-performing financial assets, in the field of management systems. The method deeply fuses multi-source heterogeneous data such as legal documents and financial statements, forming a comprehensive context-aware model of the target asset through multimodal feature extraction, semantic alignment, and knowledge-graph construction. Evolutionary computation methods such as genetic algorithms automatically mine combinations of latent, non-explicit risk factors deeply coupled with the asset's characteristics, and an adaptive risk-assessment network dynamically quantifies and predicts the comprehensive risk level. An explainable artificial-intelligence model performs attribution analysis on the assessment results, clearly revealing key influence paths and core data evidence. Finally, following a report logic framework and narrative templates that the user can adjust dynamically, the method automatically outputs a due diligence report on the non-performing financial assets that includes in-depth analysis, risk warnings, diversified disposal suggestions, and compliance review points.

Description

Intelligent generation method for due diligence reports on non-performing financial assets
Technical Field
The invention relates to the field of management systems, and in particular to a method for intelligently generating due diligence reports on non-performing financial assets.
Background
Traditional due diligence reports mostly describe and analyze the static state of an asset's historical condition and its condition at the current point in time. They struggle to dynamically track real-time changes in key factors such as the debtor's operating condition, collateral value, and the legal and market environment, and therefore lack the ability to effectively predict and give early warning of risk-evolution trends.
Disclosure of Invention
Problems to be solved
Aiming at the defects of the prior art, the invention provides a method for intelligently generating due diligence reports on non-performing financial assets, which solves the problems in the prior art.
Technical proposal
To achieve this purpose, the method for intelligently generating due diligence reports on non-performing financial assets is realized through the following scheme, comprising the steps of:
Sp1, multi-modal data depth fusion and context aware modeling:
collecting structured, semi-structured, and unstructured data related to a target non-performing financial asset to obtain heterogeneous data;
mapping the heterogeneous data into a unified semantic space, identifying and establishing the entities and attributes related to the non-performing financial asset and the complex association relations among them, and forming a context-aware model of the asset;
Sp2, risk factor mining and adaptive evaluation based on evolution calculation:
Sp2.1, initializing a risk-factor candidate set: based on the context-aware model, performing iterative optimization in a preset risk-dimension space using a genetic-algorithm evolutionary computation method, and mining combinations of latent, non-explicit risk factors coupled with the characteristics of the non-performing financial asset;
Sp2.2, constructing a risk assessment network, which dynamically adjusts the weight and interaction relations of each risk factor according to newly input data and historical assessment feedback, quantitatively assesses the comprehensive risk level of the non-performing financial asset in combination with an ensemble-learning strategy, and outputs a risk profile and a confidence interval;
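The iterative optimization of Sp2.1 can be illustrated with a compact genetic algorithm. This is a minimal sketch, not the patented implementation: the factor names, the synthetic default dataset, the mean-score fitness, and the parsimony penalty are all assumptions made for the example.

```python
import random

random.seed(0)

# Hypothetical candidate risk factors (illustrative names only).
FACTORS = ["debt_ratio", "overdue_days", "litigation_count",
           "collateral_gap", "industry_downturn", "guarantor_risk"]

def make_dataset(n=200):
    """Synthetic back-test records: ({factor: value}, defaulted?)."""
    data = []
    for _ in range(n):
        x = {f: random.random() for f in FACTORS}
        p = 0.7 * x["collateral_gap"] + 0.3 * x["guarantor_risk"]
        data.append((x, p > 0.5))
    return data

DATA = make_dataset()

def fitness(mask):
    """Accuracy of a simple mean-score rule over the chosen factor
    subset, minus a small parsimony penalty per factor."""
    chosen = [f for f, m in zip(FACTORS, mask) if m]
    if not chosen:
        return 0.0
    correct = sum((sum(x[f] for f in chosen) / len(chosen) > 0.5) == y
                  for x, y in DATA)
    return correct / len(DATA) - 0.02 * len(chosen)

def evolve(pop_size=30, generations=40, mut_rate=0.1):
    """One-point crossover + bit-flip mutation over factor-subset masks."""
    pop = [[random.randint(0, 1) for _ in FACTORS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, len(FACTORS))
            child = [1 - g if random.random() < mut_rate else g
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return [f for f, m in zip(FACTORS, best) if m]
```

A production system would replace the toy fitness with criteria such as those listed in the preferred embodiments (minimum description length, back-test F1, expert-template similarity).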
Sp3, insight generation and report output based on an interpretability model:
Sp3.1, performing attribution analysis on the output of the risk assessment network with an explainable intelligent model, identifying the key influence paths and core data evidence that lead to a specific risk-assessment conclusion;
Sp3.2, generating the key information in the context-aware model in natural language according to a preset report logic framework and narrative templates that the user can dynamically adjust, and outputting the due diligence report on the non-performing financial asset.
Preferably, the multi-modal data depth fusion and context-aware modeling further comprises analyzing long-distance dependency relations and complex clause structures in legal texts with an attention-enhanced pre-trained language model, and encoding the extracted key legal-text elements and their confidence scores as nodes and weighted edges in the knowledge graph.
Preferably, in the risk factor mining based on evolution calculation, the method further includes:
quantifying the explanatory power of a decision rule generated from the risk factors by computing its minimum description length;
measuring the prediction accuracy for historical default events by evaluating the F1 score of the risk-factor combination on a back-testing dataset;
and evaluating the degree of association by computing the vector-space cosine similarity between the risk-factor combination and preset templates in a typical non-performing-asset risk-pattern library vetted by domain experts.
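Sketches of the three evaluation criteria above. The MDL term is proxied here by a rule's condition count, and all function signatures are assumptions of this illustration:

```python
import math

def rule_description_length(rule_conditions):
    """Crude minimum-description-length proxy: shorter rules explain better."""
    return len(rule_conditions)

def f1_score(y_true, y_pred):
    """F1 of predicted default events against historical defaults."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cosine_similarity(u, v):
    """Similarity between a mined factor vector and an expert template."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```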
Preferably, the risk assessment network adopts a reinforcement-learning mechanism, which further comprises constructing a deep deterministic policy gradient (DDPG) agent, wherein the agent's state space represents the risk factors and assessment results of the current asset, the action space corresponds to adjustment strategies for specific risk-factor weights or activation functions in the risk assessment network, and the user's confirmation, correction, or rejection of a risk assessment is quantized into a scalar reward signal that guides the agent's policy learning.
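The quantization of confirm/correct/reject feedback into a scalar reward could look like the following sketch; the specific reward values and the `magnitude` scaling are illustrative assumptions, not values from the patent.

```python
# Hypothetical mapping from user review actions to scalar rewards.
REWARDS = {"confirm": 1.0, "correct": -0.2, "reject": -1.0}

def feedback_to_reward(action: str, magnitude: float = 1.0) -> float:
    """Quantize a user's confirm/correct/reject action into a scalar reward.

    `magnitude` can scale a correction by how far the user moved the score,
    so large corrections penalize the agent more than small ones.
    """
    if action not in REWARDS:
        raise ValueError(f"unknown feedback action: {action}")
    return REWARDS[action] * magnitude
```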
Preferably, the insight generation of the explainability model further comprises generating a visual map of risk-conduction paths, which further comprises adopting an attention-weighted graph-neural-network reasoning algorithm on the knowledge graph of the context-aware model to identify and quantify the influence intensities and conduction probabilities among different risk entities and risk factors, thereby forming the conduction paths.
Preferably, in the report output, the natural-language generation further comprises using a conditional text-generation model that takes a preset portrait of the target audience as conditional input and, combined with the structured semantic representation extracted from the explainability insights, dynamically selects narrative templates, adjusts the professional level of terminology, and controls the level of detail of the discussion, so as to generate customized report text meeting the needs of the specific audience.
Preferably, the method further comprises a continuous-learning and model-iteration module, which further comprises:
a data- and concept-drift detection unit, for monitoring in real time the statistical characteristics of the input data stream and changes in user-feedback patterns, and automatically triggering a model-update flow when significant drift is detected;
and a differential knowledge-graph updating unit, for incrementally merging newly added or changed entities, relations, and their confidences into the existing knowledge graph, and applying knowledge learned from historical data to the iterative upgrading of new models via transfer learning.
Preferably, a system for intelligently generating due diligence reports on non-performing financial assets comprises a processor and a memory coupled to the processor, the memory storing computer program instructions executed by the processor, the system further comprising:
Sp1, a multimodal data depth fusion and context-aware modeling engine:
collecting data related to the target non-performing financial asset from heterogeneous data sources;
constructing a context-aware model of the non-performing financial asset;
Sp2, a risk-factor mining and adaptive evaluation engine based on evolutionary computation:
performing iterative optimization in a preset risk-dimension space;
constructing and operating a risk assessment network;
Sp3, an insight generation and report output engine based on an explainability model:
performing attribution analysis on the risk-assessment results with an explainable intelligent model;
and outputting the due diligence report on the non-performing financial asset.
Preferably, the pre-trained-language-model processing unit of the multimodal data depth fusion and context-aware modeling engine further comprises:
a complex long-sentence segmentation and dependency-parsing module, for accurately identifying main-subordinate structures and limiting conditions in contract clauses;
and a semantic-role labeling module enhanced with domain knowledge, for labeling the specific participant roles and core legal acts in financial transactions.
Preferably, the evolutionary-computation-based risk-factor mining and adaptive evaluation engine further comprises a feedback-driven model-parameter adjustment module, which further comprises:
a real-time user-feedback capture and structuring interface, for receiving and parsing the user's annotations of risk factors and evaluation results;
an incremental model-training and version-control unit, embedded with the evolutionary algorithm and supporting online fine-tuning of the evolutionary algorithm's population-initialization strategy and the local connection weights of the evaluation network;
and an offline batch-retraining scheduler, for triggering global optimization of the entire model hierarchy after enough new data or feedback has accumulated.
Advantageous effects
The invention provides a method for intelligently generating due diligence reports on non-performing financial assets, with the following beneficial effects:
Through deep fusion of multimodal data, construction of a panoramic context knowledge graph, mining of hidden risk factors with evolutionary computation, and the combination of an adaptive evaluation network with explainable-AI insights, the invention can rapidly extract complex correlations and latent risks from heterogeneous data, raising the efficiency of information collection and preliminary analysis to a new level. It not only accurately quantifies known risks but also actively discovers and evaluates non-explicit, combined risks, greatly improving the sensitivity and accuracy of risk identification.
The invention has continuous learning and self-evolution capability, can adapt to dynamically-changed market and risk environments, realizes digital precipitation and intelligent iteration of organization knowledge experience, and ensures the effectiveness and value of long-term application.
Drawings
FIG. 1 is a diagram of the system of the present invention;
FIG. 2 is a flow chart of the system of the present invention.
Detailed Description
Aspects of the embodiments of the present invention will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by one of ordinary skill in the art without undue burden from the present disclosure are intended to fall within the scope of the present disclosure.
Specific examples:
As shown in FIGS. 1 to 2, a method for intelligently generating due diligence reports on non-performing financial assets comprises obtaining data related to the non-performing financial asset and generating a report structure, and further comprises the following steps:
Sp1, multimodal data depth fusion and context-aware modeling establishes the foundation for comprehensively understanding the target non-performing financial asset, covering the whole process from data acquisition, preprocessing, and multimodal feature extraction and alignment to the final context-aware model.
Data acquisition and preprocessing:
Sources and types of data include, but are not limited to: (1) legal documents, including loan contracts, vouchers, mortgage agreements, litigation/arbitration documents (complaints, judgments, mediation agreements), bankruptcy reorganization plans, etc., typically in PDF, Word, or scanned-image format; (2) financial reports, including balance sheets, income statements, cash flow statements, statements of changes in owners' equity and their notes, typically in Excel, PDF, or image format; (3) market transaction data, including secondary-market transaction prices, trading volumes, related macroeconomic indicators, industry indexes, interest rates, exchange rates, etc., typically structured time-series data; (4) public opinion information from news portals, financial media, social platforms, industry forums, etc., covering negative news, default records, enforcement information, and business anomalies concerning debtors, guarantors, and related parties, typically unstructured text or semi-structured data; and (5) collateral status descriptions, including site-inspection photos and videos, geographic location, and appraisal reports.
Preprocessing scheme for legal documents and financial reports (text/image data):
OCR and format analysis: documents in scanned or image format undergo character recognition with a high-precision OCR engine (e.g., Tesseract-OCR, optionally refined with a deep-learning model such as CRNN). Document structure, including titles, paragraphs, tables, and stamps, is identified with layout-analysis models such as Faster R-CNN or LayoutLM.
Table extraction: structured table data is extracted from tables in financial reports or contracts using OpenCV image processing combined with heuristic rules, or with dedicated table-recognition models (e.g., TableNet or TabTransformer).
Text cleaning: removing headers, footers, watermarks, irrelevant symbols, and mojibake; converting between traditional and simplified Chinese; unifying full-width and half-width characters; and identifying and handling fill-in-the-blank items and handwritten additions in contracts.
Primary element extraction: core elements such as contract numbers, party names, amounts, dates, and guarantee names are initially extracted with regular expressions and keyword lexicons, serving as auxiliary information or a verification basis for subsequent model processing.
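A minimal sketch of the regex-based primary element extraction. The patterns and field names are illustrative assumptions tailored to Chinese loan contracts, not the patent's actual rule set:

```python
import re

# Illustrative patterns for core contract elements (assumed formats).
PATTERNS = {
    "contract_no": re.compile(r"合同编号[:：]\s*([A-Z0-9-]+)"),
    "amount":      re.compile(r"(?:金额|借款)人民币\s*([\d,.]+)\s*万?元"),
    "date":        re.compile(r"(\d{4}年\d{1,2}月\d{1,2}日)"),
}

def extract_elements(text: str) -> dict:
    """Return the first match for each core element, or None if absent."""
    return {name: (m.group(1) if (m := pat.search(text)) else None)
            for name, pat in PATTERNS.items()}
```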
Financial statement (structured data):
Account standardization: financial accounts disclosed under different accounting standards and by different enterprises are mapped to a unified, standardized account hierarchy (e.g., based on the XBRL taxonomy or a custom standard chart of accounts).
Consistency checking: verifying accounting identities in the reports (e.g., assets = liabilities + owners' equity), and identifying and handling outliers (e.g., with the box-plot or Z-score method) and missing values (e.g., with mean/median imputation, regression interpolation, or multiple imputation).
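The consistency checks above can be sketched as follows, assuming simple list inputs; the Z-score threshold and median-imputation policy are illustrative choices, not the patent's:

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Indices whose Z-score (vs. population stdev) exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

def median_fill(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def check_accounting_identity(assets, liabilities, equity, tol=1e-6):
    """Verify assets = liabilities + owners' equity within tolerance."""
    return abs(assets - (liabilities + equity)) <= tol
```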
Financial-indicator calculation: key financial indicators are computed automatically, such as solvency ratios (current ratio, quick ratio, debt-to-asset ratio), profitability ratios (net profit margin on sales, return on net assets), and operating-efficiency ratios (accounts-receivable turnover, inventory turnover).
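The ratio calculations can be sketched as below; the keys of the input statement dict are assumptions made for this example:

```python
def financial_ratios(bs: dict) -> dict:
    """Key solvency / profitability / efficiency ratios from a simplified
    statement dict (hypothetical field names)."""
    return {
        "current_ratio": bs["current_assets"] / bs["current_liabilities"],
        "quick_ratio": (bs["current_assets"] - bs["inventory"])
                        / bs["current_liabilities"],
        "debt_to_asset": bs["total_liabilities"] / bs["total_assets"],
        "net_profit_margin": bs["net_profit"] / bs["revenue"],
        "receivables_turnover": bs["revenue"] / bs["accounts_receivable"],
        "inventory_turnover": bs["cost_of_goods_sold"] / bs["inventory"],
    }
```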
Market transaction data: API access and data parsing, i.e., acquiring data through API interfaces (e.g., Bloomberg, Reuters, Wind) or web crawlers (which must respect the robots.txt protocol), and parsing data in JSON, XML, CSV, or other formats.
Data cleaning: handling missing points (linear or spline interpolation) and abnormal fluctuations (moving-average filtering, exponential smoothing) in the time-series data.
Time alignment and frequency conversion: data from different sources and at different frequencies are aligned to a unified time axis (e.g., daily or weekly).
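Sketches of the interpolation and smoothing steps above, assuming the series' first and last points are observed (only interior gaps are filled); the shapes are illustrative:

```python
def linear_interpolate(series):
    """Fill interior None gaps in a time series by linear interpolation."""
    out = list(series)
    for i, v in enumerate(out):
        if v is None:
            lo = max(j for j in range(i) if out[j] is not None)
            hi = min(j for j in range(i + 1, len(out)) if out[j] is not None)
            frac = (i - lo) / (hi - lo)
            out[i] = out[lo] + frac * (out[hi] - out[lo])
    return out

def moving_average(series, window=3):
    """Smooth abnormal fluctuations with a trailing moving average."""
    return [sum(series[max(0, i - window + 1): i + 1])
            / len(series[max(0, i - window + 1): i + 1])
            for i in range(len(series))]
```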
Public opinion information: directed crawling and content extraction, i.e., building directed crawlers (e.g., with the Scrapy framework) keyed on the names of debtors, related parties, etc., and extracting the news text, publication time, source, and so on.
Data deduplication: removing duplicate or highly similar public-opinion items based on text similarity (e.g., SimHash or MinHash LSH).
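A compact SimHash-based near-duplicate detector illustrating the idea; the 64-bit signature, character-bigram shingles, and Hamming-distance threshold are assumptions of this sketch:

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """64-bit SimHash over character 2-gram shingles."""
    weights = [0] * bits
    shingles = [text[i:i + 2] for i in range(len(text) - 1)] or [text]
    for sh in shingles:
        h = int.from_bytes(hashlib.md5(sh.encode()).digest()[:8], "big")
        for b in range(bits):
            weights[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if weights[b] > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

def near_duplicates(texts, max_dist=3):
    """Index pairs whose SimHash Hamming distance is <= max_dist."""
    sigs = [simhash(t) for t in texts]
    return [(i, j) for i in range(len(texts))
            for j in range(i + 1, len(texts))
            if hamming(sigs[i], sigs[j]) <= max_dist]
```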
Initial sentiment-polarity judgment: preliminary positive/negative sentiment scoring of the public-opinion texts using a lexicon (e.g., the HowNet sentiment lexicon) or a simple classification model (e.g., naive Bayes).
Image data processing: unified format conversion, size normalization, and image enhancement (e.g., histogram equalization and denoising) of photos and videos.
Text-description structuring: key attributes (e.g., the area, location, and use of real estate; the model, purchase year, and depreciation of equipment) are extracted from the text descriptions in appraisal reports.
Multimodal feature extraction and alignment: data from different sources and modalities are mapped to a unified, shared semantic space that can capture cross-modal semantic relations.
Text feature extraction:
Model construction: a Transformer model pre-trained and fine-tuned for the financial or legal domain is selected, such as FinBERT (pre-trained on financial news and research reports), LawBERT/LegalBERT (pre-trained on legal documents and cases), or a general-purpose model such as BERT, RoBERTa, or ERNIE, followed by domain-adaptive pretraining and downstream-task fine-tuning on a corpus related to non-performing assets (contracts, litigation documents, financial-statement notes, market analysis reports, etc.). Downstream tasks may include named entity recognition, relation extraction, text classification (e.g., risk-type judgment), and semantic-similarity computation. During fine-tuning, parameters are optimized with a cross-entropy loss.
Model application: a preprocessed text segment (e.g., a contract clause, financial-statement note, or news abstract) is input, and the model outputs fixed-dimension semantic vector representations (token-level, sentence/clause-level, and document-level embeddings, e.g., 768- or 1024-dimensional). For long texts, an overall representation can be obtained by segmenting and then pooling (mean/max pooling) or with hierarchical Transformers (e.g., HiBERT).
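The long-text strategy above (split into overlapping windows, encode each, pool the segment vectors) can be sketched model-free as follows; the window size and stride are illustrative, and the encoder itself is omitted:

```python
def split_windows(tokens, size=512, stride=256):
    """Split a long token sequence into overlapping encoder windows."""
    return [tokens[i:i + size]
            for i in range(0, max(1, len(tokens) - size + stride), stride)]

def mean_pool(segment_embeddings):
    """Mean-pool per-segment vectors into one document vector."""
    dim, n = len(segment_embeddings[0]), len(segment_embeddings)
    return [sum(vec[d] for vec in segment_embeddings) / n for d in range(dim)]

def max_pool(segment_embeddings):
    """Element-wise max over per-segment vectors."""
    dim = len(segment_embeddings[0])
    return [max(vec[d] for vec in segment_embeddings) for d in range(dim)]
```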
Structured-data feature extraction:
Processing scheme: numerical financial indicators, market data, etc. can be used directly as feature vectors after cleaning and normalization/standardization (e.g., Min-Max scaling or Z-score standardization). Categorical features (e.g., industry or region classification) can be one-hot encoded or mapped to low-dimensional dense embedding vectors (an embedding layer learned end-to-end).
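Sketches of the normalization and encoding operations named above:

```python
import statistics

def min_max_scale(values):
    """Rescale to [0, 1]; constant series map to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_standardize(values):
    """Center to zero mean, unit (population) standard deviation."""
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def one_hot(category, vocabulary):
    """One-hot encode a categorical feature such as an industry code."""
    return [1.0 if category == c else 0.0 for c in vocabulary]
```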
Extracting image features:
Model construction: a deep convolutional neural network (CNN) pre-trained on large image datasets (e.g., ImageNet) is chosen, such as ResNet-50/101, EfficientNet (B0-B7 series), or a Vision Transformer (ViT). Depending on the characteristics of the collateral images (real-estate exteriors, equipment details, ticket scans), the model can be fine-tuned on domain-specific image data to better capture visual features relevant to asset appraisal.
Model application: given a preprocessed image, the CNN/ViT model outputs a high-dimensional feature vector (e.g., ResNet outputs 2048 dimensions; EfficientNet dimensionality varies with the series variant).
Constructing a multi-mode alignment model:
Joint-embedding-based model: a multi-input neural-network architecture is designed with the aim of mapping semantically related samples from different modalities to adjacent locations in a shared semantic space; for example, a clause description in a legal document should have a vector representation similar to that of the corresponding collateral photograph. During training, a contrastive loss or triplet loss is adopted. The contrastive loss pulls positive pairs (e.g., the vector of "the contract clause describing property A" and the vector of "the photo of property A") closer and pushes negative pairs farther apart. The triplet loss considers an anchor, a positive sample, and a negative sample simultaneously.
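Minimal scalar versions of the two losses (Euclidean-distance form with a unit margin; a real system would compute these over batches of embedding tensors):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, is_positive, margin=1.0):
    """Pull positive pairs together; push negatives beyond the margin."""
    d = euclidean(u, v)
    return d ** 2 if is_positive else max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Anchor must be closer to positive than to negative by the margin."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)
```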
Co-training or cross-modal-generation-based methods: a model is trained to generate a pseudo-representation of image features from a textual description and is required to make it similar to the real image features, and vice versa.
Training-data construction: a large number of cross-modal aligned samples are built, e.g., legal-document fragments paired with associated financial-data fragments, collateral description texts paired with collateral pictures, and market news paired with the corresponding company's financial-indicator changes.
Model application: raw data from different modalities are finally converted, through their respective feature extractors and the alignment model, into vector representations in the same high-dimensional semantic space. These vectors can be used directly for downstream tasks including similarity computation, clustering, and risk assessment.
Knowledge-graph construction: entities and attributes related to the non-performing financial asset, and the complex association relations among them, are automatically identified from the multimodal data to form a structured knowledge network.
Entity-recognition model construction: a sequence-labeling model is adopted, with the mainstream architecture being a pre-trained language model, i.e., BERT + linear layer + conditional random field (CRF). The CRF layer learns the constraint relations among labels, improving the accuracy of entity-boundary identification.
Data processing and application: entity types are defined for the non-performing-asset domain, including DEBTOR, CREDITOR, GUARANTOR, COLLATERAL, CONTRACT, COURT, AMOUNT, DATE, RISK_EVENT, etc.
Training data: a large annotated corpus is required (e.g., in BIO label format). Annotation cost can be reduced through active learning and semi-supervised learning.
Model application: text (e.g., contracts, judgments) is input, and the identified entities with their types and positions are output. For example, from "Zhang San borrowed 1,000,000 yuan from Li Si, with Wang Wu providing joint-liability guarantee", the model identifies Zhang San (DEBTOR), Li Si (CREDITOR), 1,000,000 yuan (AMOUNT), and Wang Wu (GUARANTOR).
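Decoding the model's BIO tag sequence into typed entity spans can be sketched as follows. Tokens are joined without spaces, which suits character-level Chinese tokenization (English word tokens would need space joining); the example uses the DEBTOR/CREDITOR types defined above:

```python
def bio_decode(tokens, tags):
    """Convert a BIO tag sequence into (entity_text, entity_type) spans.

    B-TYPE begins an entity, I-TYPE continues it, O is outside.
    """
    entities, current, ctype = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(("".join(current), ctype))
            current, ctype = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(token)
        else:                      # O tag, or an I- that breaks the scheme
            if current:
                entities.append(("".join(current), ctype))
            current, ctype = [], None
    if current:
        entities.append(("".join(current), ctype))
    return entities
```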
Relation extraction model construction: a pipeline method performs NER (named entity recognition) first and then classifies the relations between the identified entities. The relation classification model may be a BiLSTM with an attention mechanism, a path-based graph convolutional network (Path-based GCN, using the dependency-syntax path between entities), or a classifier based on a pre-trained language model (including adding a classification head on the [CLS] representation or the entity-pair representation).
The joint NER-and-RE method designs one model that completes entity recognition and relation extraction simultaneously, making better use of the dependency information between the two tasks, e.g. a Transformer model based on multi-task learning.
Relationship types: predefine the relation types between entities, including "borrow (borrows_from)", "guarantee (guarantees_for)", "mortgage (mortgages_to)", "involved in litigation (involved_in_lawsuit)", and the like.
Training data, namely, a triplet of entity pairs and relations thereof need to be marked.
The model application inputs text containing the identified entities and entity pairs, and outputs the relation type between them and its confidence, e.g. (Zhang San, borrows_from, Li Si), (Wang Wu, guarantees_for, Zhang San).
Event-Extraction (EE):
Model construction: identifying specific events that occur in text and their participants (arguments). The model typically includes two stages: event trigger word recognition and argument role labeling. Methods based on sequence labeling or machine reading comprehension may be employed.
Data processing and application defining event types (including "signing contract", "promoting litigation", "declaring bankruptcy") and argument roles for each event type (including "signing party", "contract object", "signing date" for signing contract event).
Entity alignment and disambiguation schemes resolve two problems across data sources: the same entity appearing under different expressions (including "Company A" and "Company A, Inc."), and the same name referring to different entities. Methods include entity linking based on string similarity, entity attribute similarity, network structure similarity (including link prediction in a knowledge graph), or a pre-trained model.
Knowledge storage and reasoning: Neo4j is selected as the graph database to store the constructed knowledge graph (entities as nodes, relationships as edges, attributes as features of the nodes/edges).
Reasoning application: complex queries are performed using the graph query language Cypher. Graph-based rule reasoning uses SWRL, or graph-embedding-based link prediction, path discovery, etc.
Context awareness model construction and application: the constructed knowledge graph serves as the core framework, and the multi-modal feature vectors extracted at each step in Sp1 (including text semantic vectors, image feature vectors and structured data features) are attached as attributes of, or associated with, the corresponding entity nodes or relationship edges. A debtor node, beyond its basic information, associates the vector of its financial report analysis results, the latest public opinion sentiment score, semantic vectors of key terms in related legal documents, and so on. The model provides a comprehensive, multi-dimensional, structured view of bad assets and their related parties, related events and potential risks.
Information retrieval and aggregation-all relevant information for a particular asset, regardless of its original modality and source, is quickly retrieved. Revealing hidden association relations, including common guarantee circles, indirect control relations and complex creditor liability chains.
Risk conduction analysis, namely simulating the conduction path and potential influence range of a specific risk event (including a certain core enterprise violation) in a knowledge-graph network.
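The risk conduction analysis above can be sketched as a breadth-first propagation over a weighted knowledge graph; the graph, edge weights and attenuation rule below are purely illustrative assumptions:

```python
from collections import deque

# Hypothetical sketch: propagate a risk event from a source entity along
# weighted edges, attenuating the impact per hop, and keep every entity
# whose received impact exceeds a threshold -- the "conduction path and
# potential influence range" of the event.
def risk_conduction(graph, source, init_impact=1.0, threshold=0.1):
    impact = {source: init_impact}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr, w in graph.get(node, []):
            prop = impact[node] * w  # attenuate along the edge weight
            if prop >= threshold and prop > impact.get(nbr, 0.0):
                impact[nbr] = prop
                queue.append(nbr)
    return impact

# guarantee / control edges of a core enterprise, with conduction weights
graph = {
    "CoreCo": [("SupplierA", 0.8), ("BankB", 0.5)],
    "SupplierA": [("SubSupplier", 0.6)],
    "BankB": [],
}
print(risk_conduction(graph, "CoreCo"))
```

In practice the weights would come from the weighted edges built in Sp1 rather than being hand-set.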
Feature engineering-providing high quality, contextual information rich input features (including graph embedded representations, path features, neighborhood aggregation features, etc.) to a downstream risk assessment model.
For unstructured text data such as legal documents, the optimization options of key elements are extracted:
Model selection and construction: a pre-trained language model with an enhanced attention mechanism is applied, namely a Reformer-style Transformer for processing long-sequence text; its sparse attention mechanism captures long-distance dependencies in legal text (including a contract term that refers to a definition tens of pages earlier, or the comprehensive assessment of multiple pieces of evidence in a judgment).
And analyzing the structure of complex clauses, namely, the models can better understand the structures of complex parallel sentences, complex sentences, conditional clauses, limiting clauses and the like by learning a large amount of legal texts in the pre-training and fine tuning processes, so that the core right obligation relation, preconditions, exclusionary liabilities and the like are accurately identified.
Implicit guarantee responsibility and joint liability identification combines semantic understanding with knowledge-graph reasoning. The model identifies a typical sentence pattern such as "if Party B fails to pay the primary debt by the due date, Party C agrees to bear repayment liability", and infers Party C's joint liability by combining the relationship between Party C and Party B in the knowledge graph (including parent company and actual controller). For more concealed terms (including "Party A has the right to require an additional guarantee under specific market conditions"), the model needs stronger context understanding and logical inference capabilities, a dedicated subtask model (including a reading-comprehension-based model answering "does Party C bear guarantee responsibility"), or rule-based post-processing.
Key elements and knowledge graph coding of confidence scores:
Confidence score calculation: the model typically outputs a probability value or softmax score when identifying an entity, relationship or extracted element. The score may be calibrated (including Platt scaling or isotonic regression) before being used as a confidence level.
Weighted edge construction: the extracted key legal-text elements (including "guarantee amount", "guarantee scope" and "litigation claims" in a contract) are stored as node attributes or independent nodes in the knowledge graph. The weight of the edge associating such an element with a core entity (including debtor, contract) can be determined comprehensively from:
confidence score, the accuracy of extraction.
The importance of the element in the document, calculated via TF-IDF, TextRank, or model attention weights. A prior impact weight on risk assessment is preset per element type (including "unlimited joint liability" weighted higher than "general guarantee"). When the subsequent XAI model judges that a factor contributes strongly to the risk assessment result, that factor's weight can be increased in a feedback loop. In subsequent graph algorithms (including community discovery, centrality calculation and risk conduction analysis) and in the risk assessment model, weighted edges reflect more accurately how much each information fragment contributes to the overall risk judgment.
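A minimal sketch of combining the three signals above into one edge weight; the blend weights, prior table and input values are assumptions for illustration only:

```python
# Illustrative weighting: blend extraction confidence, in-document
# importance (e.g. a TF-IDF score) and a preset per-type prior into a
# single edge weight for the knowledge graph. All numbers hypothetical.
PRIOR = {"unlimited_joint_liability": 1.0, "general_guarantee": 0.6}

def edge_weight(confidence, importance, element_type,
                alpha=0.4, beta=0.3, gamma=0.3):
    """Convex blend (alpha+beta+gamma = 1) keeps the weight in [0, 1]."""
    return alpha * confidence + beta * importance + gamma * PRIOR[element_type]

w = edge_weight(confidence=0.92, importance=0.75,
                element_type="unlimited_joint_liability")
print(round(w, 3))  # 0.4*0.92 + 0.3*0.75 + 0.3*1.0 = 0.893
```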
Sp2, risk factor mining and self-adaptive evaluation based on evolution calculation automatically discovers potential risk factor combinations which are related to bad asset risks and are difficult to visually perceive from a high-dimensional complex feature space, and builds a risk evaluation model capable of dynamically adapting to new conditions.
Sp2.1, initializing a risk factor candidate set:
Evolutionary computation method (genetic algorithm) in detail: within a preset risk dimension space (comprising credit risk, market risk, operational risk, legal risk and other dimensions, each containing multiple candidate atomic risk factors), iterative optimization automatically mines potential, non-dominated risk factor combinations coupled with the characteristics of bad financial assets.
The risk factor sources are the output of the context awareness model, node attributes (including debtor financial ratio, public opinion score) in the knowledge graph, the presence or absence or weight of edges (including whether specific guarantee relationships and litigation relationships exist), and certain dimensions of the graph embedding vector. Multimodal features, namely key dimension of text semantic features and clustering result of image features. Expert definition feature library, namely known important risk points predefined by financial field experts.
GA construction details chromosomal coding (individual representation) each individual (chromosome) represents a candidate risk factor combination.
A group of individuals is randomly generated as an initial population. Some known effective risk factors or combinations may also be incorporated with domain knowledge as part of the initial population (heuristic initialization) to accelerate convergence. The population size includes 50-200 individuals.
The selection operator selects the excellent individual to enter the next generation according to the fitness value of the individual.
The crossover operator simulates gene recombination in biological evolution, and the selected parent individuals are operated with certain crossover probability (0.6-0.9) to generate new offspring individuals.
The mutation operator randomly changes certain gene sites of the sub-generation individuals according to a certain mutation probability (0.01-0.1) so as to maintain population diversity and avoid falling into local optimum.
Iteration stopping conditions: reaching the preset maximum number of generations; the fitness value showing no significant improvement over several consecutive generations; or finding a solution satisfying specific conditions. The output of the GA is a set of (or one optimal) risk factor combinations that will be used to build the subsequent risk assessment network, or directly form interpretable risk rules.
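The GA loop described above (binary chromosome encoding, tournament selection, one-point crossover, bit-flip mutation, generation cap) can be sketched end to end. The bitstring length and the toy fitness below are assumptions standing in for the real factor set and the MDL/F1/similarity fitness:

```python
import random

# Toy stand-in: each chromosome selects a subset of 8 candidate atomic
# risk factors; fitness rewards matching a hidden "good" subset.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]

def fitness(ind):
    return sum(1 for a, b in zip(ind, TARGET) if a == b)

def evolve(pop_size=50, n_gen=100, p_cross=0.8, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(TARGET)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(n_gen):
        def pick():  # binary tournament selection
            a, b = rng.sample(pop, 2)
            return list(max(a, b, key=fitness))
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randrange(1, n)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for i in range(n):      # bit-flip mutation
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                nxt.append(child)
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```

The crossover probability (0.8) and mutation probability (0.05) fall inside the ranges stated above.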
The fitness function of the evolutionary computation method balances prediction accuracy, interpretability, conciseness, robustness and domain relevance of a risk factor combination. The interpretability of a GA-selected combination is quantified by translating it into a set of IF-THEN decision rules and computing the minimum description length (MDL) of those rules. For example, the risk factor combination {financial_index_A < 0.5, legal_risk_B = True} generates the rule "IF financial_index_A < 0.5 AND legal_risk_B = True THEN high risk". MDL sums the coding length of the model itself, L(H), and the coding length of the data given the model, L(D|H). For decision rules, L(H) is quantified by the number of rules and the number of conditions in each rule; L(D|H) is the code length required for the samples misclassified under the rules. The goal is to minimize L(H) + L(D|H), where H denotes the hypothesis (the rule set).
The smaller the MDL value, the more concise the rules formed by the risk factor combination, the better they fit the data, and the better the interpretability. The fitness contribution may be set to the inverse of the MDL or some constant minus the MDL. The prediction accuracy of a risk factor combination is measured by evaluating its F1 score for historical defaults on a back-test dataset:
A data set is prepared containing historical bad asset cases, each containing the values of various risk factors of the GA candidates (values at some point in time prior to the event) and the final real results (including whether violations, degree of loss, recovery level, etc.). The data set needs to be divided into a training set, a validation set and a test set. For each combination of risk factors generated by the GA, a simple classification model (including logistic regression, support vector machine, decision tree) is trained on the training set or the scoring card model is built directly using these factors, using these factors as features. Historical breach events are then predicted on the validation/test set.
The F1 score is the harmonic mean of the Precision and Recall, and can be well balanced, and is particularly suitable for the problem of default prediction of class imbalance.
F1 score calculation formula:
F1 = 2 × Precision × Recall / (Precision + Recall)
the F1 score is used to evaluate the accuracy of the classification model constructed based on the specific risk factor combination to predict adverse events on the historical back-measured dataset. It integrates the precision and recall of the model.
F1 is F1 fraction value, the value range is between 0 and 1, and the higher the value is, the better the prediction performance of the model is.
Precision: among the samples the model predicts as positive, the proportion that are actually positive. The calculation formula is:
Precision = TP / (TP + FP)
Recall: the proportion of actually positive samples that the model successfully predicts as positive. The calculation formula is:
Recall = TP / (TP + FN)
TP (true positives): samples that are actually positive and predicted positive by the model. FP (false positives, type I errors): samples that are actually negative but predicted positive. FN (false negatives, type II errors): samples that are actually positive but predicted negative.
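The three formulas above transcribe directly into code; the confusion-matrix counts used below are illustrative:

```python
# Precision, recall and F1 exactly as defined above.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# e.g. 40 defaults correctly flagged, 10 false alarms, 20 missed defaults
print(round(f1(tp=40, fp=10, fn=20), 4))  # p=0.8, r=2/3 -> F1 = 8/11 ≈ 0.7273
```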
The higher the F1 score, the stronger the predictive power of the risk factor combination. Domain relevance is evaluated by computing the vector-space cosine similarity between the risk factor combination and the patterns in a preset library of typical bad-asset risk patterns curated by domain experts:
Data processing: typical risk patterns are summarized empirically by financial domain experts (including senior credit approvers and risk managers) or generalized from historical cases (including "over-expansion risk", "associated guarantee chain risk", "obsolescence risk", etc.). Each pattern is described by a set of key risk factors and their typical manifestations. Each expert-defined risk pattern and each GA-generated risk factor combination is represented as a vector: if the set of atomic risk factors is fixed, each combination/pattern can be represented as a high-dimensional sparse vector, where a present factor's dimension is 1 (or its weight) and an absent factor's dimension is 0.
A risk factor combination highly similar to one or more expert-approved typical risk patterns is considered to have good domain relevance and interpretability. The fitness function may use the similarity to the most similar pattern, or a weighted average of similarities to multiple patterns. The final fitness function is a weighted combination of the above indicators, with weights adjusted according to business requirements. The comprehensive fitness function of the genetic algorithm is:
Fitness = w1 · (1 / MDL) + w2 · F1 + w3 · Sim_max
It is used to evaluate the goodness of each risk factor combination in the genetic algorithm, balancing interpretability, prediction accuracy and consistency with domain expert knowledge.
Fitness: comprehensive fitness value of an individual (risk factor combination); the higher the value, the better the combination. w1, w2 and w3 are the weights of interpretability, prediction accuracy and domain relevance respectively; they can be adjusted according to business requirements to focus on different optimization objectives (if prediction accuracy is emphasized, w2 is set higher), typically take values in [0, 1], and sum to 1. MDL: the minimum description length, quantifying the conciseness and interpretability of the decision rules generated by the current risk factor combination; the smaller the MDL, the more concise and interpretable the rules, so the fitness function uses 1/MDL — the smaller the MDL value, the greater this term's contribution. F1: the F1 score, the prediction-accuracy index of the model built on the current risk factor combination, particularly suitable for class-imbalanced datasets (as in default prediction); it is the harmonic mean of precision and recall, and the higher the F1 score, the better the prediction accuracy. Sim_max: the maximum vector-space cosine similarity between the current risk factor combination and the patterns in the expert-curated library of typical bad-asset risk patterns; the higher the value, the more relevant the combination is to expert-approved risk patterns and the better its domain relevance.
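A minimal sketch of evaluating this fitness for one candidate combination; the weights, MDL value, F1 value and toy pattern vectors are all illustrative assumptions:

```python
import math

# Weighted fitness: 1/MDL rewards concise rules, F1 rewards accuracy,
# and the max cosine similarity against an expert pattern library
# (toy binary vectors here) rewards domain relevance.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def combo_fitness(mdl, f1, combo_vec, pattern_lib, w1=0.3, w2=0.5, w3=0.2):
    sim_max = max(cosine(combo_vec, p) for p in pattern_lib)
    return w1 * (1.0 / mdl) + w2 * f1 + w3 * sim_max

patterns = [[1, 0, 1, 0], [0, 1, 1, 1]]  # expert-curated risk patterns
score = combo_fitness(mdl=4.0, f1=0.72, combo_vec=[1, 0, 1, 0],
                      pattern_lib=patterns)
print(round(score, 3))  # 0.3*0.25 + 0.5*0.72 + 0.2*1.0 = 0.635
```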
Sp2.2, constructing a risk assessment network, namely constructing a network model which can quantitatively assess the comprehensive risk level of the bad financial assets and can be dynamically adjusted according to the risk factor combination (or other more comprehensive feature sets) mined by the GA.
Model architecture model selection and construction:
Multilayer perceptron:
input layer GA excavated risk factor (vector after coding/embedding of type factor) and other important supplementary features (including macro economic index).
Hidden layers: one to several fully connected layers; the number of neurons per layer is determined by the problem complexity and data volume (including 32, 64, 128, etc.). Activation functions include Leaky ReLU and tanh. To prevent overfitting, Dropout layers or L1/L2 regularization may be added.
Output layer: for regression tasks (including predicting loss rate and recovery rate), a single neuron with a linear activation function; the loss function is mean squared error or mean absolute error.
Classification tasks (including predicting risk level: low/medium/high, or if violating) the number of neurons equals the number of classes, activation function is Softmax, loss function is cross entropy loss.
Optimizers: Adam, RMSprop, or SGD with momentum. A graph neural network is suitable for risk assessment that uses knowledge graph information.
Context aware models (knowledge maps) of whole or sub-graph forms. The node features comprise entity attribute vectors and multi-mode feature vectors extracted from Sp1, and the edge features can comprise relationship types, weights and the like.
The GNN layer adopts GCN, GAT or GraphSAGE. GCN updates node representations by aggregating neighbor node information. GAT introduces an attention mechanism, assigning different learned weights to different neighbor nodes. GraphSAGE designs various aggregation functions and supports inductive learning on unseen nodes.
Nodes of GNNs gather information from their neighbors and update their own representations through multiple iterations. And the pooling layer is used for pooling the node representation at the level of the graph to obtain a representation vector of the whole graph or the target asset subgraph. The output layer is connected with the full-connection layer to carry out final risk scoring or classification. The GNN can automatically learn complex interactions and dependency relations among entities and capture the propagation modes of risks in a network, so that systematic risks and associated risks can be evaluated more accurately.
Dynamically adjusting the weight and interaction relation of each risk factor:
Online learning when there is new data (including asset performance updates, market changes) or user feedback, the model may be incrementally updated instead of being fully retrained. For neural networks, small batches of gradient descent can be used to continually update model parameters. For bayesian networks, the CPT may be updated using bayesian update methods. Either an adaptive learning rate algorithm (including Adam's variants) or a learning rate decay strategy is employed.
Feedback-based adjustment:
Direct feedback-the user (including risk analysts) can directly adjust the weights or assessment results of certain risk factors. These adjustments may be quantified and used to modify model parameters (including awarding predictions consistent with user feedback, penalties inconsistent by modifying the loss function).
Indirect feedback-user usage behavior of the report (including which parts are emphasized and which suggestions are taken) may also be used as an indirect feedback signal.
Outputting a risk image and a confidence interval:
The risk portrait comprises the steps of displaying the comprehensive risk score and the sub-item scores on different risk dimensions (including credit, market, law and operation) in a visual mode (including radar chart, instrument panel and thermodynamic diagram) to form visual depiction of the risk condition of the bad asset.
And calculating confidence intervals, namely quantifying the uncertainty of the evaluation result.
Monte Carlo Dropout keeps the Dropout layers active at the prediction stage of the neural network; multiple forward passes yield a set of prediction results, and the mean and variance (or quantiles) of their distribution give the confidence interval.
The narrower the confidence interval, the more reliable the evaluation result. A wide confidence interval suggests that there is greater uncertainty, requiring more careful decision making or further investigation.
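Monte Carlo Dropout can be illustrated with a toy one-layer model; the weights, features and dropout rate below are synthetic stand-ins for the real risk assessment network:

```python
import numpy as np

# Toy MC Dropout: dropout stays ON at inference, T stochastic forward
# passes give a distribution of scores, and its quantiles give the CI.
rng = np.random.default_rng(42)
W = rng.normal(size=16)   # frozen weights of a 16-unit layer (synthetic)
x = rng.normal(size=16)   # feature vector of one asset (synthetic)

def forward_with_dropout(p_drop=0.2):
    mask = rng.random(16) >= p_drop          # random dropout mask per pass
    return float((W * mask) @ x) / (1 - p_drop)  # inverted-dropout scaling

samples = np.array([forward_with_dropout() for _ in range(1000)])
mean = samples.mean()
lo, hi = np.percentile(samples, [2.5, 97.5])  # 95% interval from quantiles
print(f"risk score {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A narrow [lo, hi] spread indicates a reliable score; a wide one signals the uncertainty discussed above.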
The risk assessment network adopts a reinforcement learning mechanism, so it can learn an optimal adjustment strategy through interaction with the environment (including user feedback), continuously optimizing its risk sensitivity and generalization capability. Deep Deterministic Policy Gradient (DDPG) agent construction and application:
reinforcement learning and target network parameter soft update formula:
θ′ ← τ·θ + (1 − τ)·θ′
It is used to update the parameters of the target Actor network and the target Critic network in the reinforcement learning algorithm. Soft updates are adopted to stabilize the learning process, avoiding the training instability caused by target network parameters changing too fast.
θ′: parameters of a target network (target Actor network or target Critic network). θ: parameters of the corresponding online network (online Actor network or online Critic network). τ: soft update coefficient, a very small positive value (typical range 0.001-0.01) that controls the rate at which online network parameters are "transferred" to the target network. The smaller τ is, the slower the target network updates and the more stable the learning process.
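The soft-update rule above fits in a few lines; the parameter vectors are toy values for illustration:

```python
# theta_target <- tau * theta_online + (1 - tau) * theta_target,
# applied element-wise to toy parameter lists.
def soft_update(target, online, tau=0.005):
    return [tau * o + (1 - tau) * t for t, o in zip(target, online)]

target = [0.0, 0.0]
online = [1.0, 2.0]
target = soft_update(target, online)
print(target)  # [0.005, 0.01]
```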
State space characterization is that the risk factor vector (from a combination of GA mining, or a more comprehensive feature set) of the current asset, the current assessment result (including risk score, dimension score) of the risk assessment network, and even some parameters or confidence indexes of the model itself. Careful design is required to include enough information to guide the decision.
Data processing: numerical encoding and normalization.
Action space: the actions are continuous and correspond to adjustment strategies for particular risk factor weights or activation functions in the risk assessment network. Each dimension of the action vector may correspond to the adjustment amount (percentage increase/decrease) of a key risk factor's weight in the assessment model, an adjustment of a hidden-layer activation function's slope or threshold, or an adjustment of the weights of different sub-models in a model ensemble.
Constraint that the action space needs to be set with a reasonable boundary, so that unstable models caused by overlarge adjustment are avoided.
Reward signal design, quantifying user confirmation, correction or rejection of risk assessment as a scalar reward signal:
evaluation confirmation the user approves the evaluation result and gives positive rewards (including +1).
Slight correction-the user makes small adjustments to the result, giving a small positive or zero prize (including +0.1, 0).
Significant correction/rejection-the user either adjusts the result largely or totally negates, giving negative rewards (including-1, -0.5).
The correction direction consistency includes that the direction of correction by the user is consistent with the direction that the RL agent tries to adjust, and even if correction is performed, certain forward excitation can be given.
Delayed rewards: sometimes the actual effect of user feedback only appears after a period of time (including actual recovery after bad asset disposal versus the prediction), so the attribution of delayed rewards needs to be considered.
Sparse rewards questions-user feedback is not present for every evaluation, and sparse rewards questions need to be handled, including using rewards modeling (REWARD SHAPING) or hierarchical reinforcement learning.
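The reward mapping above can be sketched directly; the scalar values follow the examples in the text, and the direction-match bonus is an illustrative assumption:

```python
# Map user feedback on a risk assessment to a scalar RL reward,
# following the categories and example values given above.
def feedback_reward(kind, direction_match=False):
    base = {
        "confirm": 1.0,            # user approves the assessment
        "slight_correction": 0.1,  # small adjustment
        "major_correction": -0.5,  # large adjustment
        "reject": -1.0,            # total rejection
    }[kind]
    # a correction in the same direction the agent adjusted earns a small
    # bonus (hypothetical value), per the "correction direction consistency"
    if kind != "confirm" and direction_match:
        base += 0.2
    return base

print(feedback_reward("confirm"))
print(round(feedback_reward("slight_correction", direction_match=True), 2))
```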
Reinforcement learning Critic network loss function formula:
L(θ^Q) = (1/N) · Σ_i (y_i − Q(s_i, a_i | θ^Q))²
This defines the loss function of the Critic network in the reinforcement learning algorithm. The purpose of the Critic network is to learn a state-action value function Q(s, a), i.e. to evaluate how good it is to execute action a in state s. The Critic network is trained so that its predicted value Q(s_i, a_i | θ^Q) is as close as possible to the target value y_i. The loss takes the form of a mean squared error.
L(θ^Q): loss value of the Critic network; the goal of training is to minimize it. N: the number of experience samples in one mini-batch; Σ_i sums over all samples in the mini-batch. y_i: the target Q value of the i-th sample, also called the TD target, computed as shown in the following formula. Q(s_i, a_i | θ^Q): the Q value predicted by the Critic network for the state s_i and action a_i of the i-th sample. θ^Q: parameters of the Critic network.
Reinforcement learning Critic network target Q value calculation formula:
y_i = r_i + γ · Q′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′})
It computes the target Q value that the Critic network learns toward in the DDPG algorithm; based on the Bellman equation, it fuses the immediate reward with an estimate of the future state's value.
y_i: target Q value of the i-th sample. r_i: the immediate reward obtained in the i-th sample after executing action a_i in state s_i. γ: discount factor, valued in [0, 1], measuring the importance of future rewards relative to the current reward; the closer γ is to 1, the more weight the agent gives to long-term return. Q′(s_{i+1}, μ′(s_{i+1}) | θ^{Q′}): the Q value given by the target Critic network for the next state s_{i+1} and the action μ′(s_{i+1}) selected in that state by the target Actor network; it represents an estimate of the maximum expected return obtainable from the next state. s_{i+1}: the next state in the i-th sample. μ′(s_{i+1} | θ^{μ′}): the deterministic action output by the target Actor network in state s_{i+1}. θ^{μ′}: parameters of the target Actor network. θ^{Q′}: parameters of the target Critic network.
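The TD-target computation above, for a mini-batch of transitions, is a one-liner; the rewards and target-Critic Q′ values below are assumed given (they would come from the target networks):

```python
# y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for each sample in a batch.
# q_next stands in for the target Critic's output on the next states.
def td_targets(rewards, q_next, gamma=0.99):
    return [r + gamma * q for r, q in zip(rewards, q_next)]

# e.g. two transitions: y = [1.0 + 0.99*2.0, -0.5 + 0.99*4.0]
print(td_targets(rewards=[1.0, -0.5], q_next=[2.0, 4.0]))
```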
Reinforcement learning Actor network strategy gradient formula:
∇_{θ^μ} J ≈ (1/N) · Σ_i ∇_a Q(s, a | θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_{θ^μ} μ(s | θ^μ)|_{s=s_i}
This is the sampled policy gradient used for Actor network parameter updates in the reinforcement learning algorithm. The purpose of the Actor network is to learn an optimal policy μ(s) such that the action selected in any state s maximizes the expected cumulative return. This gradient directs the Actor network parameters θ^μ to update toward actions that produce higher Q values.
∇_{θ^μ} J: the gradient of the Actor network's objective function J (by default the expected cumulative return) with respect to its parameters θ^μ; it gives the direction and magnitude of the parameter update. N: the number of experience samples in one mini-batch; Σ_i sums over all samples in the mini-batch. ∇_a Q(s, a | θ^Q): the gradient of the Critic network's output with respect to the action a, evaluated at the current state s_i and the action μ(s_i) selected there by the Actor network; it indicates how the Q value would change if the Actor's action were perturbed slightly, i.e. the direction of action improvement. ∇_{θ^μ} μ(s | θ^μ): the gradient of the Actor network's output action with respect to its parameters θ^μ, evaluated at the current state s_i; it indicates how the Actor network's parameters affect the action it outputs. μ(s_i): the action output by the Actor network in state s_i under the current parameters θ^μ. Multiplying the two gradient terms by the chain rule yields the gradient of the objective function J with respect to the Actor network parameters θ^μ, so the Actor network can be updated by gradient ascent to optimize the policy.
In order for the agent to explore different adjustment strategies, noise (including Ornstein-Uhlenbeck process noise or gaussian noise) may be added to the action of the Actor network output.
And the application logic is that the RL agent continuously observes the state of the risk assessment network, takes adjustment action, receives user feedback as rewards, and continuously optimizes the adjustment strategy, so that the risk assessment network can adaptively improve the sensitivity and generalization capability of the risk assessment network to risks, and is more close to the judgment standard and actual service requirements of human experts.
Sp3, insight generation and report output based upon an interpretive model.
This step aims to translate complex model assessment results into human-understandable, trusted insights and to generate high-quality due diligence reports according to canonical logic and personalized requirements.
Sp3.1, adopting an explainable AI (XAI) model to perform attribution analysis on the output of the risk assessment network, namely identifying the key influence paths and core data evidence that lead to a specific risk assessment conclusion (including "high risk" and "recommend rejection"), enhancing model transparency and user trust.
XAI model selection and application: a local surrogate method generates perturbation samples in the local neighborhood of the sample to be interpreted and fits an interpretable model (such as a decision tree) to the risk assessment network's local behavior.
The disturbance sample is generated by randomly disturbing the characteristic value of the table data, randomly masking or replacing words of the text data, and disturbing the node characteristic or edge of the graph data.
And training a local interpretable model, namely training a weighted linear model by using the disturbance sample and a complex model prediction result corresponding to the disturbance sample, wherein the weight is based on the distance between the disturbance sample and the original sample.
The interpretation output, the coefficients of the linear model can be regarded as the contribution of each feature to the local prediction.
Application logic: LIME can interpret a single prediction and tell the user "why this particular asset is rated as high risk".
And (3) sorting the feature importance, namely sorting risk factors of all input risk assessment networks according to the LIME coefficients, and identifying Top-K factors with the greatest influence on the current assessment conclusion.
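The LIME-style procedure above (perturb, weight by proximity, fit a weighted linear surrogate, read coefficients) can be sketched with a synthetic black box; all data, the kernel and the linear "network" below are illustrative assumptions:

```python
import numpy as np

# LIME-style local surrogate: the hidden linear scorer stands in for the
# risk assessment network. We perturb one sample locally, weight the
# perturbations by proximity, fit a weighted linear model, and read its
# coefficients as local feature contributions (the Top-K ranking input).
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.0])
black_box = lambda X: X @ true_w                  # unknown to the explainer

x0 = np.array([1.0, 1.0, 1.0])                    # the sample to explain
X = x0 + rng.normal(scale=0.5, size=(200, 3))     # local perturbations
y = black_box(X)

d = np.linalg.norm(X - x0, axis=1)
w = np.exp(-d ** 2)                               # proximity kernel weights

# weighted least squares: solve (X^T W X) beta = X^T W y
beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
print(np.round(beta, 2))  # recovers approximately [2., -1., 0.]
```

Ranking the absolute values of `beta` gives the Top-K most influential factors described above.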
Impact path tracing (combined with the knowledge graph): if the risk assessment network is based on a GNN, the key risk factors identified by XAI can be located to nodes or edges in the knowledge graph; otherwise the network's input features are associated with the knowledge graph. A graph algorithm (including attention-weight-based path search and shortest/critical path algorithms) then traces, in the knowledge graph, the conduction paths by which the key factors interact through a series of association relations (including guarantee chains, fund flows and equity control) and finally lead to the risk event. The key risk factors and impact paths are linked back to their original data sources: if a "contract term risk" factor is identified as important, the system should be able to locate which term of which specific contract, and highlight it to the user; if a certain financial indicator anomaly is critical, it should link to the corresponding item in the financial statement and its context.
The user is provided with a complete explanation chain from "why risk" to "why this risk" to "where evidence is.
Sp3.2, embedding key information — risk portraits, risk factor combinations, key impact paths, and core data evidence from the context-aware model — into the corresponding report chapters according to a preset report logic framework and narrative templates that can be dynamically adjusted by the user, and using natural language generation (NLG) to output a due diligence report on non-performing financial assets containing deep analysis, risk early warning, disposal advice, and compliance review points:
Report logic framework design: a schema language such as XML Schema, JSON Schema, or a report-domain-specific language defines the hierarchy of the report (cover, table of contents, abstract, body chapters, appendices), the title of each chapter, the content modules each chapter should contain (asset overview, debtor analysis, collateral and guarantee analysis, risk assessment summary, disposal advice, etc.), and the data type and source of each module.
User dynamic adjustment: a graphical configuration interface or parameterized interface allows the user to select or customize the report's chapters, modules, level of detail, and presentation style according to different reporting purposes (internal approval, external transfer, litigation support), audience types (executive, business, legal), or asset characteristics.
Building a narrative template library:
Template types: diversified narrative templates are designed for different analysis scenarios (short-term default caused by a liquidity crisis, systemic risk caused by over-guarantee, sharp decline in collateral value), risk levels, asset types, and disposal strategies.
Template content: a template contains fixed text and dynamic placeholders, which the NLG module fills in according to the analysis results. One risk description template might read: "Debtor [debtor name] is subject to [key risk factor 1] and [key risk factor 2]; the current assessed risk level is [risk level], mainly evidenced by [specific data evidence description]".
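A minimal sketch of this placeholder-filling step follows; the slot names and the filled values are illustrative, not the patent's actual schema:

```python
import string

# Hypothetical risk-description template with dynamic placeholders.
TEMPLATE = (
    "Debtor {debtor} is subject to {factor1} and {factor2}; the current "
    "assessed risk level is {level}, mainly evidenced by {evidence}."
)

def fill_template(template: str, slots: dict) -> str:
    # A fuller NLG module would also pick the template by scenario and risk
    # level; here we only verify every placeholder is resolvable, then render.
    fields = {f for _, f, _, _ in string.Formatter().parse(template) if f}
    missing = sorted(fields - slots.keys())
    if missing:
        raise ValueError(f"unfilled placeholders: {missing}")
    return template.format(**slots)

text = fill_template(TEMPLATE, {
    "debtor": "Company A",
    "factor1": "a liquidity crisis",
    "factor2": "an over-extended guarantee chain",
    "level": "high",
    "evidence": "a current ratio of 0.4 and interest overdue by 90+ days",
})
```

Validating placeholders before rendering is what allows template management (creation, editing, versioning) to stay decoupled from the analysis pipeline.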
Template management: supporting on-demand creation, editing, version control, and invocation of templates.
Input encoding: the outputs of the upstream modules (the list of key risk factors, knowledge graph subgraphs, risk scores) are encoded into a format the model accepts — structured input (key-value pairs), a sequence of linearized triples, or a text sequence incorporating control codes — from which a target report paragraph or summary is generated.
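As one possible encoding, upstream outputs can be serialized into a single control-coded sequence for a seq2seq generator. The tags `<SCORE>`, `<RISK>`, and `<TRIPLE>` below are made-up control codes for illustration:

```python
# Linearize structured upstream outputs (risk factors, knowledge-graph
# triples, a risk score) into one control-coded text sequence.
def linearize(risk_factors, triples, risk_score):
    parts = [f"<SCORE> {risk_score:.2f}"]
    for factor in risk_factors:
        parts.append(f"<RISK> {factor}")
    for head, relation, tail in triples:
        parts.append(f"<TRIPLE> {head} | {relation} | {tail}")
    return " ".join(parts)

seq = linearize(
    risk_factors=["guarantee chain concentration", "collateral devaluation"],
    triples=[("Company A", "guarantees", "Company B"),
             ("Company B", "pledges", "Plant No. 3")],
    risk_score=0.87,
)
```

The generator is then fine-tuned to map such sequences to report paragraphs, so the control codes tell it which span is a score, a factor, or a graph fact.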
A large parallel corpus of "input data - output text" pairs is required. It may be extracted from existing, manually written reports or constructed in a semi-automated manner: for example, "risk description" sections are extracted from a large number of due diligence reports, with the corresponding structured risk factors and financial data serving as input. A pre-trained model is fine-tuned on this domain-specific parallel corpus to adapt to the language style, professional terminology, and logical structure of due diligence reports. The loss function is typically cross-entropy loss (word-by-word prediction). At generation time, a beam search decoding strategy balances the fluency, diversity, and accuracy of the generated text. This approach produces more natural, flexible, and contextual text, and is suited to analytical commentary, risk summaries, disposal advice, and other sections requiring complex logic and detailed expression.
The NLG module fills the various analysis results (deep analysis, risk early warning, disposal advice, compliance review points) into the selected report logic framework and narrative templates, ultimately generating a complete, well-formatted report draft (in Word, PDF, or HTML format).
Generating a visual map of the risk conduction path:
Application of the attention-weighted graph neural network inference algorithm:
A GNN such as GAT can be trained on historical data — including actual risk conduction events or expert-labeled conduction relationships — to predict conduction probabilities or impact intensities. On the constructed context-aware knowledge graph, inference with the pre-trained GNN identifies potential impact paths from a given risk source node to other nodes; the attention weights can be interpreted as proxies for impact intensity or conduction probability.
Combined with Dijkstra's algorithm, the high-weight (high-impact/high-probability) conduction paths starting from a particular risk event node are searched on the attention-weighted graph.
Input: one or more initial risk nodes (e.g., "core debtor A is in a liquidity crisis"); the GNN infers and outputs a subgraph containing the risk conduction paths connected by high-attention-weight edges, along with quantified impact intensities/conduction probabilities for the nodes and edges on those paths.
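The path-search step can be sketched by treating attention weights as conduction-probability proxies and running Dijkstra over `-log(weight)` costs, so that the minimum-cost path is the maximum-probability conduction path. The graph below is a made-up example, not data from the patent:

```python
import heapq
import math

# node -> [(neighbor, attention weight in (0, 1])] -- illustrative toy graph
edges = {
    "debtor_A_liquidity_crisis": [("guarantor_B", 0.9), ("supplier_C", 0.4)],
    "guarantor_B": [("bank_loan_D", 0.8)],
    "supplier_C": [("bank_loan_D", 0.7)],
    "bank_loan_D": [],
}

def strongest_path(edges, src, dst):
    """Max-probability path == min sum of -log(edge weights) (Dijkstra)."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, math.inf):
            continue                      # stale queue entry
        for v, w in edges[u]:
            nd = d - math.log(w)          # cost of traversing this edge
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [dst], dst
    while node != src:                    # walk predecessors back to the source
        node = prev[node]
        path.append(node)
    path.reverse()
    return path, math.exp(-dist[dst])     # recover the path probability

path, prob = strongest_path(edges, "debtor_A_liquidity_crisis", "bank_loan_D")
```

Here the route through guarantor_B (0.9 x 0.8 = 0.72) beats the route through supplier_C (0.4 x 0.7 = 0.28), so it is returned as the dominant conduction path.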
Visual map and interactive node drill-down:
Front-end implementation: the risk conduction path map is rendered on a Web interface using a graph visualization library. Visual elements such as node size, color, edge thickness, and arrow direction can represent risk level, impact intensity, and conduction direction.
When the user clicks any node in the map (debtor, collateral, risk event), the system should dynamically request and display that node's detailed information, related attributes, and associated raw data fragments (contract clause text, financial statement screenshots, news links) or textual evidence. This requires close coordination between the front-end and back-end APIs.
Preset portraits of the target audience: points of interest (e.g., whether legal or market risk is of greater concern), professional level (junior analyst, senior specialist, executive), and reporting purpose (internal decision-making, external disclosure, regulatory reporting). These discrete or continuous portrait dimensions are encoded into numeric vectors: each dimension is mapped to an embedding vector via an embedding layer, and the vectors are then concatenated or fused through a small network as additional conditional inputs to the Transformer model.
Dynamic tuning mechanism: the conditional inputs guide the model to select or generate different high-level narrative structures. For example, if the audience is executives, the model tends to generate more summarized, conclusion-oriented text structures.
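The portrait-conditioning step above can be sketched as per-dimension embedding lookups concatenated into one condition vector. The dimension names, vocabularies, and embedding size are illustrative assumptions, and a trained model would learn these tables rather than sample them randomly:

```python
import random

random.seed(0)
DIM = 4                        # embedding size per portrait dimension (toy value)
VOCAB = {
    "focus":   ["legal_risk", "market_risk"],
    "level":   ["junior_analyst", "senior_specialist", "executive"],
    "purpose": ["internal_decision", "external_disclosure", "regulatory"],
}
# One small embedding table per portrait dimension (random stand-ins for
# learned parameters).
tables = {
    dim: {v: [random.gauss(0, 1) for _ in range(DIM)] for v in values}
    for dim, values in VOCAB.items()
}

def encode_portrait(portrait: dict) -> list:
    """Concatenate per-dimension embeddings into one condition vector,
    which would be fed to the generator as an extra conditional input."""
    vec = []
    for dim in ("focus", "level", "purpose"):
        vec.extend(tables[dim][portrait[dim]])
    return vec

cond = encode_portrait({"focus": "legal_risk",
                        "level": "executive",
                        "purpose": "internal_decision"})
```

In a real system the concatenated vector would typically pass through a small fusion network before conditioning the Transformer decoder.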
Adjusting the professional level of terminology:
Vocabulary control: the vocabulary selection scope is dynamically adjusted at decoding time according to the professional-level portrait (restricting overly specialized terms, or preferring plain, understandable synonyms).
Style migration: the model can be trained to convert between text styles at different expertise levels, or style control codes can be added at generation time.
Model training: training data comprising (conditional portrait, structured semantic representation, target customized report text) triples must be built. This requires extensive writing and labeling, or the use of weakly supervised learning, multi-task learning, and similar techniques.
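Decode-time vocabulary control can be sketched as masking the logits of specialized terms before the softmax when the audience portrait is a lay reader. The vocabulary, term list, and logit values below are toy assumptions:

```python
import math

vocab = ["repayment", "subrogation", "collateral", "security_interest", "pledge"]
specialized = {"subrogation", "security_interest"}   # terms to avoid for lay readers
logits = [1.2, 2.5, 0.8, 2.0, 1.0]                   # raw decoder scores (toy values)

def controlled_softmax(vocab, logits, audience_level):
    """Zero out specialized terms for a lay audience, then renormalize."""
    masked = [
        -math.inf if audience_level == "lay" and tok in specialized else lg
        for tok, lg in zip(vocab, logits)
    ]
    mx = max(masked)
    exps = [math.exp(l - mx) if l != -math.inf else 0.0 for l in masked]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(vocab, exps)}

probs = controlled_softmax(vocab, logits, audience_level="lay")
```

Note that although "subrogation" had the highest raw logit, the masked distribution shifts all probability onto plain-language alternatives.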
The continuous learning and model iteration module ensures that the system adapts to continuously changing data distributions, risk patterns, and user requirements, maintaining and improving long-term performance.
The data and concept drift detection unit monitors the statistical characteristics of the input data stream, checking whether the distribution of each risk factor (numerical and categorical) — mean, variance, skewness, kurtosis, category frequency — changes significantly. For high-dimensional data (such as text/image embeddings), distribution changes can be monitored in a low-dimensional projection space.
Model prediction performance monitoring: whether key performance indicators (accuracy, recall, F1 score, AUC, stability of risk scores) decline over time.
User feedback pattern changes: the magnitude and frequency of user corrections to assessment results, newly proposed risk points, and so on.
Concept drift: the real relationship between features and the target variable changes. It is typically detected indirectly, by monitoring a continuous decline in model performance or by comparing models trained on old versus new data.
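One common way to implement the distribution check above is the Population Stability Index (PSI) over bucketed risk-factor values. The 0.25 alert threshold is a widely used rule of thumb, not a value specified by the patent:

```python
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index between baseline and current bucket shares."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected_fracs, actual_fracs)
    )

def bucket_fracs(values, edges):
    """Share of values falling into each bucket defined by sorted edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v > e for e in edges)] += 1
    n = len(values)
    return [c / n for c in counts]

edges = [0.25, 0.5, 0.75]                                        # 4 score buckets
baseline = bucket_fracs([i / 100 for i in range(100)], edges)    # ~uniform scores
drifted = bucket_fracs([min(0.99, i / 200 + 0.5) for i in range(100)], edges)
score = psi(baseline, drifted)
drift_detected = score > 0.25   # common "significant drift" rule of thumb
```

A PSI near zero means the factor's distribution is stable; here the simulated scores collapse into the upper buckets, so the index far exceeds the threshold and would trigger the model update flow.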
Triggering the model update flow: when significant drift is detected (a statistical test value below a threshold, or performance degradation exceeding a preset magnitude), the system automatically takes the following actions:
1. Issue an alert notifying operations staff and model maintainers.
2. Collect new data: gather data generated after the drift occurred for model retraining or adjustment.
3. Schedule retraining tasks: automatically or semi-automatically initiate a model retraining, fine-tuning, or structural adjustment process.
The differential knowledge graph update unit monitors data sources, continuously tracking changes in the original sources (legal document repositories, financial databases, public opinion APIs) and capturing newly added, modified, or deleted data.
Change detection and extraction: for changed data, the information extraction pipeline in Sp1 is re-run to obtain new entities, relations, attributes, or changes thereto. These changes are merged into the existing knowledge graph incrementally, with efficient addition/deletion/modification of nodes, edges, and their attributes. Conflict resolution policies are defined, including timestamp-based (most recent wins), source-trustworthiness-based, or manual arbitration. Atomicity, consistency, isolation, and durability (the ACID properties) of the update process are ensured, especially under concurrent updates.
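The incremental merge with timestamp-based conflict resolution can be sketched as a last-write-wins upsert over edge records. The storage layout (a dict keyed by triples) is an illustrative simplification of a real graph store:

```python
# (head, relation, tail) -> {"confidence": float, "ts": int}
graph = {}

def upsert_edge(graph, triple, confidence, ts):
    """Insert or update an edge; on conflict the newer timestamp wins."""
    cur = graph.get(triple)
    if cur is None or ts >= cur["ts"]:
        graph[triple] = {"confidence": confidence, "ts": ts}
        return True
    return False        # stale update rejected (last-write-wins policy)

def delete_edge(graph, triple, ts):
    """Delete an edge only if the deletion is at least as recent."""
    cur = graph.get(triple)
    if cur is not None and ts >= cur["ts"]:
        del graph[triple]
        return True
    return False

t = ("Company A", "guarantees", "Company B")
upsert_edge(graph, t, confidence=0.80, ts=100)   # initial extraction
upsert_edge(graph, t, confidence=0.95, ts=200)   # newer source wins
stale = upsert_edge(graph, t, confidence=0.10, ts=150)   # rejected as stale
```

A production store would wrap each batch of upserts in a transaction to preserve the ACID properties mentioned above; the sketch only shows the conflict-resolution rule.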
During iterative model upgrades, transfer learning effectively reuses knowledge learned from historical data, accelerating convergence of the new model, reducing dependence on newly labeled data, and improving performance on new tasks or new data distributions.
Parameter migration: in new model training, the parameters of an old model trained on historical data serve as initial weights (or as the weights of some layers). This suits cases where the old and new tasks are similar or new data is scarce; for example, when risk factors must be identified for a novel class of non-performing assets, the weights of a general FinBERT model can be used as the starting point for fine-tuning on the new data rather than starting from random initialization.
Feature representation migration: feature representations learned by old models (text embeddings, image embeddings, graph embeddings) are used directly as input features for new models, or as part of the new model's features; for example, vector representations of non-performing assets learned by an old risk assessment model can serve as node features in a new, more complex assessment model (such as a GNN).
Domain adaptation: when the data distributions of the source domain (historical data) and target domain (new data) differ but the task is the same, instance-based weight adjustment enables the model to adapt to the target domain. For example, if historical data comes mainly from real-estate non-performing assets while new data comes mainly from manufacturing, domain adaptation lets the risk assessment model generalize better to manufacturing.
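The parameter-migration idea can be sketched as a warm start: copy layers from the old model's state dict into the new model wherever the names and shapes match, leaving mismatched layers (such as a re-shaped task head) at their fresh initialization. The layer names and weight values are made up for illustration:

```python
# Old model weights learned on historical data (toy values).
old_state = {
    "encoder.w": [[0.5, -0.2], [0.1, 0.3]],
    "encoder.b": [0.01, -0.02],
    "head.w":    [[0.9, 0.9]],                 # old task head: 1 output
}
# New model, freshly initialized; its head has 2 outputs, so it cannot
# reuse the old head and must be trained from scratch.
new_state = {
    "encoder.w": [[0.0, 0.0], [0.0, 0.0]],
    "encoder.b": [0.0, 0.0],
    "head.w":    [[0.0, 0.0], [0.0, 0.0]],
}

def shape(x):
    return (len(x), len(x[0])) if x and isinstance(x[0], list) else (len(x),)

def warm_start(new_state, old_state):
    """Copy every old layer whose name and shape match the new model."""
    migrated = []
    for name, w in old_state.items():
        if name in new_state and shape(new_state[name]) == shape(w):
            new_state[name] = w
            migrated.append(name)
    return migrated

migrated = warm_start(new_state, old_state)
```

Fine-tuning then proceeds from these migrated weights, which is what lets the new model converge faster than random initialization when the tasks are related.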
The overall operation flow is shown in Figure 2: task receiving and initialization, data acquisition and preprocessing, context-aware modeling, risk factor mining, risk assessment, insight generation, automatic report generation, user interaction and report review, report typesetting and output, continuous learning and model iteration, and finally task completion and resource release.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a(n) ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method for intelligently generating a due diligence report on non-performing financial assets, characterized in that the method comprises:
Sp1. Deep fusion of multimodal data and context-aware modeling:
collecting structured, semi-structured, and unstructured data related to the target non-performing financial assets to obtain heterogeneous data;
mapping the heterogeneous data into a unified semantic space, identifying and establishing entities, attributes, and the complex relationships among them related to the non-performing financial assets, and forming a context-aware model of the non-performing financial assets;
Sp2. Risk factor mining and adaptive assessment based on evolutionary computation:
Sp2.1. initializing a candidate set of risk factors: based on the context-aware model, using a genetic-algorithm evolutionary computation method to iteratively search for optima within a preset risk dimension space, mining potential and non-explicit risk factor combinations coupled with the characteristics of the non-performing financial assets;
Sp2.2. constructing a risk assessment network: the risk assessment network dynamically adjusts the weights and interactions of the risk factors according to newly input data and historical assessment feedback and, combined with an ensemble learning strategy, quantitatively assesses the comprehensive risk level of the non-performing financial assets, outputting a risk portrait and confidence intervals;
Sp3. Insight generation and report output based on interpretable models:
Sp3.1. using an interpretable intelligent model to perform attribution analysis on the output of the risk assessment network, identifying the key impact paths and core data evidence leading to a specific risk assessment conclusion;
Sp3.2. according to a preset report logic framework and narrative templates that can be dynamically adjusted by the user, outputting a due diligence report on the non-performing financial assets from the key information in the context-aware model using natural language generation.
2. The method according to claim 1, characterized in that the deep fusion of multimodal data and context-aware modeling further comprises: using a pre-trained language model enhanced by an attention mechanism to parse long-distance dependencies and complex clause structures in legal texts, and encoding the extracted key elements of the legal texts together with their confidence scores as nodes and weighted edges in a knowledge graph.
3. The method according to claim 1, characterized in that the risk factor mining based on evolutionary computation further comprises:
quantifying the explanatory power of decision rules generated from the risk factors by calculating their minimum description length;
measuring the predictive accuracy of the risk factor combination by evaluating its F1 score on historical default events over a backtest dataset;
evaluating the relevance of the risk factor combination by calculating the vector-space cosine similarity between it and the patterns in a preset library of typical non-performing-asset risk patterns verified by domain experts.
4. The method according to claim 1, characterized in that the risk assessment network adopts a reinforcement learning mechanism, the reinforcement learning mechanism further comprising: constructing a deep deterministic policy gradient agent, wherein the agent's state space represents the risk factors and assessment results of the current asset, and the action space corresponds to strategies for adjusting the weights or activation functions of specific risk factors in the risk assessment network; and quantifying the user's confirmation, correction, or rejection of a risk assessment into a scalar reward signal to guide the agent's policy learning.
5. The method according to claim 1, characterized in that the insight generation of the interpretable model further comprises: generating a visual map of risk conduction paths, which further comprises: on the knowledge graph of the context-aware model, using an attention-weighted graph neural network inference algorithm to identify and quantify the impact intensity and conduction probability between different risk entities and risk factors, forming conduction paths.
6. The method according to claim 1, characterized in that, in the report output, the natural language generation further comprises: using a conditional text generation model that takes a preset portrait of the target audience as conditional input and, combined with the structured semantic representations extracted from the interpretable-model insights, dynamically selects narrative templates, adjusts the professional level of terminology, and controls the level of detail of argumentation, so as to generate customized report text meeting the needs of a specific audience.
7. The method according to claim 1, characterized in that the method further comprises a continuous learning and model iteration module, which further comprises:
a data and concept drift detection unit for monitoring, in real time, the statistical characteristics of the input data stream and changes in user feedback patterns, and automatically triggering a model update flow when significant drift is detected;
a differential knowledge graph update unit for incrementally merging newly added or changed entities, relations, and their confidence levels into the existing knowledge graph, and applying knowledge learned from historical data to the iterative upgrade of new models via transfer learning.
8. A system for intelligently generating a due diligence report on non-performing financial assets, comprising a processor and a memory coupled to the processor, the memory storing computer program instructions executed by the processor, characterized in that the system further comprises:
a multimodal data deep fusion and context-aware modeling engine for: collecting data related to the target non-performing financial assets from heterogeneous data sources; and constructing a context-aware model of the non-performing financial assets;
a risk factor mining and adaptive assessment engine based on evolutionary computation for: performing iterative optimization within a preset risk dimension space; and constructing and running a risk assessment network;
an insight generation and report output engine based on interpretable models for: performing attribution analysis on risk assessment results using an interpretable intelligent model; and outputting a due diligence report on the non-performing financial assets.
9. The system according to claim 8, characterized in that the pre-trained language model processing unit in the multimodal data deep fusion and context-aware modeling engine further comprises:
a complex long-sentence segmentation and dependency parsing module for accurately identifying principal-subordinate structures and qualifying conditions in contract clauses;
a domain-knowledge-enhanced semantic role labeling module for labeling the specific participant roles and core legal acts in financial transactions.
10. The system according to claim 8, characterized in that the risk factor mining and adaptive assessment engine based on evolutionary computation further comprises a feedback-driven model parameter adjustment module, which further comprises:
a real-time user feedback capture and structured processing interface for receiving and parsing user annotations of risk factors and assessment results;
an incremental model training and version control unit with an embedded evolutionary algorithm, supporting online fine-tuning of the evolutionary algorithm's population initialization strategy and the local connection weights of the assessment network;
an offline batch retraining scheduler for triggering global optimization of the entire model system after sufficient new data or feedback has accumulated.
CN202510682535.XA 2025-05-26 2025-05-26 Intelligent generation method for adjustment report of bad financial assets Active CN120198232B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510682535.XA CN120198232B (en) 2025-05-26 2025-05-26 Intelligent generation method for adjustment report of bad financial assets


Publications (2)

Publication Number Publication Date
CN120198232A CN120198232A (en) 2025-06-24
CN120198232B true CN120198232B (en) 2025-09-12

Family

ID=96073628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510682535.XA Active CN120198232B (en) 2025-05-26 2025-05-26 Intelligent generation method for adjustment report of bad financial assets

Country Status (1)

Country Link
CN (1) CN120198232B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120448161B (en) * 2025-07-08 2025-09-12 浪潮通用软件有限公司 Data resource migration risk prediction method, system, terminal equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117217807A (en) * 2023-11-08 2023-12-12 四川智筹科技有限公司 Bad asset valuation algorithm based on multi-mode high-dimensional characteristics
CN118230983A (en) * 2024-03-15 2024-06-21 中山大学肿瘤防治中心(中山大学附属肿瘤医院、中山大学肿瘤研究所) Risky drug identification method and system based on drug feedback mining

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118627619A (en) * 2024-06-13 2024-09-10 广东司法警官职业学院 A method for extracting legal document information
CN118761735B (en) * 2024-07-09 2025-04-25 国网湖北省电力有限公司信息通信公司 An electronic contract management method and system based on deep learning model
CN119719388B (en) * 2025-02-26 2025-05-27 北京科杰科技有限公司 Method and system for constructing data knowledge graph of complex ecological intelligent brain drive


Also Published As

Publication number Publication date
CN120198232A (en) 2025-06-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant