
CN120568005B - Intelligent conference minuting and recording method and system - Google Patents

Intelligent conference minuting and recording method and system

Info

Publication number
CN120568005B
CN120568005B (application CN202511068063.5A)
Authority
CN
China
Prior art keywords
information
conference
target
meeting
generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202511068063.5A
Other languages
Chinese (zh)
Other versions
CN120568005A (en)
Inventor
胡宏清
吕金文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Rgblink Science & Technology Co ltd
Original Assignee
Xiamen Rgblink Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Rgblink Science & Technology Co ltd filed Critical Xiamen Rgblink Science & Technology Co ltd
Priority to CN202511068063.5A priority Critical patent/CN120568005B/en
Publication of CN120568005A publication Critical patent/CN120568005A/en
Application granted granted Critical
Publication of CN120568005B publication Critical patent/CN120568005B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses an intelligent conference minuting and recording method and system. Cloud service equipment receives multimodal information sent by edge equipment and processes it with context-aware error correction, content segmentation, a dynamic segmentation strategy, and a preset storage strategy to generate meeting record information. The meeting record information is encrypted according to participant role information to produce target-format conference information, which is then processed to generate initial conference scene mode information containing a meeting outline, PPT, and mind map. The target-format conference information and the initial scene mode information are processed under a multimodal priority strategy to produce priority-ordered, policy-processed conference data; semantic analysis, feature fusion, and cross-modal association are applied to this data to generate target conference scene mode information. Finally, a target application program is identified from the target conference scene mode information, and target-format instruction information is sent to that application.

Description

Intelligent conference minuting and recording method and system
Technical Field
The invention relates to the technical field of multimodal data processing, and in particular to an intelligent conference minuting and recording method and system.
Background
In modern office environments, demand for intelligent meeting minuting and recording systems keeps growing. Traditional approaches have many limitations and struggle to meet the demands of efficient work. On the minuting side, manual note-taking is time-consuming and labor-intensive, and because the note-taker's attention is limited, key information such as decision details, division of responsibilities, and time nodes is easily missed. In fast-paced, intense discussions, note-takers have even more difficulty keeping up with the pace of speech, producing incomplete and inaccurate records. Even with the aid of a recording device, subsequently extracting key information from lengthy recordings and organizing it into documents still requires considerable time and effort. Although some speech-to-text technology is already applied to conference minuting, inaccurate recognition of technical terms remains common. Meetings in different industries contain large amounts of specialized vocabulary, and general-purpose transcription software, lacking deep optimization for industry-specific term bases, frequently misrecognizes them, seriously affecting the accuracy and usability of the record. Moreover, the text produced by existing transcription tools mostly lacks structural processing: it is simply piled together rather than classified and organized by meeting topic, discussion item, and resolution, making the records inconvenient to review and use later.
Conventional recording devices and methods also face challenges. On site, multiple devices must be operated manually to ensure that meeting visuals and sound are fully captured; the operation is cumbersome, and important scenes or audio are easily missed through human oversight. For example, in a large conference room, an improperly angled camera may fail to capture all participants and displayed content, and poorly placed microphones can lead to unclear pickup or weak sound in parts of the room. For remote conferences, existing recording approaches struggle to efficiently integrate and clearly record the audio and video of different participant terminals, and problems such as audio delay, frame stuttering, or loss of synchronization occur frequently, seriously affecting the quality and completeness of the recording. Traditionally recorded conference video also lacks intelligent analysis and processing: key content, important speakers, and core decision segments cannot be identified automatically, so finding useful information in a large volume of recordings is like searching for a needle in a haystack, greatly reducing the utilization efficiency of conference recording data.
Disclosure of Invention
In order to solve the technical problems, the invention provides the following technical scheme:
An intelligent conference minuting and recording method, applied to edge equipment, comprises: obtaining audio information and video information; processing the audio and video information against a shared timeline to generate multimodal information comprising text information, voice information, and image information; sending the multimodal information to cloud service equipment so that the cloud service equipment can generate target conference scene information; and performing matching against a target hard disk based on a preset matching rule and, if the matching succeeds, sending the multimodal information to the target hard disk so that it can store the text, voice, and image information.
An intelligent conference minuting and recording method, applied to cloud service equipment, comprises: receiving multimodal information sent by edge equipment; processing it with context-aware error correction, content segmentation, a dynamic segmentation strategy, and a preset storage strategy to generate meeting record information; encrypting the meeting record information based on participant role information to generate target-format conference information; processing the target-format conference information to generate initial conference scene mode information containing a meeting outline, PPT, and mind map; processing the target-format conference information and the initial scene mode information under a multimodal priority strategy to generate priority-ordered, policy-processed conference data; applying semantic analysis, feature fusion, and cross-modal association to that data to generate target conference scene mode information; identifying a target application program from the target conference scene mode information and sending it target-format instruction information for processing; and locating similar decision segments in historical conferences through semantic search, providing data support and processing capacity for cross-conference retrieval.
An intelligent conference minuting and recording system comprises edge equipment and cloud service equipment. The edge equipment obtains audio and video information, processes it against a shared timeline to generate multimodal information comprising text, voice, and image information, and sends the multimodal information to the cloud service equipment; it also performs matching against a target hard disk based on a preset matching rule and, if the matching succeeds, sends the multimodal information to the target hard disk so that the text, voice, and image information can be stored. The cloud service equipment receives the multimodal information sent by the edge equipment; processes it with context-aware error correction, content segmentation, a dynamic segmentation strategy, and a preset storage strategy to generate meeting record information; encrypts the meeting record information based on participant role information to generate target-format conference information; processes the target-format conference information to generate initial conference scene mode information containing a meeting outline, PPT, and mind map; processes the target-format conference information and the initial scene mode information under a multimodal priority strategy to generate priority-ordered, policy-processed conference data; applies semantic analysis, feature fusion, and cross-modal association to that data to generate target conference scene mode information; and identifies a target application program from the target conference scene mode information and sends it target-format instruction information.
The beneficial effects of the intelligent conference minuting and recording method are that it addresses industry pain points such as signal-access limitations and recording-permission management, supports access from mainstream video-conferencing software, generates meeting review files driven by the timeline, and distributes efficiently across platforms. An edge-cloud collaborative architecture is adopted: edge preprocessing guarantees real-time performance, cloud deep analysis improves intelligence, and binding the encryption strategy to the device SN ensures data security, providing a complete technical system for intelligent conference scenarios.
Drawings
Fig. 1 is a flowchart of an intelligent conference recording and recording method according to an embodiment of the present invention when applied to an edge device;
Fig. 2 is a flowchart of an intelligent conference recording and recording method according to an embodiment of the present invention when applied to a cloud service device;
Fig. 3 is a schematic block diagram of an intelligent conference recording system according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present application are described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation only and are not intended to limit the application. A system and method for intelligent conference minuting and recording according to an exemplary embodiment of the present application is described below in conjunction with Fig. 1. In one embodiment, the application further provides an intelligent conference minuting and recording method, applied to edge equipment, as shown in Fig. 1:
S101, acquiring audio information and video information.
In one embodiment, Bluetooth 5.0 or above is adopted, supporting the A2DP audio transmission protocol; up to 8 microphones can be paired simultaneously, voice clarity is ensured through SBC/AAC coding, and the effective connection distance reaches 10 meters. Device ranging and pairing use 40 kHz ultrasonic signals: each microphone transmits an ultrasonic beacon, and the Yunbao edge device receives it through its microphone array and computes the phase difference, achieving centimeter-level positioning and automatic pairing suited to stable connections in complex electromagnetic environments. After the edge device enters pairing mode, it searches for WiFi/Bluetooth microphones through broadcast frames while ultrasonic microphones actively transmit ranging signals. Detected devices are authenticated (e.g., via a MAC-address whitelist), and an encrypted transmission channel is established after authentication (WPA2-PSK for WiFi, AES-128 for Bluetooth).
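The ranging geometry can be illustrated with a minimal sketch (the 40 kHz carrier comes from the text; the element spacing, timings, and phase values are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s in air at ~20 °C
CARRIER_HZ = 40_000      # 40 kHz ultrasonic beacon, as in the text
MIC_SPACING = 0.004      # assumed 4 mm element spacing (< half the ~8.6 mm wavelength)

def distance_from_tof(t_emit: float, t_receive: float) -> float:
    """Range from the one-way time of flight of the ultrasonic beacon."""
    return SPEED_OF_SOUND * (t_receive - t_emit)

def bearing_from_phase(phase_diff_rad: float) -> float:
    """Direction of arrival from the inter-element phase difference:
    sin(theta) = c * dphi / (2 * pi * f * d)."""
    s = SPEED_OF_SOUND * phase_diff_rad / (2 * np.pi * CARRIER_HZ * MIC_SPACING)
    return float(np.degrees(np.arcsin(np.clip(s, -1.0, 1.0))))

# Beacon emitted at t = 0 s, received 5.83 ms later, 0.9 rad inter-mic phase lag
print(f"range   ~ {distance_from_tof(0.0, 5.83e-3):.2f} m")   # ~2.00 m
print(f"bearing ~ {bearing_from_phase(0.9):+.1f} deg")        # ~+17.9 deg
```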
For software such as Zoom and Teams, video stream data is acquired through the official SDKs (e.g., the Zoom Web SDK and Teams JavaScript SDK), supporting H.264/H.265 encoding at resolutions up to 2160p/60fps. Shared PPT frames are captured in real time using the edge device's screen-capture technology (e.g., the Windows Desktop Duplication API), with adaptive refresh rates of 15-60 Hz. HDMI/USB interfaces convert signals from external video-conferencing equipment (such as cameras and video-conferencing terminals) into audio/video data streams for the Yunbao device, with hot-plug support.
A 4-8 channel MEMS microphone array with 10-15 cm element spacing enhances the target sound source (the speaker) through beamforming while suppressing environmental noise (e.g., keyboard and air-conditioner noise), improving the signal-to-noise ratio by 15-20 dB. The sampling rate is set to 44.1 kHz/16 bit, meeting the CD quality standard, with mono/stereo mode switching. Camera pictures of the participants (1920×1080) and shared PPT frames (1280×720) are collected simultaneously and compressed in real time by a hardware encoding chip (e.g., Intel Quick Sync) to reduce bandwidth. The voice signal is denoised with a combination of spectral subtraction and Wiener filtering to remove stationary noise (e.g., fan hum) and non-stationary noise (e.g., a sudden cough), while the video uses a 3D noise-reduction algorithm to reduce grain in low light. Voice is converted from PCM to FLAC/OPUS format (compression ratio 2:1) and video from YUV to H.264-encoded MP4 for subsequent storage and processing. Audio and video clocks are synchronized via PTP (Precision Time Protocol) with error controlled within ±5 ms, avoiding audio-video desynchronization.
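A minimal sketch of the spectral-subtraction step (NumPy only; the frame size, over-subtraction factor, and spectral floor are illustrative assumptions, and the Wiener-filter and 3D video stages are omitted):

```python
import numpy as np

def spectral_subtraction(frames: np.ndarray, noise_frames: np.ndarray,
                         over_sub: float = 1.5, floor: float = 0.02) -> np.ndarray:
    """Magnitude spectral subtraction over windowed frames (n_frames x frame_len).

    The noise magnitude spectrum is estimated from known noise-only frames
    (e.g. a silent lead-in), subtracted from each frame, and floored to limit
    musical-noise artifacts; the noisy phase is reused on reconstruction.
    """
    win = np.hanning(frames.shape[1])
    spec = np.fft.rfft(frames * win)
    noise_mag = np.abs(np.fft.rfft(noise_frames * win)).mean(axis=0)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - over_sub * noise_mag, floor * mag)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=frames.shape[1])

# 44.1 kHz mono capture, 20 ms frames; the first 10 frames are assumed noise-only
rng = np.random.default_rng(0)
frames = rng.standard_normal((100, 882)) * 0.01
frames[10:] += np.sin(2 * np.pi * 440 * np.arange(882) / 44_100)  # speech stand-in
denoised = spectral_subtraction(frames[10:], frames[:10])
print(denoised.shape)  # (90, 882)
```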
S102, processing the audio information and the video information against a shared timeline to generate multimodal information, wherein the multimodal information comprises text information, voice information, and image information.
In one embodiment, the edge device removes noise from the collected voice signal using spectral subtraction and Wiener filtering, reduces video noise with a 3D noise-reduction algorithm, and then synchronizes the audio and video clocks via PTP with error controlled within ±5 ms. In a cross-regional conference, for example, the edge equipment denoises the audio and video from participants in different regions and aligns the processed streams on the timeline to keep them synchronized. After processing, multimodal information including text, speech, and images is generated: the edge device converts the audio information into text information and voice information, and the video information into image information. For example, in a company meeting, the edge equipment converts the participants' speech audio into text via ASR while retaining the voice information, treats camera footage and shared PPT frames as image information, and finally generates multimodal information containing text, speech, and images.
Participants' photos and voiceprint features are enrolled before the meeting, building an association database of "face feature vector + voiceprint feature vector + person name". During the meeting, a camera captures the speaker's face in real time; facial features are extracted and compared against the database while the microphone collects speech and extracts voiceprint features, and identity is doubly verified (e.g., confirmed when face confidence exceeds 90% and voiceprint match exceeds 85%). If face comparison fails (e.g., the speaker wears a mask), the voiceprint matching result alone is used by default to tag the speaking record; if voiceprint matching succeeds but no facial image is available (e.g., the participant speaks by audio only), identity is associated through the voiceprint. The edge equipment can integrate a lightweight face-recognition module (e.g., real-time detection based on OpenCV) and a voiceprint-recognition SDK (e.g., iFLYTEK voiceprint recognition); after features are extracted locally, the encrypted feature values are sent to the cloud service equipment for database comparison. The cloud service equipment stores the participant information base, and after the comparison result is returned to the edge equipment it is associated with the timeline-driven speech-to-text output (e.g., "14:05:20 Zhang San: the key point of this meeting...").
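The dual-verification logic can be sketched as follows, using the two thresholds quoted above; the function names and fallback branches are illustrative assumptions, not the patent's implementation:

```python
from dataclasses import dataclass
from typing import Optional

FACE_THRESHOLD = 0.90   # face confidence threshold from the text
VOICE_THRESHOLD = 0.85  # voiceprint match threshold from the text

@dataclass
class IdentityResult:
    name: Optional[str]
    method: str  # "dual", "voiceprint-only", "face-only", or "unknown"

def verify_speaker(face_score: Optional[float], face_name: Optional[str],
                   voice_score: Optional[float], voice_name: Optional[str]) -> IdentityResult:
    face_ok = face_score is not None and face_score > FACE_THRESHOLD
    voice_ok = voice_score is not None and voice_score > VOICE_THRESHOLD
    if face_ok and voice_ok and face_name == voice_name:
        return IdentityResult(face_name, "dual")          # both modalities agree
    if voice_ok and not face_ok:                          # e.g. mask, or audio-only speaker
        return IdentityResult(voice_name, "voiceprint-only")
    if face_ok and not voice_ok:                          # e.g. noisy audio
        return IdentityResult(face_name, "face-only")
    return IdentityResult(None, "unknown")                # escalate for manual tagging

print(verify_speaker(0.95, "Zhang San", 0.88, "Zhang San"))   # dual confirmation
print(verify_speaker(None, None, 0.91, "Zhang San"))          # mask / audio-only fallback
```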
When scheduling a meeting, the conference creator can batch-import participants' photos, names, roles, and other information through a management console (e.g., via an Excel template), and the system automatically builds the participant information base; this suits fixed internal meetings (e.g., weekly and monthly meetings). When a meeting is created in the OA system, enterprise users upload the participant list and photos at the same time, and the system automatically associates them with the meeting ID. Before the meeting starts, participants can also self-enroll photos and voiceprints through portals such as the mobile APP or Web client (e.g., by scanning a QR code into a mini-program, taking a photo, and recording 3 seconds of speech), updating the information base in real time; this suits ad-hoc meetings or meetings with external guests. After enrollment, the system generates a time-limited QR code (the HMAC-signed QR code integrated in this document), and double verification of code scanning plus face/voiceprint at check-in improves identity accuracy.
Alternatively, an association database of face feature vectors and names can be built. For batch import (suited to fixed meetings), the conference creator uploads participants' photos, names, and roles in bulk via an Excel template through the management console (e.g., an OA system); the system extracts face feature vectors with models such as FaceNet, associates them with names, stores them in the participant information base on the cloud service equipment, and generates an identity information set keyed to the meeting ID. For self-enrollment (suited to ad-hoc meetings), a participant scans a code into the applet via the mobile APP or Web client before the meeting, takes a frontal photo, and enters a name; the system extracts the facial features in real time, stores them encrypted, and generates a time-limited QR code containing an HMAC signature (embedding the face-feature hash) for verification at check-in. The edge device captures the video stream in real time through a UVC camera, detects the face region with an MTCNN model (e.g., in an OpenCV pipeline), extracts the feature vector, and compares it against the locally cached feature library (or the cloud service equipment's interface); if the cosine similarity exceeds 0.85 the name is matched, and the result is associated with the timeline-driven speech-to-text record (e.g., "14:05:20 Zhang San: the key point of this meeting...").
Similarly, an association database of voiceprint feature vectors and names can be built. For batch import, enterprise users upload participants' names and prerecorded voice files (e.g., reading a specified text) when scheduling the meeting; the system extracts voiceprint feature vectors via MFCC features or a DeepSpeech model, binds them to names, and stores them encrypted (AES-256) on the cloud service equipment. For self-enrollment, a participant records 3 seconds of speech through the mobile client (e.g., "I am Zhang San, attending this meeting"); the system extracts the voiceprint features, associates them with the name, updates the information base in real time, and generates a time-limited QR code containing the voiceprint-feature hash. The edge device frames the microphone's voice stream (50 ms per frame), extracts the MFCC voiceprint feature sequence, performs Dynamic Time Warping (DTW) matching against the local or cloud voiceprint library, associates the name if the matching score exceeds 0.7, and combines this with the 3D noise-reduction algorithm described above to reduce environmental noise interference.
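A minimal sketch of the MFCC + DTW matching step (the 0.7 threshold and 50 ms framing come from the text; the similarity scaling, the random templates, and the librosa call mentioned in the comment are illustrative assumptions):

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic Time Warping distance between two feature sequences
    (shape: n_frames x n_features), as used for voiceprint matching."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)   # length-normalized

def match_voiceprint(query: np.ndarray, library: dict,
                     threshold: float = 0.7):
    """Return the enrolled name whose template is closest, converting the
    normalized DTW distance into a similarity score in (0, 1]."""
    best_name, best_score = None, 0.0
    for name, template in library.items():
        score = 1.0 / (1.0 + dtw_distance(query, template))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else None

# 13-dim MFCC-like frames (50 ms/frame as in the text); real MFCCs would come
# from e.g. librosa.feature.mfcc(y=signal, sr=16000, n_mfcc=13).T
rng = np.random.default_rng(1)
enrolled = {"Zhang San": rng.standard_normal((60, 13))}
query = enrolled["Zhang San"] + rng.standard_normal((60, 13)) * 0.1
print(match_voiceprint(query, enrolled))  # "Zhang San"
```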
S103, sending the multimodal information to the cloud service equipment so that the cloud service equipment can generate target conference scene information.
In one embodiment, after acquiring and processing the audio and video information, the edge device generates multimodal information including text, speech, and images, ready to send to the cloud service equipment. In an enterprise meeting, the edge equipment connects to microphones over WiFi and Bluetooth to collect speech, obtains the camera video and shared PPT frames, and, after noise reduction, compression, timeline synchronization, and other processing, generates multimodal information comprising the meeting's spoken text, voice segments, and meeting images.
The edge device establishes an encrypted connection to the cloud service equipment over the TLS 1.3 secure communication protocol, listening on a designated port (e.g., 443), and builds a secure channel for data transmission. For example, in the monthly board meeting of a multinational company, the edge devices establish TLS connections with the cloud service equipment of each branch, ensuring secure transmission of the multilingual meeting records. The edge device sends the generated multimodal information (e.g., a 150 MB meeting record containing encrypted voice, video clips, and initial scene mode information) over the established secure channel using block coding (1 MB per block). After an enterprise meeting, the edge equipment sends the partitioned meeting-record data to the cloud service equipment. For each transmitted block the edge device computes a SHA-256 hash and appends it as a check code. If the cloud service equipment detects that the hash of some block (e.g., number 0x3A5F) is inconsistent with its locally computed value, the edge device receives a retransmission request and resends that block. When data is lost or corrupted in transit, the edge equipment retransmits on request from the cloud service equipment, ensuring data integrity; for instance, if the cloud service equipment finds block 23 missing on receipt, the edge equipment resends it after receiving the retransmission request, ensuring the cloud service equipment obtains the complete multimodal information.
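The block-and-verify scheme reads like this in miniature (1 MB blocks and SHA-256 per block as in the text; the TLS transport is omitted and the helper names are assumptions):

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MB blocks, as in the text

def make_blocks(payload: bytes) -> list:
    """Split a recording into numbered blocks, each with a SHA-256 check code."""
    blocks = []
    for seq, offset in enumerate(range(0, len(payload), CHUNK_SIZE)):
        chunk = payload[offset:offset + CHUNK_SIZE]
        blocks.append({"seq": seq, "data": chunk,
                       "sha256": hashlib.sha256(chunk).hexdigest()})
    return blocks

def receive(blocks: list) -> list:
    """Verify each block on the cloud side; return sequence numbers to retransmit."""
    return [b["seq"] for b in blocks
            if hashlib.sha256(b["data"]).hexdigest() != b["sha256"]]

payload = bytes(range(256)) * (3 * CHUNK_SIZE // 256)   # stand-in for a recording
blocks = make_blocks(payload)
blocks[1]["data"] = blocks[1]["data"][:-1] + b"\x00"    # simulate in-transit corruption
print(receive(blocks))  # [1] -> the edge device resends block 1
```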
S104, performing matching against the target hard disk based on a preset matching rule and, if the matching succeeds, sending the multimodal information to the target hard disk so that it can store the text information, voice information, and image information.
In one embodiment, when the target hard disk leaves the factory, the edge device's serial number (e.g., "YUNBAO-20250622-001") and the hard disk's unique identifier (e.g., "YUNBAO-HD-001") are bound with AES-256 encryption by a dedicated tool, producing a tamper-proof pairing record stored in the hard-disk firmware. The binding uses hardware-level encryption (e.g., a TPM chip) so that the physical mapping between serial number and hard disk cannot be forged or tampered with. When the edge device is connected to the target hard disk, the system automatically triggers firmware-level verification: the edge device reads the bound serial number stored on the disk and compares it with its own serial number bit for bit (0-bit tolerance). If the match succeeds (100% identical), an encrypted transmission channel is established; if it fails, data writes are refused and an alarm is raised.
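A minimal sketch of the firmware-level pairing check (the serial numbers are the examples from the text; the AES-256/TPM sealing step is omitted, and hmac.compare_digest serves here simply as a constant-time exact comparison):

```python
import hmac

def verify_binding(device_sn: str, disk_bound_sn: str) -> bool:
    """Exact (0-bit tolerance) comparison of the edge device's serial number
    against the SN sealed into the hard-disk firmware at the factory."""
    return hmac.compare_digest(device_sn.encode(), disk_bound_sn.encode())

def on_disk_attached(device_sn: str, bound_sn: str) -> str:
    if verify_binding(device_sn, bound_sn):
        return "match: open encrypted channel, allow writes"
    return "mismatch: refuse writes, raise alarm"

print(on_disk_attached("YUNBAO-20250622-001", "YUNBAO-20250622-001"))  # allowed
print(on_disk_attached("YUNBAO-20250623-001", "YUNBAO-20250622-001"))  # refused
```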
Because serial-number pairing is completed at the factory, the matching process does not need to be repeated when the edge equipment sends audio/video data; the data is stored to the target hard disk directly as follows. First, an encrypted channel is established: a dynamic session key is generated with the AES-256 algorithm, the multimodal information (e.g., a 150 MB meeting record) is encrypted block by block (1 MB per block), and the transmission link is encrypted end to end (TLS 1.3). Then the data is written: the encrypted data is written through the M.2 interface directly into a designated partition of the hard disk (e.g., the "meeting record_encryption area"), with a required write speed above 100 MB/s (suited to high-speed solid-state drives).
Based on a preset matching rule that cryptographically binds the edge device's serial-number information to the target hard disk, the validity of the serial-number pairing is verified; if the match succeeds, the multimodal information is sent to the target hard disk over an AES-256 encrypted channel, an encrypted-transmission log is generated synchronously, and interface information is reserved for rebinding a new edge device's serial number to the hard disk should the device be damaged. Specifically, the edge device verifies the pairing against the serial number ("YUNBAO-20250622-001") that was cryptographically bound to the target hard disk at the factory: when the edge equipment attaches to the target hard disk, the system automatically reads the bound serial number stored on the disk, compares it with its own, and verifies the pairing's validity.
If the serial numbers match, the edge device sends the multimodal information to the target hard disk over the AES-256 encrypted channel. After an enterprise meeting, the edge equipment generates multimodal information (e.g., a 150 MB meeting record) containing text, voice, and images, encrypts the data with the AES-256 algorithm, and transmits it to the target hard disk through the dedicated encrypted channel, ensuring the data cannot be stolen or tampered with in transit. An encrypted-transmission log is generated synchronously, recording key information about the transfer: transmission time (e.g., 2025-06-22 14:30:00), data volume (150 MB), encryption algorithm (AES-256), target hard-disk serial number (YUNBAO-HD-001), and transmission status (success/failure), facilitating subsequent tracing and auditing.
Interface information is reserved for rebinding a new edge device's serial number to the hard disk when a device is damaged. If the edge equipment fails, a technician can rebind the new edge device's serial number (e.g., "YUNBAO-20250623-001") to the target hard disk through the reserved interface, so data can be migrated without replacing the disk, improving replacement efficiency. Through the standardized flow of serial-number verification, encrypted transmission, log generation, and interface reservation, the edge device achieves secure data interaction with the target hard disk. This process not only secures the storage of multimodal information but also, via the reserved-interface mechanism, improves operational flexibility, suiting local encrypted-storage scenarios for enterprise meeting records.
As shown in Fig. 2, an intelligent conference minuting and recording method applied to cloud service equipment includes:
S201, receiving multimodal information sent by the edge equipment, wherein the multimodal information comprises text information, voice information, and image information.
In one embodiment, the cloud service device establishes an encrypted connection with the edge device over a secure communication protocol (e.g., TLS 1.3), listens on a designated port to receive the multimodal information, and uses block coding to ensure stable transmission of large files. After an enterprise weekly meeting, the edge device sends a 150 MB meeting record (containing encrypted voice, video clips, and initial scene mode information) to the cloud service device, which receives the data through a load-balancing node and checks TCP sequence numbers to ensure nothing is lost (e.g., triggering retransmission when block 23 is detected missing).
The SHA-256 hash of each received data packet is computed and compared with the check code appended by the edge device, and the data format is verified against the protocol standard. If the hash of a received encrypted financial-data fragment is inconsistent with the locally computed value (e.g., a 1-bit flip caused by network fluctuation), the cloud service equipment marks the fragment as corrupted and requests the edge device to retransmit the block numbered 0x3A5F.
The initial scene mode information is parsed to extract the meeting outline, PPT, mind-map structure, and timeline markers, which are stored in a distributed file system classified by modality and associated with an index database. For example, on receiving the initial scene mode information of a quarterly strategy meeting, the cloud service equipment parses the mind-map structure (including a "market analysis" branch), stores the text content under a designated path, stores the PPT and the mind map's JSON structure in the metadata catalog, and builds a "meeting topic-time-content type" association index in the index library.
Key metadata of the encrypted conference information (e.g., device SN and timestamp) is extracted, the device SN's validity is checked, and the encrypted fragments are decrypted hierarchically according to participant role permissions. For example, the cloud service equipment receives an encrypted financial meeting record from the edge device, extracts device SN "YUNBAO-20250622-001", and verifies it against the whitelist; because an administrator holds the "financial data access" permission, the CFO's AES-256-encrypted speech fragments can be decrypted, while ordinary staff can view only the AES-128-encrypted meeting outline. Through the standardized flow of link establishment, verification and parsing, classified storage, and permission checking, the cloud service equipment securely receives and preprocesses the edge device's multimodal information. For example, in a multinational company's board meeting, the cloud service equipment receives an 800 MB data packet containing a multilingual meeting record and, after SHA-256 verification, modality-separated storage, and permission-based decryption, synchronizes it to the edge device's local cache, providing standardized input for subsequent multimodal priority processing and cross-meeting retrieval. The process ensures the integrity, security, and traceability of the data.
S202, processing the multimodal information with context-aware error correction, content segmentation, a dynamic segmentation strategy, and a preset storage strategy to generate meeting record information.
In one embodiment, the cloud service device establishes an encrypted connection with the edge device (the Yunbao) over a secure communication protocol (e.g., TLS 1.3) and listens on a designated port (e.g., 443) to receive the target-format conference information and the initial conference scene mode information, using block coding (1 MB per block) to ensure stable transmission of large files. After an enterprise weekly meeting ends, the edge device sends the meeting record (150 MB, including encrypted voice/video fragments and initial scene mode information) to the cloud service device, which receives the data through a load-balancing node and checks TCP sequence numbers to ensure nothing is lost (e.g., triggering a retransmission request if block 23 is missing).
The SHA-256 hash of each received packet is computed and compared with the check code appended by the edge device (judged complete when the error is ≤ 1 bit), and the data format is verified against the conference-information protocol standard (e.g., a JSON Schema checks field completeness, XML validation checks tag closure). If the hash of an encrypted financial-data fragment in the received target-format conference information is inconsistent with the local result (e.g., a 1-bit flip caused by network fluctuation during edge transmission), the cloud service device marks the fragment as corrupted and requests the edge device to retransmit the block (number 0x3A5F).
The target-format conference information is parsed to separate voice (.wav), text (.txt), video (.mp4), and encryption metadata (e.g., role encryption level); the initial scene mode information is parsed to extract the meeting outline, PPT/mind-map structure, and timeline markers (e.g., "00:15-00:30 technical solution discussion"). These are stored in a distributed file system (HDFS) classified by modality, voice under the voice_pool directory and text under the text_pool directory, with file metadata (e.g., creation time, encryption type) recorded in an associated index database (MongoDB). For example, the cloud service equipment receives the initial scene mode information of a quarterly strategy meeting, parses the mind-map structure (with "market analysis" and "target planning" branches), stores the text content under the /conference/20250622/strategy/text path, stores the PPT and mind-map JSON structure in /metadata/mindmap.json, and builds a "meeting topic-time-content type" association index in the index library.
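A sketch of the modality-routing and indexing step (the pool directory names and the /conference/20250622/strategy path style appear in the text; the HDFS and MongoDB writes are replaced by returned records, and the helper names are assumptions):

```python
import json
import time
from pathlib import Path

MODAL_DIRS = {".wav": "voice_pool", ".txt": "text_pool", ".mp4": "video_pool"}

def store_and_index(meeting_id: str, topic: str, files: list,
                    root: str = "/conference") -> list:
    """Route each modality to its pool directory and emit the index records
    ("meeting topic-time-content type") that would go to the index database."""
    records = []
    for name in files:
        suffix = Path(name).suffix
        pool = MODAL_DIRS.get(suffix, "misc_pool")
        records.append({
            "meeting_id": meeting_id,
            "topic": topic,
            "content_type": suffix.lstrip("."),
            "path": f"{root}/{meeting_id}/{pool}/{name}",
            "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        })
    return records

for rec in store_and_index("20250622-strategy", "quarterly strategy",
                           ["cfo_remarks.wav", "minutes.txt", "ppt_share.mp4"]):
    print(json.dumps(rec))
```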
Key metadata of the encrypted conference information is extracted (e.g., the device SN and timestamp in the combined key), the device SN's validity is checked (querying the enterprise device-management system for whitelist membership), and the encrypted fragments are decrypted hierarchically: the device SN first decrypts the base layer, then content is decrypted according to participant role permissions (e.g., only suitably permissioned roles can decrypt financial-data fragments). For example, the cloud service equipment receives an encrypted financial meeting record from the edge device, extracts device SN "YUNBAO-20250622-001", verifies that it belongs to an enterprise whitelisted device, and, because the cloud service administrator holds the "financial data access" permission, can decrypt the CFO's speech fragments (AES-256), while ordinary staff accounts can view only the decrypted meeting outline (AES-128).
Through the standardized flow of link establishment, verification and parsing, classified storage, and permission checking, the cloud service equipment securely receives and preprocesses the edge device's conference information. For example, in the monthly board meeting of a multinational company, the cloud service device first establishes TLS connections with each branch's edge devices and receives a data packet (roughly 800 MB in total) containing the multilingual meeting record (English, Chinese) and initial scene mode information. After SHA-256 verification confirms the data is complete, the speech fragments (utterances by the CEO, board members, etc.), multilingual text records (automatically translated abstracts), and PPT video are parsed out and stored by modality on the distributed storage nodes of the Europe/Asia regions. For the encrypted board-decision fragments (AES-256 plus device-SN encryption), the cloud service equipment completes decryption after verifying the administrator's permissions through the enterprise AD domain, finally synchronizes the processed conference information to the edge device's local cache, and provides standardized input for subsequent multimodal priority processing and cross-meeting retrieval.
S203, encrypting the meeting record information based on participant role information to generate the target-format conference information.
In one embodiment, a role-analysis flow performs permission mapping and rule normalization on the participant role information, and a role-sensitivity evaluation model plus a dynamic encryption arbitration algorithm are introduced to structure the encryption strategy. A hybrid model fusing multi-layer neural networks with a rule engine combines supervised learning (e.g., training on historical encryption data) with unsupervised learning (e.g., cluster analysis of role sensitivity). The model's input layer includes basic role features, such as the participant's position title (e.g., CFO, technical director), department (e.g., finance, R&D), and rank (e.g., L7 director, L4 engineer), along with meeting-content features such as keywords extracted from the meeting record (e.g., "financial data", "budget"), PPT titles, and speaking duration.
Role titles and keywords are converted into vector representations (dimension 128) by a Word2Vec word-vector model, and the semantic similarity between role and content is computed (e.g., cosine similarity of "CFO" and "financial data" > 0.8); the rule engine predefines a sensitivity mapping table (e.g., "finance department + budget discussion → sensitivity level 5"). A two-layer fully connected neural network (64 and 32 hidden neurons respectively) takes the feature vector as input and outputs a sensitivity level (levels 1-5, with 5 highest). The rule-engine output (e.g., forcing financial-data content to level 5) and the neural-network prediction are fused by weighted summation (rule weight 0.6, network weight 0.4) to produce the final level. Ultimately, a sensitivity level is generated for each role-content pair (e.g., "CFO-financial data discussion → level 5", "board member-generic remarks → level 3").
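The weighted fusion reads as follows in miniature (the 0.6/0.4 weights and the lexicon override come from the text; the example levels fed in are illustrative):

```python
RULE_WEIGHT, NN_WEIGHT = 0.6, 0.4          # fusion weights from the text
FORCED_LEXICON = {"password", "key"}        # predefined sensitive words override the model

def fuse_sensitivity(rule_level: int, nn_level: float, keywords: set) -> int:
    """Final level = weighted sum of rule-engine output and neural-net prediction,
    with the predefined sensitive lexicon taking absolute priority."""
    if keywords & FORCED_LEXICON:
        return 5
    fused = RULE_WEIGHT * rule_level + NN_WEIGHT * nn_level
    return max(1, min(5, round(fused)))

# "finance department + budget" rule fires at level 5; the network predicts 4.2
print(fuse_sensitivity(5, 4.2, {"budget", "revenue"}))   # 5 -> AES-256
# generic strategic remarks: rule level 3, network prediction 2.8
print(fuse_sensitivity(3, 2.8, {"planning"}))            # 3 -> AES-128
```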
The semantic similarity threshold is 0.7 (exceeding it triggers a high-sensitivity flag), the neural network's learning rate is 0.001 with 5000 iterations and a cross-entropy loss, and the rule engine's predefined sensitive lexicon (e.g., "password", "key") takes priority over the neural-network prediction. An arbitration model based on a priority queue, combining a greedy strategy with conflict-resolution rules, dynamically schedules the encryption strategy. Priority is defined by sensitivity level (level 5 → AES-256, level 3 → AES-128, level 1 → no encryption), role authority weight (CEO = 0.9, CFO = 0.8, ordinary participant = 0.5), and data recency (remarks within the last 10 minutes → weight +0.3).
When multi-role data conflicts (e.g., the CEO and CFO speak simultaneously), content is encrypted in order of "sensitivity level × role authority weight" (CFO: level 5 × 0.8 = 4.0 > CEO: level 3 × 0.9 = 2.7, so the CFO's content is encrypted first); non-preemptive scheduling ensures in-progress encryption of a fragment is not interrupted, and new fragments enter the priority queue (beyond the queue-length threshold of 10, low-priority fragments are discarded). The sensitivity model is updated every 5 minutes as meeting content changes (e.g., automatically raising the sensitivity level of CFO content when the discussion topic shifts from "project planning" to "financial budget"), and a manual-intervention interface is supported (an administrator can force a role's content to level-5 encryption).
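A minimal sketch of the arbitration queue (the role weights, priority formula, +0.3 recency bonus, and queue limit of 10 come from the text; the heap-based implementation and segment labels are assumptions):

```python
import heapq

ROLE_WEIGHT = {"CEO": 0.9, "CFO": 0.8, "participant": 0.5}  # weights from the text
QUEUE_LIMIT = 10

def priority(sensitivity: int, role: str, recent: bool = False) -> float:
    """Priority = sensitivity level x role authority weight (+0.3 recency bonus)."""
    return sensitivity * ROLE_WEIGHT[role] + (0.3 if recent else 0.0)

queue = []  # max-heap via negated priority

def enqueue(segment: str, sensitivity: int, role: str, recent: bool = False) -> None:
    heapq.heappush(queue, (-priority(sensitivity, role, recent), segment))
    if len(queue) > QUEUE_LIMIT:       # drop the lowest-priority segment
        queue.remove(max(queue))       # max of negated values = least priority
        heapq.heapify(queue)

# CEO and CFO speak at once: 5 x 0.8 = 4.0 beats 3 x 0.9 = 2.7,
# so the CFO segment is encrypted first (non-preemptive once started).
enqueue("CEO strategy remarks 00:20-00:30", 3, "CEO")
enqueue("CFO budget remarks 00:20-00:30", 5, "CFO")
print(heapq.heappop(queue)[1])   # CFO budget remarks 00:20-00:30
```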
Take a company board meeting as an example. Role information: CEO (L9, decision level), CFO (L8, finance department), board members (L7, multiple departments); meeting-content keywords: "annual budget" and "profit allocation" (from the CFO's remarks) and "strategic planning" (from the CEO's remarks). Word2Vec computes the semantic similarity of "CFO" and "annual budget" as 0.85, triggering the rule engine's predefined "finance department + budget → level 5" rule; the neural network evaluates the CEO's strategic-planning content as level 3 (its keywords are less sensitive). The final output is "CFO content → level 5" and "CEO content → level 3".
Encryption priority is computed as CFO (5 × 0.8 = 4.0) > CEO (3 × 0.9 = 2.7), so the CFO's speech fragments (e.g., 00:20-00:30) are scheduled for AES-256 encryption; since no other high-priority content and no queue conflicts exist in the same period, the encryption strategies execute in order. After each meeting, new role-sensitivity data (e.g., "marketing director-client list → level 4") is automatically added to the training set, and the Word2Vec word vectors and neural-network parameters are updated. Model accuracy is evaluated periodically (weekly) by F1 score (target ≥ 0.9); falling below the threshold triggers retraining on the full data. The model parameters are stored encrypted (using the device SN as the encryption key) to prevent unauthorized tampering with the sensitivity mapping rules.
Interfacing with the meeting-permission protocol standard, a three-dimensional "role-rank-department" permission matrix is constructed (e.g., "technical department-senior manager-decision level"); an encryption-to-standard conversion model maps the matrix into a normalized rule set (e.g., "a senior manager in the technical department may access the unencrypted version of the department's meeting records"), and a dynamic update mechanism is established (e.g., rules refresh automatically when a person's rank changes). For a project meeting, the cloud service equipment builds a matrix whose role dimension covers project manager (decision level) and development engineer (execution level), rank dimension covers director (L7) and manager (L6), and department dimension covers R&D and product; for example, "the R&D L7 director can decrypt all of the department's meeting records, while a development engineer can access only the passages of his own remarks."
Taking the meeting's duration as the time window, participants' roles, ranks, departments, and historical encryption records (e.g., how a department's past meetings were encrypted) are fused to construct a multidimensional encryption feature matrix (rows: roles; columns: sensitivity levels; depth: departments), achieving full-dimensional association of encryption strategies. For example, in historical meetings the "product department manager" role had level-4 encryption applied to "requirements review" content; when that role discusses requirements again in the current meeting, the cloud service equipment retrieves the historical policy from the matrix, automatically applies level-4 encryption (e.g., combined key plus sharded storage), and flags it as "key encryption required".
Missing encryption rules are filled in by a sensitivity prediction model (e.g., when an "intern" role is newly added, its encryption permission is predicted as the lowest level); key encryption strategies are extracted in combination with a dynamic permission-adjustment mechanism (e.g., a visitor is temporarily granted "read-only, encrypted" permission) and fused with the three-dimensional permission matrix's features to generate the target-format conference information. For example, an external-consultant role joins an ad-hoc meeting, and the cloud service equipment's prediction model fills in the rule "the external consultant may access only the abstract portion of the meeting record, with the abstract AES-128-encrypted." In the finally generated target information, the consultant-related content is marked "encrypted abstract" with a permission description file attached.
Through the closed loop of role analysis, matrix construction, feature association, and policy completion, the cloud service equipment achieves dynamic encryption keyed to participant identity. For example, in a quarterly earnings meeting in the financial industry, the CFO's financial-data discussion segment is automatically identified as highly sensitive content, triggering the strictest encryption rule in the "role-rank-department" matrix (e.g., AES-256 bound to the device SN); the encryption patterns of the same role type in historical meetings are consulted to secure the data in storage (encryption-chip sharding) and transmission (combined key), finally producing target-format conference information that only authorized equipment can decrypt.
S204, processing the target-format conference information with timeline-driven speech-to-text and analysis to generate initial conference scene mode information comprising a meeting outline, PPT, and mind map.
In one embodiment, a timeline-analysis flow performs time-sequence mapping and rule normalization on the target-format conference information, and a meeting-content evaluation model (e.g., assessing importance by speaking duration) and a dynamic analysis arbitration algorithm (resolving time-sequence conflicts) are introduced to structurally integrate the information. In a project meeting, the cloud service equipment orders the CFO's financial report (00:30-00:45) and the technical director's solution walkthrough (00:15-00:30) along the timeline, judges that the financial report involves budget-sensitive content and ranks it above the solution walkthrough, and integrates them into structured "time-speaker-content type" data blocks.
Interfacing with the meeting-review protocol standard, a three-dimensional "time node-content type-analysis dimension" processing matrix is constructed (e.g., "00:15-00:30-technical solution-feasibility analysis"); a content-to-standard conversion model generates a normalized analysis rule set (e.g., "automatically generate a content abstract every 15 minutes"), and a dynamic update mechanism is established (e.g., the matrix automatically gains a dimension when a new content type appears). For a quarterly summary meeting, the cloud service equipment builds a matrix with time nodes 00:00-00:15 (opening) and 00:15-00:30 (performance report), content types "data report" and "solution discussion", and analysis dimensions "completion degree" and "risk points". A generated rule reads: "after the performance-report stage completes, automatically extract the key indicators in the data report (e.g., revenue growth rate) and generate a summary."
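The "abstract every 15 minutes" rule can be sketched as a timeline bucketing step (the window length comes from the rule set above; the segment tuples and the use of the first utterance as an abstract stand-in are illustrative assumptions):

```python
from collections import defaultdict

WINDOW_MIN = 15  # "generate a content abstract every 15 minutes", per the rule set

def build_outline(segments: list) -> list:
    """Group (minute-offset, content-type, text) segments into 15-minute
    time nodes and emit one outline entry per node."""
    buckets = defaultdict(list)
    for minute, ctype, text in segments:
        buckets[minute // WINDOW_MIN].append((ctype, text))
    outline = []
    for window in sorted(buckets):
        start, end = window * WINDOW_MIN, (window + 1) * WINDOW_MIN
        types = sorted({ctype for ctype, _ in buckets[window]})
        # a real system would summarize; the first utterance stands in here
        outline.append(f"00:{start:02d}-00:{end:02d} [{', '.join(types)}] "
                       f"{buckets[window][0][1]}")
    return outline

segments = [
    (2,  "opening",         "Quarter kickoff and agenda"),
    (18, "data report",     "Revenue growth rate up 12%"),
    (25, "plan discussion", "Headcount plan for Q3"),
]
print("\n".join(build_outline(segments)))
# 00:00-00:15 [opening] Quarter kickoff and agenda
# 00:15-00:30 [data report, plan discussion] Revenue growth rate up 12%
```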
Taking the meeting's duration as the time window, time nodes, content types, analysis dimensions, and historical meeting-processing records (e.g., how similar past meetings were analyzed) are fused into a multidimensional meeting feature matrix (rows: time; columns: content types; depth: historical similarity), achieving full-dimensional association of meeting-information processing. For example, "product iteration discussion" segments in historical meetings were usually accompanied by "risk assessment" analysis; when the current meeting discusses product iteration during 01:00-01:15, the cloud service equipment retrieves the historical strategy from the matrix, automatically adds "risk assessment" to the analysis dimensions, and associates a historical risk case (e.g., "2025 Q1 iteration delay risk").
Missing analysis rules are filled in by a content prediction model (e.g., when a "development" content type is newly added, an "iteration cycle" dimension is predicted and added); key strategies are extracted in combination with a dynamic timeline-adjustment mechanism (e.g., automatically compressing the analysis time of non-critical content when the meeting overruns) and fused with the three-dimensional processing matrix to generate the initial conference scene mode information containing the meeting outline, PPT, and mind map. For example, an "AI technology adoption" discussion is added on the fly, and the AI technology discussion must include the "technology maturity" and "cost budget" analysis dimensions. In the finally generated mind map, that segment is split into a "technical principle → maturity assessment → budget planning" branch, with cost data from historical AI projects associated.
Through the closed loop of timeline analysis, matrix construction, feature association, and strategy generation, the cloud service equipment processes conference information intelligently along the time sequence. For example, in an annual strategy meeting, the cloud service equipment integrates the CEO's opening address (00:00-00:15) and each department director's planning report (00:15-01:30) on the timeline, marks content types such as "strategic targets" and "resource allocation" through the three-dimensional matrix, and, drawing on the analysis patterns of similar content in historical meetings (e.g., last year's strategy meeting's resource-allocation ratios), fills in missing analysis dimensions (e.g., an assessment of "market environment changes"). It finally generates a meeting outline, PPT, and mind map containing time-node abstracts, content associations, and historical reference data, providing structured initial scene mode information for subsequent cloud deep analysis.
S205: process the target-format conference information and the initial conference scene mode information based on the priority processing strategy for multimodal information, generating conference information data after priority sorting and strategy processing.
In one embodiment, feature extraction and statistical analysis are performed on the target-format conference information and the initial conference scene mode information to generate information type feature information, sensitivity degree feature information, conference role feature information, modal distribution feature information, historical priority processing feature information and multimodal association feature information. The cloud service device extracts features from the received target-format conference information and initial scene mode information: information type features identify types such as text (meeting summary), voice (speech fragments) and video (PPT sharing); sensitivity degree features evaluate the sensitivity level through keyword matching (such as "financial data" and "password") and role authority; and modal distribution features count the volume proportion of each modality's data (such as voice accounting for 60% and video for 30%).
In an annual budget meeting of a financial institution, the cloud service device extracts from the target-format conference information the text of a quarterly profit table (sensitivity level 5) and the CFO's speech (modal distribution 25%), while the mind map in the initial scene mode information contains a "risk assessment" node (information type: analysis report). The information type, sensitivity degree, conference role, modal distribution, historical priority processing and multimodal association feature information are then processed to generate information priority prediction information, modal sensitivity degree evaluation information, information type association feature information and multimodal fusion degree evaluation information. Priority prediction combines sensitivity degree and role authority (e.g. level-5 sensitivity + CEO role -> priority 90%), and multimodal fusion degree evaluation analyzes the time synchronism of voice, text and video (e.g. fusion is "high" when the time difference between voice and a PPT page turn is ≤ 1 second). When processing the budget meeting information, the cloud service device predicts the priority of the "quarterly profit table" text as 95% due to its level-5 sensitivity and associated CFO role, detects a 0.8-second time difference between the text and the CFO's speech, and evaluates the multimodal fusion degree as "high" (score 85/100).
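A minimal sketch of this priority prediction is shown below; the weight split between sensitivity and role, the role table and the fusion bonus are illustrative assumptions rather than the model disclosed here:

```python
# Priority = weighted sensitivity + weighted role authority + a small fusion bonus
# when voice/PPT time difference is within 1 second ("high fusion" in the text above).
ROLE_WEIGHT = {"CFO": 0.95, "CEO": 0.90, "participant": 0.50}

def priority_score(sensitivity_level, role, av_time_diff_s):
    base = 0.5 * (sensitivity_level / 5.0) + 0.4 * ROLE_WEIGHT.get(role, 0.5)
    bonus = 0.05 if av_time_diff_s <= 1.0 else 0.0
    return min(1.0, base + bonus)

print(priority_score(5, "CFO", 0.8))          # quarterly profit table text -> ~0.93
print(priority_score(2, "participant", 3.0))  # ordinary discussion -> ~0.40
```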
Based on the information priority prediction information, modal sensitivity degree evaluation information, information type association feature information and multimodal fusion degree evaluation information, abnormal data in the target-format conference information and initial conference scene mode information is marked and screened, generating a priority abnormal data screening result that includes the type, modality and conference role of the abnormal information, the moment of occurrence and the degree of abnormality. Typical anomalies include low-sensitivity information encrypted at the wrong level (type anomaly) and lost modal data (such as a damaged video clip). If an ordinary speech recording (sensitivity level 2) is found to have been mistakenly encrypted with AES-256 (the normal level being AES-128), the cloud service device marks the anomaly type as "encryption level error", the occurrence time as the 30th minute of the conference, and the degree as "medium" (storage efficiency is affected).
The priority abnormal data screening result is then integrated and quantified to generate a priority influence factor, which characterizes the influence weight of each information type on priority, the processing difficulty of each modality, the effectiveness of historical priority strategies and the influence trend of future priority processing, and the conference information data after priority sorting and strategy processing is generated accordingly. Anomalies are integrated into priority influence factors (e.g. an "encryption level error" factor of 0.6), and the priority queue is generated in the order of sensitivity degree, role weight and fusion degree. After all anomalies in the conference are integrated, a priority influence factor of 0.4 (low influence) is produced, and the final ordering is: CFO financial data (priority 95%) > CEO strategic speech (80%) > ordinary participant discussion (50%); the cloud service device reschedules resources according to this queue and processes high-priority data first.
The cloud service device thus realizes intelligent processing of multimodal conference information through a closed loop of feature extraction, evaluation analysis, anomaly screening and priority sorting. In a cross-border acquisition meeting, the cloud service device first extracts features such as the legal agreement text (sensitivity level 5), the CEO's English speech (modal distribution 35%) and the acquisition flow chart video (information type: visual data); because the agreement text contains the keyword "intellectual property" and is associated with the general counsel role, its priority is predicted as 98%, and its time synchronism with the video yields a fusion degree of 90%. If a 10-second video segment is detected as lost due to network transmission (degree of abnormality "serious"), an influence factor of 0.7 is generated and a retransmission mechanism is triggered. After priority sorting, the encryption, storage and translation of the legal agreement are processed first, ensuring the real-time performance and security of key information and providing an ordered data basis for subsequent cross-conference retrieval and multimodal fusion.
S206: perform semantic analysis, feature fusion and cross-modal association processing on the conference information data to generate target conference scene mode information.
In one embodiment, the cloud service device performs semantic analysis on the priority-processed conference information data (including voice, text and video): voice semantic extraction converts voice into text through ASR and extracts keywords (such as "Q2 revenue growth"); text semantic analysis parses the meeting summary with a BERT model and identifies entity relations (such as "R&D department -> budget 20%"); and video semantic understanding extracts PPT text through OCR and interprets chart meanings in combination with image recognition (such as a bar chart representing departmental performance). In a product launch meeting of a technology company, the cloud service device analyzes the CEO's speech "the new AI chip will enter mass production in Q3", extracts the keywords "AI chip" and "Q3 mass production", parses the chip parameter chart in the PPT video (OCR recognizing "computing power 200 TOPS"), and establishes the semantic association "AI chip - computing power 200 TOPS - mass production in Q3 2025".
Feature fusion synchronizes the voice transcript, PPT page-turn times and video frames to a unified time axis (error ≤ 500 ms), converts the voice keywords, text entities and video OCR results into vectors of a unified dimension (e.g. 300 dimensions), computes weights through an attention mechanism (voice 40%, text 35%, video 25%), and generates a fused feature vector. In one conference, the CTO said "AI algorithm optimization raises the recognition rate to 95%" (voice) while the PPT turned to the "technical scheme" page (video), and the text summary recorded "algorithm optimization -> recognition rate 95%". The cloud service device aligns the three timestamps (00:25:10) and generates the fused feature vector [AI algorithm, optimization, recognition rate 95%, technical scheme] with weight distribution 0.4 voice, 0.35 text and 0.25 video.
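The alignment-then-weighted-combination step can be sketched as follows; the random vectors stand in for real encoder outputs, and the fixed weights mirror the example above rather than a learned attention distribution:

```python
import numpy as np

# 300-dim modality embeddings; in the real pipeline these would come from
# ASR-keyword, text-entity and OCR/visual encoders respectively.
rng = np.random.default_rng(0)
voice_vec = rng.normal(size=300)
text_vec = rng.normal(size=300)
video_vec = rng.normal(size=300)

def aligned(t_voice_ms, t_text_ms, t_video_ms, max_skew_ms=500):
    # all three modalities must fall within the 500 ms alignment window
    ts = [t_voice_ms, t_text_ms, t_video_ms]
    return max(ts) - min(ts) <= max_skew_ms

if aligned(25_100, 25_300, 25_450):  # all near 00:25:10 on the shared axis
    fused = 0.40 * voice_vec + 0.35 * text_vec + 0.25 * video_vec
    print(fused.shape)  # (300,)
```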
Cross-modal association discovers relationships between modalities through an association rule algorithm (such as Apriori; e.g. "financial data PPT -> CFO speech -> sensitive encryption"), builds an association network of cross-modal entities (such as "product manager", "requirement document", "development progress") based on knowledge graph technology, and supports path queries (such as "requirement change -> affected module -> responsible person"). The cloud service device parses, from a marketing department conference, the voice "users report lag" (00:10:00), the text summary "lag problem - technical department to solve" (00:10:15), and the PPT labels "optimization scheme" and "online next week" (00:10:30). Association processing generates the graph "user feedback -> lag problem -> technical department -> optimization scheme -> online time", forming a complete problem-resolution chain. The integrated semantic network is converted into JSON format (containing the time axis, entity relations and modal weights), and a labeled target scene mode (such as "product review meeting_v2.0") is generated by matching the current scene type against a historical conference mode library (such as "product review meeting", "financial budget meeting").
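A hedged sketch of such an association network and its path query, using networkx; the nodes and edge attributes are the example entities from the meeting above, not output of the actual system:

```python
import networkx as nx

# Directed entity graph: each edge records which modality contributed it and when.
g = nx.DiGraph()
g.add_edge("user feedback", "lag problem", source="voice", t="00:10:00")
g.add_edge("lag problem", "technical dept", source="text summary", t="00:10:15")
g.add_edge("technical dept", "optimization scheme", source="PPT", t="00:10:15")
g.add_edge("optimization scheme", "online next week", source="PPT", t="00:10:30")

# Path query across modalities: the complete problem-resolution chain.
print(nx.shortest_path(g, "user feedback", "online next week"))
```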
S207: confirm the target application based on the target conference scene mode information, and send target-format instruction information to the target application for processing.
In one embodiment, feature extraction and analysis are performed on the target conference scene mode information to generate scene semantic feature information, application matching feature information, platform protocol compatibility feature information, scene type distribution feature information, conference application proportion feature information and device interface parameter feature information. The features extracted by the cloud service device include scene semantic features (such as the semantic keyword "technical scheme review"), application matching features (such as adaptation to conference software like Teams and Zoom), and platform protocol compatibility features (such as HTTP and WebSocket support).
Suppose the target scene mode information indicates a "quarterly financial review meeting": the cloud service device extracts the semantic keywords "financial data" and "budget report", the application matching feature points to "Enterprise WeChat meeting" (historical data shows this software is commonly used for financial conferences), and the platform protocol compatibility feature is "supports HTTPS encrypted transmission". The scene semantic, application matching, platform protocol compatibility, scene type distribution, conference application proportion and device interface parameter feature information are then processed to generate scene matching probability information, application compatibility evaluation information, scene type association feature information and interface parameter adaptation evaluation information. This step computes the matching probability between scene and application (e.g. with a neural network model trained on historical data) and evaluates application compatibility (e.g. whether the application supports the encryption format of the conference record) and interface parameter suitability (e.g. whether the video resolution matches the device output).
The cloud service device calculates a 92% matching probability between the "quarterly financial review meeting" and "Enterprise WeChat meeting"; the application supports AES-256 encryption (compatible with the conference record's encryption format) and 1080p video output (interface parameter matching degree 85%), yielding a high compatibility evaluation. Based on the scene matching probability information, application compatibility evaluation information, scene type association feature information and interface parameter adaptation evaluation information, abnormal data in the target conference scene mode information is marked and screened, generating a scene abnormal data screening result that includes the type of the abnormal scene, the application scene, the interface link, the moment of occurrence and the degree of abnormality. Anomalies such as encryption protocols not supported by the application or interface parameter conflicts are marked based on the suitability evaluation. If the target scene mode information requires "real-time subtitle generation" but the cloud service device detects that the target application (such as an old version of some video software) does not support this function, the anomaly type is marked as "missing function", the occurrence time as "conference start", and the degree as "serious" (conference efficiency is affected).
The scene abnormal data screening result is integrated and quantified to generate a scene execution influence factor, which characterizes the influence weight of the scene type on execution, the adaptation difficulty of the application scene, the compatibility of the device interface and the influence trend of future scene execution; the target application is then confirmed based on the target conference scene mode information, and target-format instruction information is sent to it for processing. Integration quantifies each anomaly into an influence factor (e.g. "missing function" carries an influence weight of 0.7 on the conference), and the application with the lowest factor is preferred. The cloud service device evaluates Enterprise WeChat's scene execution influence factor as 0.3 (low influence) while an alternative application scores 0.8 due to interface compatibility problems; it therefore confirms Enterprise WeChat and sends target-format instruction information containing the encrypted conference record, with adaptation parameters attached (such as video coding format H.264 and audio sampling rate 44.1 kHz).
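A minimal sketch of this quantitative decision step, assuming the candidate set and factor values from this embodiment; the dictionary keys and instruction fields are illustrative names:

```python
# Pick the candidate application with the lowest scene-execution impact factor,
# then assemble the target-format instruction with the adaptation parameters.
candidates = {
    "Enterprise WeChat": {"match": 0.92, "impact": 0.3},
    "alternative app":   {"match": 0.88, "impact": 0.8},
}

target = min(candidates, key=lambda app: candidates[app]["impact"])
instruction = {
    "app": target,
    "record_encryption": "AES-256",
    "video_codec": "H.264",
    "audio_sample_rate_hz": 44_100,
}
print(target, instruction)
```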
The cloud service device realizes intelligent adaptation between conference scene and application through a closed loop of feature extraction, adaptation evaluation, anomaly screening and quantitative decision. For example, in a "new product launch" scene, the cloud service device first extracts the scene semantic features ("product demonstration", "multi-platform live broadcast") and application matching features (adapting Douyin live streaming, WeChat Channels, etc.), calculates each application's matching probability (Douyin live streaming at 95%), and detects that Douyin live streaming supports the RTMP push protocol (compatible with the device interface) but has an insufficient encryption level (anomaly type "security risk"), with a quantified influence factor of 0.5 (medium influence). The cloud service device then activates an alternative: an enterprise live-broadcast platform with 85% matching degree and a high encryption level (influence factor 0.2), finally confirms that platform and sends the instruction information, ensuring the security and compatibility of the conference record during transmission and storage.
In another embodiment, the cloud service device creates an encrypted shared space based on the target conference scene mode information, supports configuring access rights (e.g. read-only, download, edit) by third-party role (e.g. partner, client), and associates the encryption policy of the conference video (e.g. AES-256 encryption + device SN binding). After a technology company holds a technical review meeting with an external supplier, the cloud service device creates a shared space named "2025Q2 chip review", configures "read-only" permission for the supplier representative (who can only view the conference video abstract) and "download + annotate" permission for the company's internal technical team; the conference recording automatically inherits the role encryption rules (for example, segments involving core technology require secondary authentication by the supplier's responsible person before decryption).
An encrypted transmission channel is established through the TLS 1.3 protocol to ensure the video is neither tampered with nor stolen during sharing. When the cloud service device shares a conference video to Enterprise WeChat, it first verifies the enterprise qualification of the Enterprise WeChat account (such as domain name binding and administrator authorization) through the OAuth 2.0 protocol, then transmits the encrypted video stream (resolution 1080p, bit rate 2 Mbps) over the TLS 1.3 channel, attaching an SHA-256 hash check code to every 10 MB data block during transmission to guarantee data integrity.
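The per-block integrity check can be sketched with Python's standard hashlib; the TLS channel itself would be handled by the transport layer and is out of scope here, and the file name is a hypothetical placeholder:

```python
import hashlib

CHUNK = 10 * 1024 * 1024  # one check code per 10 MB block, as described above

def chunk_digests(path):
    """Yield an SHA-256 check code for every 10 MB block of the video stream."""
    with open(path, "rb") as f:
        while block := f.read(CHUNK):
            yield hashlib.sha256(block).hexdigest()

# The receiver recomputes each digest and compares, rejecting tampered blocks:
# for i, digest in enumerate(chunk_digests("meeting_1080p.ts")):
#     print(i, digest)
```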
The conference video is stored as time-axis slices (e.g. one slice every 15 minutes), accessible slice indexes are dynamically generated according to third-party rights, and real-time revocation is supported (e.g. if sensitive content is discovered after sharing, third-party access is stopped immediately). In a cross-border acquisition conference, the cloud service device divides an 8-hour conference video into 32 slices, opens access to the relevant slices (04:30-06:00, the acquisition terms discussion) for the legal counsel, and automatically hides the other slices; if a slice is found to involve unpublished financial data, the cloud service device updates the rights policy in real time and revokes the access link to that slice.
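A minimal sketch of time-axis slicing with per-party slice authorization and real-time revocation follows; the 15-minute slice size matches the example above, while the class and role names are assumptions:

```python
from dataclasses import dataclass, field

SLICE_MIN = 15  # one slice every 15 minutes

@dataclass
class SliceIndex:
    total_minutes: int
    grants: dict = field(default_factory=dict)  # party -> set of slice ids

    def slice_id(self, hh, mm):
        return (hh * 60 + mm) // SLICE_MIN

    def grant_range(self, party, start, end):  # start/end as (hh, mm)
        ids = range(self.slice_id(*start), self.slice_id(*end) + 1)
        self.grants.setdefault(party, set()).update(ids)

    def revoke(self, party, hh, mm):
        # real-time revocation, e.g. a slice found to contain unpublished data
        self.grants.get(party, set()).discard(self.slice_id(hh, mm))

idx = SliceIndex(total_minutes=8 * 60)        # 8-hour meeting -> 32 slices
idx.grant_range("legal counsel", (4, 30), (6, 0))
idx.revoke("legal counsel", 5, 15)
print(sorted(idx.grants["legal counsel"]))
```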
A sharing operation log is generated, recording the third party's access time, IP address, video clip access records, download counts and other information, satisfying compliance requirements such as GDPR and China's Multi-Level Protection Scheme (MLPS) 2.0. The cloud service device records a bank client's sharing operations on the "annual risk control conference" video: at 15:30 on June 26, 2025, a third party (XX accounting firm) accessed video segment 02:15-02:30 (a risk assessment report) from IP 192.168.1.100 and downloaded it once; the log is automatically encrypted and stored (using the bank's exclusive key) for subsequent audit tracing.

According to the technical standards of the third-party platform (such as Douyin live streaming's RTMP push and Zoom's MP4 format requirement), the video coding format is converted automatically, and a low-bit-rate preview version (e.g. 720p at 500 kbps) is generated for quick preview. When sharing a conference video to a Douyin enterprise account, the cloud service device automatically converts the H.264-encoded MP4 file into an H.265-encoded TS format adapted to Douyin's RTMP push protocol and generates a 3-minute preview segment (containing conference highlights), making it convenient for the third-party platform to publish quickly.
The cloud service device thus realizes secure sharing of conference video with designated third parties through a full-flow mechanism of space creation, identity verification, slice authorization, log auditing and format adaptation. This process ensures data security during transmission and storage (e.g. AES-256 encryption and TLS channels), supports role-based fine-grained access control and real-time rights management, suits conference data sharing with enterprises, partners, clients and other third parties, and meets compliance audit requirements.
S208: locate similar decision fragments in historical conferences through semantic search technology, providing data support and processing capability for cross-conference retrieval.
In one embodiment, semantic mapping and feature normalization are applied to historical conference decision fragments through a semantic parsing flow, and a decision similarity evaluation model and a cross-conference association algorithm are introduced to achieve structured integration of decision information. The cloud service device converts a historical decision fragment (such as a discussion record on whether to launch an AI project) into a semantic vector, extracting keywords (such as "AI project", "budget", "risk") with a pre-trained BERT model and computing word-vector similarity with Word2Vec (dimension 300). For the decision fragment "invest 5 million to develop an AI customer service system" from the 2024 Q4 technical decision conference, the keyword vector [AI customer service, 5 million investment, technical decision] is extracted and semantically mapped against the current conference's "AI chip development budget" decision, yielding a computed similarity of 0.72.
The decision similarity evaluation model is a three-layer fully connected neural network (300-dimensional input layer, 128- and 64-dimensional hidden layers, 1-dimensional similarity score output) trained with a cosine-similarity loss, a learning rate of 0.001 and 10,000 iterations. The model input is the word vector plus conference metadata (time, role, department); higher-order semantic features are extracted through ReLU activations, and a similarity score between 0 and 1 is output (≥ 0.6 counts as high similarity). The model evaluates the similarity between the historical decision "AI customer service investment" and the current "AI chip development" as 0.68, triggering cross-conference association because the two share the semantic features "AI technology" and "budget investment".
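A sketch of this network in PyTorch is given below; the architecture follows the stated dimensions (300 -> 128 -> 64 -> 1), while the exact loss wiring is an assumption, here a regression toward cosine-similarity labels at the stated learning rate:

```python
import torch
import torch.nn as nn

class DecisionSimilarity(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(300, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # similarity score in [0, 1]
        )

    def forward(self, pair_vec):  # fused word vectors + meeting metadata features
        return self.net(pair_vec).squeeze(-1)

model = DecisionSimilarity()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # regress toward cosine-similarity targets of the pair

x = torch.randn(8, 300)   # batch of decision-fragment pair features
target = torch.rand(8)    # e.g. cosine-similarity labels
loss = loss_fn(model(x), target)
loss.backward()
opt.step()
print(float(loss))
```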
Interfacing with the conference semantic retrieval protocol standard, a three-dimensional index matrix of "conference topic-decision type-keyword" is constructed (rows: conference topic; columns: decision type; depth: keyword), a standardized retrieval index library is generated through a semantic benchmarking conversion model, and a retrieval data update mechanism is established. For example, the topic dimension covers technical decision, financial budget and product planning; the type dimension covers resource investment, risk assessment and scheme selection; and the keyword dimension covers AI, budget and development cycle. For the "2024 AI customer service decision conference", the index entry (technical decision, resource investment, AI customer service) is generated from topic "technical decision", type "resource investment" and keywords "AI customer service, 5 million, development cycle", and stored in an HBase index table. Retrieval is built on Lucene's inverted index model combined with Elasticsearch for distributed search; the index is updated in real-time increments (with a full refresh every 10 minutes), the tokenizer is IKAnalyzer (supporting Chinese semantic word segmentation), the index uses 6 shards (3 primary, 3 replica), PB-scale data storage is supported, and retrieval latency is at the millisecond level (99% of queries ≤ 50 ms).
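A hedged sketch of such an index definition with the Elasticsearch Python client follows: 3 primary shards with one replica each yield the 6 shards mentioned above, and the IK analyzer requires the server-side IK plugin. The index name, field names and endpoint are assumptions for illustration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

es.indices.create(
    index="meeting-decisions",
    settings={"number_of_shards": 3, "number_of_replicas": 1},
    mappings={
        "properties": {
            "topic":    {"type": "keyword"},  # e.g. "technical decision"
            "dec_type": {"type": "keyword"},  # e.g. "resource investment"
            "keywords": {"type": "text", "analyzer": "ik_smart"},
            "fragment": {"type": "text", "analyzer": "ik_smart"},
        }
    },
)
```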
Taking the conference life cycle as a time window, conference topics, decision types, keywords and historical retrieval records are fused to construct a multidimensional semantic feature matrix, realizing full-dimension association of decision information. With the conference life cycle (e.g. from initiation to settlement) as the window, topics (e.g. "annual budget"), types (e.g. "fund allocation"), keywords (e.g. "development expenses") and historical retrieval records (e.g. a keyword retrieved 10 times) are fused into an N × M × P feature matrix (N = number of topics, M = number of types, P = number of keywords). Combining the features of the "2024 Q4 budget meeting" and the "2025 Q1 chip meeting" - both with topic "technical budget" and type "fund allocation", and sharing the keywords "development expenses" and "chip" - the weight at the corresponding matrix positions is increased (e.g. from 0.5 to 0.7).
The semantic completion model is a Transformer-based generative model (6 encoder layers and 6 decoder layers) that takes decision fragments with missing features as input and outputs completed semantic vectors. The word embedding dimension is 768, the number of attention heads is 12, the feed-forward network dimension is 3072, the training data comprises 100,000 historical decision fragments, and the BLEU-4 score is ≥ 0.85. When a historical decision fragment lacks the "budget amount" feature, the model completes it as "budget 8 million" from the context "AI project" and "12-month development cycle"; after completion, its similarity to the current conference's "chip development budget of 10 million" rises by 0.2.
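A sketch matching the stated hyperparameters (6 encoder / 6 decoder layers, d_model 768, 12 heads, feed-forward 3072) is shown below; tokenization, masking and the training loop are omitted, and the wrapper class and vocabulary size are assumptions:

```python
import torch
import torch.nn as nn

class SemanticCompleter(nn.Module):
    def __init__(self, vocab=30_000, d_model=768):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.core = nn.Transformer(
            d_model=d_model, nhead=12,
            num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=3072, batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab)

    def forward(self, src_ids, tgt_ids):
        # src: decision fragment with missing features; tgt: completed tokens so far
        h = self.core(self.embed(src_ids), self.embed(tgt_ids))
        return self.out(h)

m = SemanticCompleter()
logits = m(torch.randint(0, 30_000, (2, 32)), torch.randint(0, 30_000, (2, 16)))
print(logits.shape)  # (2, 16, 30000)
```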
Missing decision features are supplemented through the semantic completion model, key semantic information is extracted in combination with a weight allocation mechanism (topic weight 0.4, type weight 0.3, keyword weight 0.3), and the results are fused with the three-dimensional index matrix features to generate a retrieval result dataset in a standardized format (containing the original decision fragment, a similarity score and associated labels), providing data support and processing capability for cross-conference retrieval. When "AI chip budget" is searched, the weight of the keyword "chip" is raised to 0.35 after weight allocation (above the default 0.3); it matches the historical decision "AI customer service budget of 5 million" with a similarity of 0.68, and the result is labeled "associated technical budget decision" with a suggestion to reference the historical resource allocation proportion.
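The weight-allocation fusion used for ranking can be sketched as below; the 0.4 / 0.3 / 0.3 split and the per-query keyword boost follow the example above, while the normalization is an assumption:

```python
# Fuse per-dimension similarities into one retrieval score; a query-specific
# keyword boost lifts the keyword weight (e.g. from 0.30 to 0.35 for "chip").
def retrieval_score(topic_sim, type_sim, kw_sim, kw_boost=0.0):
    w_topic, w_type, w_kw = 0.4, 0.3, 0.3 + kw_boost
    total = w_topic + w_type + w_kw
    return (w_topic * topic_sim + w_type * type_sim + w_kw * kw_sim) / total

# "AI chip budget" query against the historical "AI customer service budget" record
print(round(retrieval_score(0.7, 0.8, 0.68, kw_boost=0.05), 3))  # ~0.72
```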
The cloud service device realizes cross-conference decision retrieval through a closed loop of semantic parsing, index construction, feature fusion and result generation. In the current AI chip development decision conference, the cloud service device: parses the decision fragment "invest 10 million to develop an AI chip" into a semantic vector; computes its similarity with the historical "AI customer service budget" decision using the similarity model, obtaining 0.68; searches the three-dimensional index matrix to locate related records under topic "technical decision", type "resource investment" and keyword "AI"; and completes the missing features of the historical decision with the semantic completion model (e.g. supplementing a "technical risk assessment") to construct the multidimensional feature matrix;
finally, a standardized retrieval result is generated and the historical budget allocation proportion is recommended as a reference (the basis for adjusting from 5 million to 10 million), providing data support for the current decision. In this process, the similarity model's three-layer neural network ensures accurate semantic matching, the Transformer completion model improves data integrity, and the distributed index library guarantees retrieval efficiency, forming a complete technical chain from data processing to application support.
An intelligent conference recording and recording system comprises an edge device and a cloud service device. The edge device acquires audio information and video information, processes them in combination with the time axis to generate multimodal information (including text information, voice information and image information), and sends the multimodal information to the cloud service device for the generation of target conference scene information;
the cloud service device receives the multimodal information sent by the edge device, the multimodal information including text information, voice information and image information; processes the multimodal information, combining context-aware error correction, content segmentation, a dynamic segmentation strategy and a preset storage strategy, to generate conference record information; encrypts the conference record information based on participant role information to generate target-format conference information; processes the target-format conference information to generate initial conference scene mode information containing the conference outline, PPT and mind map; processes the target-format conference information and the initial conference scene mode information based on the priority processing strategy for multimodal information to generate conference information data after priority sorting and strategy processing; performs semantic analysis, feature fusion and cross-modal association processing on the conference information data to generate target conference scene mode information; and confirms the target application based on the target conference scene mode information, sending target-format instruction information to the target application for processing. The edge device additionally performs matching with a target hard disk based on preset matching rules; if the matching succeeds, the multimodal information is sent to the target hard disk for storage of the text, voice and image information.
The innovation of this scheme lies in multi-dimensional technology integration. The cloud service device realizes context-aware error correction through lip-movement video and a topic lexicon, performs dynamic segmentation on cues such as PPT page turns, encrypts by role to ensure security, and constructs three-dimensional index matrices and multidimensional feature matrices to realize semantic retrieval and multimodal fusion. The system addresses industry pain points such as signal access limitations and recording rights management, supports access from mainstream video conference software, generates review documents driven by the time axis, and distributes efficiently across platforms. With an edge-cloud collaborative architecture, edge preprocessing guarantees real-time performance, cloud-side deep analysis improves intelligence, and binding the encryption strategy to the device SN secures the data, providing a complete technical system for intelligent conference scenarios.
A computing device comprises a memory for storing computer program instructions and a processor for executing them, wherein the computer program instructions, when executed by the processor, trigger the device to perform any of the intelligent conference recording and recording methods described above.
The methods and/or embodiments of the present application may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. The above-described functions defined in the method of the application are performed when the computer program is executed by a processing unit.
The computer readable medium according to the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (9)

1. An intelligent conference recording and recording method, applied to a cloud service device, comprising: receiving multimodal information sent by an edge device, the multimodal information including text information, voice information and image information; processing the multimodal information, combining context-aware error correction, content segmentation, a dynamic segmentation strategy and a preset storage strategy, to generate conference record information; encrypting the conference record information based on participant role information to generate target-format conference information; processing the target-format conference information to generate initial conference scene mode information containing a conference outline, PPT and mind map; processing the target-format conference information and the initial conference scene mode information based on a priority processing strategy for the multimodal information to generate conference information data after priority sorting and strategy processing; performing semantic analysis, feature fusion and cross-modal association processing on the conference information data to generate target conference scene mode information; confirming a target application based on the target conference scene mode information and sending target-format instruction information to the target application for processing, including: performing feature extraction and analysis on the target conference scene mode information to generate scene semantic feature information, application matching feature information, platform protocol compatibility feature information, scene type distribution feature information, conference application proportion feature information and device interface parameter feature information; processing said feature information to generate scene matching probability information, application compatibility evaluation information, scene type association feature information and interface parameter adaptation evaluation information; based on the scene matching probability information, application compatibility evaluation information, scene type association feature information and interface parameter adaptation evaluation information, marking and screening abnormal data in the target conference scene mode information to generate a scene abnormal data screening result, including the type of the abnormal scene, the application scene, the interface link, the moment of occurrence and the degree of abnormality; and integrating and quantifying the scene abnormal data screening result to generate a scene execution influence factor, wherein the scene execution influence factor characterizes the influence weight of the scene type on execution, the adaptation difficulty of the application scene, the compatibility of the device interface and the influence trend of future scene execution, and then confirming the target application based on the target conference scene mode information and sending the target-format instruction information to the target application for processing; and locating similar decision fragments in historical conferences through semantic search technology, providing data support and processing capability for cross-conference retrieval.

2. The intelligent conference recording and recording method according to claim 1, wherein processing the multimodal information, combining context-aware error correction, content segmentation, a dynamic segmentation strategy and a preset storage strategy, to generate conference record information comprises: collecting and preprocessing the multimodal information to obtain original conference audio and video data; processing the original conference audio and video data by combining lip-movement video features with a conference topic lexicon, performing context-aware error correction, and generating processed conference audio and video data; automatically segmenting and marking the processed conference audio and video data based on PPT page turns, time and speaker information to complete content segmentation; automatically cutting time-axis segments according to speech pauses and speaker-switching information, marking silent segments based on discontinuous time-axis marking technology, and converting non-speech information into text annotations; and storing the conference record in fragments in the device encryption chip according to the preset storage strategy to generate the conference record information.

3. The intelligent conference recording and recording method according to claim 1, wherein encrypting the conference record information based on participant role information to generate target-format conference information comprises: performing permission mapping and rule normalization on the participant role information through a role resolution flow, and introducing a role sensitivity evaluation model and a dynamic encryption arbitration algorithm to achieve structured processing of the encryption strategy; interfacing with the conference permission protocol standard, constructing a three-dimensional role-rank-department permission matrix, generating a standardized encryption rule set through an encryption benchmarking conversion model, and establishing a dynamic update mechanism for the encryption strategy; taking the conference period as a time window, fusing participant roles, ranks, departments and historical encryption processing records to construct a multidimensional encryption feature matrix, realizing full-dimension association of the encryption strategy; and completing missing encryption rules through a sensitivity prediction model, extracting key encryption strategies in combination with a dynamic permission adjustment mechanism, and fusing them with the three-dimensional permission matrix features to generate the target-format conference information.

4. The intelligent conference recording and recording method according to claim 3, wherein processing the target-format conference information, combining time-axis-driven speech-to-text conversion and analysis processing, to generate initial conference scene mode information containing a conference outline, PPT and mind map comprises: performing temporal mapping and rule normalization on the target-format conference information through a time-axis parsing flow, and introducing a conference content evaluation model and a dynamic analysis arbitration algorithm to achieve structured integration of conference information processing; interfacing with the conference review protocol standard, constructing a three-dimensional time node-content type-analysis dimension processing matrix, generating a standardized analysis rule set through a content benchmarking conversion model, and establishing a dynamic update mechanism for conference information processing; taking the conference period as a time window, fusing time nodes, content types, analysis dimensions and historical conference processing records to construct a multidimensional conference feature matrix, realizing full-dimension association of conference information processing; and completing missing analysis rules through a content prediction model, extracting key strategies in combination with a time-axis dynamic adjustment mechanism, and fusing them with the three-dimensional processing matrix features to generate the initial conference scene mode information containing the conference outline, PPT and mind map.

5. The intelligent conference recording and recording method according to claim 1, wherein processing the target-format conference information and the initial conference scene mode information based on a priority processing strategy for the multimodal information to generate conference information data after priority sorting and strategy processing comprises: performing feature extraction and statistical analysis on the target-format conference information and the initial conference scene mode information to generate information type feature information, sensitivity degree feature information, conference role feature information, modal distribution feature information, historical priority processing feature information and multimodal association feature information; processing said feature information to generate information priority prediction information, modal sensitivity degree evaluation information, information type association feature information and multimodal fusion degree evaluation information; based on the information priority prediction information, modal sensitivity degree evaluation information, information type association feature information and multimodal fusion degree evaluation information, marking and screening abnormal data in the target-format conference information and the initial conference scene mode information to generate a priority abnormal data screening result, including the type, modality and conference role of the abnormal information, the moment of occurrence and the degree of abnormality; and integrating and quantifying the priority abnormal data screening result to generate a priority influence factor, wherein the priority influence factor characterizes the influence weight of the information type on priority, the processing difficulty of the modality, the effectiveness of historical priority strategies and the influence trend of future priority processing, and then generating the conference information data after priority sorting and strategy processing.

6. The intelligent conference recording and recording method according to claim 1, wherein locating similar decision fragments in historical conferences through semantic search technology, providing data support and processing capability for cross-conference retrieval, comprises: performing semantic mapping and feature normalization on historical conference decision fragments through a semantic parsing flow, and introducing a decision similarity evaluation model and a cross-conference association algorithm to achieve structured integration of decision information; interfacing with the conference semantic retrieval protocol standard, constructing a three-dimensional conference topic-decision type-keyword index matrix, generating a standardized retrieval index library through a semantic benchmarking conversion model, and establishing a retrieval data update mechanism; taking the conference life cycle as a time window, fusing conference topics, decision types, keywords and historical retrieval records to construct a multidimensional semantic feature matrix, realizing full-dimension association of decision information; and completing missing decision features through a semantic completion model, extracting key semantic information in combination with a weight allocation mechanism, and fusing it with the three-dimensional index matrix features to generate a retrieval result dataset in a standardized format, providing data support and processing capability for cross-conference retrieval.

7. An intelligent conference recording and recording method, applied to an edge device, comprising: acquiring audio information and video information; processing the audio information and video information in combination with the time axis to generate multimodal information, the multimodal information including text information, voice information and image information; sending the multimodal information to a cloud service device for the cloud service device to generate target conference scene information, the cloud service device being configured to execute the method according to any one of claims 1-6; and performing matching with a target hard disk based on preset matching rules, and if the matching succeeds, sending the multimodal information to the target hard disk for the target hard disk to store the text information, voice information and image information.

8. The intelligent conference recording and recording method according to claim 7, wherein performing matching with a target hard disk based on preset matching rules and, if the matching succeeds, sending the multimodal information to the target hard disk comprises: verifying the serial-number pairing validity of the target hard disk based on a preset matching rule that binds the serial number information of the edge device to the encryption of the target hard disk, and, if the matching succeeds, sending the multimodal information to the target hard disk through an AES-256 encrypted channel while synchronously generating an encrypted transmission log.

9. An intelligent conference recording and recording system, comprising: an edge device that acquires audio information and video information, processes them in combination with the time axis to generate multimodal information including text information, voice information and image information, and sends the multimodal information to a cloud service device for the generation of target conference scene information; and the cloud service device, which receives the multimodal information sent by the edge device; processes the multimodal information, combining context-aware error correction, content segmentation, a dynamic segmentation strategy and a preset storage strategy, to generate conference record information; encrypts the conference record information based on participant role information to generate target-format conference information; processes the target-format conference information to generate initial conference scene mode information containing a conference outline, PPT and mind map; processes the target-format conference information and the initial conference scene mode information based on a priority processing strategy for the multimodal information to generate conference information data after priority sorting and strategy processing; performs semantic analysis, feature fusion and cross-modal association processing on the conference information data to generate target conference scene mode information; and confirms a target application based on the target conference scene mode information and sends target-format instruction information to the target application for processing, including: performing feature extraction and analysis on the target conference scene mode information to generate scene semantic feature information, application matching feature information, platform protocol compatibility feature information, scene type distribution feature information, conference application proportion feature information and device interface parameter feature information; processing said feature information to generate scene matching probability information, application compatibility evaluation information, scene type association feature information and interface parameter adaptation evaluation information; based thereon, marking and screening abnormal data in the target conference scene mode information to generate a scene abnormal data screening result, including the type of the abnormal scene, the application scene, the interface link, the moment of occurrence and the degree of abnormality; and integrating and quantifying the scene abnormal data screening result to generate a scene execution influence factor, wherein the scene execution influence factor characterizes the influence weight of the scene type on execution, the adaptation difficulty of the application scene, the compatibility of the device interface and the influence trend of future scene execution, and then confirming the target application and sending the target-format instruction information to it; wherein the edge device performs matching with a target hard disk based on preset matching rules, and if the matching succeeds, sends the multimodal information to the target hard disk for the target hard disk to store the text information, voice information and image information.
CN202511068063.5A 2025-07-31 2025-07-31 Intelligent conference recording and recording method and system Active CN120568005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511068063.5A CN120568005B (en) 2025-07-31 2025-07-31 Intelligent conference recording and recording method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202511068063.5A CN120568005B (en) 2025-07-31 2025-07-31 Intelligent conference recording and recording method and system

Publications (2)

Publication Number Publication Date
CN120568005A (en) 2025-08-29
CN120568005B (en) 2025-09-26

Family

ID=96821934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511068063.5A Active CN120568005B (en) 2025-07-31 2025-07-31 Intelligent conference recording and recording method and system

Country Status (1)

Country Link
CN (1) CN120568005B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235818A (en) * 2023-10-11 2023-12-15 紫光计算机科技有限公司 Encryption authentication method and device based on solid state disk, computer equipment and medium
CN120179079A (en) * 2025-05-20 2025-06-20 厦门视诚科技有限公司 AI multimodal fusion interaction method, device, system and equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011136787A1 (en) * 2010-04-30 2011-11-03 American Teleconferencing Services, Ltd. Conferencing application store
CN114817451A (en) * 2022-05-26 2022-07-29 山东数元信息技术有限公司 Conference summary processing system and method
CN120075203B (en) * 2025-04-27 2025-08-22 山东浪潮科学研究院有限公司 Conference enhancement method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN120568005A (en) 2025-08-29

Similar Documents

Publication Publication Date Title
US11916913B2 (en) Secure audio transcription
JP7536789B2 (en) Customized output to optimize for user preferences in distributed systems
US10586541B2 (en) Communicating metadata that identifies a current speaker
US9495967B2 (en) Collaborative audio conversation attestation
US11190840B2 (en) Systems and methods for applying behavioral-based parental controls for media assets
CN110516083B (en) Album management method, storage medium and electronic device
WO2023029984A1 (en) Video generation method and apparatus, terminal, server, and storage medium
US20230388416A1 (en) Emergency communication system with contextual snippets
US20220191430A1 (en) Systems and methods for application of context-based policies to video communication content
US20190294804A1 (en) Encrypted recordings of meetings between individuals
US20250013893A1 (en) Dynamic generative-ai enhanced leadership coaching platform
CN118612378B (en) Multifunctional video conference interaction method, device, equipment and storage medium
US12216708B2 (en) Digital media authentication
CN120568005B (en) Intelligent conference recording and recording method and system
CN112883350B (en) Data processing method, device, electronic equipment and storage medium
CN117935062A (en) Village image collection and evaluation system and village image collection and evaluation method
US11011174B2 (en) Method and system for determining speaker-user of voice-controllable device
US20230370503A1 (en) Dynamic group session data access protocols
US12125479B2 (en) Systems and methods for providing a sociolinguistic virtual assistant
CN116708055A (en) Intelligent multimedia audiovisual image processing method, system and storage medium
CN114339132A (en) Intelligent meeting minutes method, device and computer equipment for video conferencing
US20250315630A1 (en) Automatic prompt generation based on a meeting discussion
CN120602691B (en) Conference live broadcast method and system
US20250317533A1 (en) Intelligent categorization and organization of a virtual meeting resource based on a meeting discussion
CN120547367B (en) AI workflow processing method and system for intelligent hardware

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant