Secure Big Data
Enterprises and organizations have deployed large numbers of security detection, protection, and auditing devices. This hardware and software, deployed in bypass mode, in-line, or on servers and terminals, continuously generates audit data such as alarms and raw operation records, which the devices themselves cannot retain for long. Many enterprises also require logs and service data to be collected and stored directly from key protected systems or services. In recent years, many enterprises and organizations have also begun to collect, store, and analyze raw data parsed from traffic protocols, such as high-volume DNS and HTTP access records.
As Hierarchy Protection 2.0 requires a longer log storage period, enterprises' demand for long-term data analysis, deep protocol analysis, and deep association analysis across multiple data sources is increasing, and so is the demand for the collection, storage, organization, and analysis of security-related big data.
Introduction of the Technology
The Leadsec security big data technology covers the whole process of security big data processing: data collection, data access, data pre-processing, data organization, and data services. It is built on distributed big data cluster technology.
Security data extraction
Parses and maps raw data to ensure the consistency and integrity of data extraction. Supports extraction from both structured and unstructured data.
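As an illustration only, the extraction step might be sketched as follows. The common schema, the key=value log format, the free-text pattern, and every function name here are assumptions made for the example; they are not taken from the Leadsec product.

```python
import re

# Hypothetical common schema: every extracted record ends up with these keys.
COMMON_FIELDS = ("src_ip", "dst_ip", "action")

# Assumed mapping from one vendor's key names to the common schema.
KV_FIELD_MAP = {"src": "src_ip", "dst": "dst_ip", "act": "action"}

# Assumed pattern for a free-text (unstructured) alarm line.
TEXT_PATTERN = re.compile(
    r"(?P<action>\w+) connection from (?P<src_ip>[\d.]+) to (?P<dst_ip>[\d.]+)"
)


def extract_structured(line):
    """Parse a 'key=value key=value' line and map it to the common schema."""
    pairs = dict(item.split("=", 1) for item in line.split() if "=" in item)
    record = {KV_FIELD_MAP[k]: v for k, v in pairs.items() if k in KV_FIELD_MAP}
    # Missing fields are kept as None so every record has the same shape.
    return {field: record.get(field) for field in COMMON_FIELDS}


def extract_unstructured(line):
    """Pull the same fields out of free text; return None if nothing matches."""
    match = TEXT_PATTERN.search(line)
    if match is None:
        return None
    return {field: match.group(field) for field in COMMON_FIELDS}
```

Mapping both input styles to one schema is what lets the later steps treat records from different devices uniformly.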
Security data cleaning
Eliminates useless data and singles out the desired data. Supports data filtering, unified identification, and filling of missing values.
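A minimal sketch of the three cleaning operations, assuming records are plain dictionaries. The field names, the severity alias table, and the default values are hypothetical and chosen only to make the example concrete.

```python
# Assumed defaults used to fill missing values.
DEFAULTS = {"severity": "unknown", "device": "unknown"}

# Assumed alias table that unifies vendor-specific severity spellings.
SEVERITY_ALIASES = {
    "hi": "high", "high": "high", "3": "high",
    "med": "medium", "medium": "medium", "2": "medium",
    "lo": "low", "low": "low", "1": "low",
}


def clean(records):
    """Filter, normalize, and fill a list of record dictionaries."""
    cleaned = []
    for record in records:
        # Filtering: a record without a source address is treated as useless.
        if not record.get("src_ip"):
            continue
        item = dict(record)  # copy so the caller's data is not modified
        # Unified identification: map severity spellings to one vocabulary.
        raw = str(item.get("severity") or "").strip().lower()
        item["severity"] = SEVERITY_ALIASES.get(raw, raw) if raw else None
        # Filling of missing values: apply defaults where a field is empty.
        for key, default in DEFAULTS.items():
            if item.get(key) in (None, ""):
                item[key] = default
        cleaned.append(item)
    return cleaned
```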
Security data association
Associates the data with other data, such as association information and knowledge data, through association rules or algorithms. Supports association backfill and association extraction.
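Association backfill can be pictured as a lookup that copies fields from a reference table into each event. The sketch below assumes a simple association rule (match on destination IP) and a hypothetical in-memory asset table; a real deployment would read this from its own asset data.

```python
# Hypothetical asset table keyed by IP address.
ASSET_TABLE = {
    "10.0.0.5": {"owner": "finance", "system": "erp"},
}


def backfill(event, assets=ASSET_TABLE):
    """Return a copy of the event enriched with asset fields, if any match."""
    enriched = dict(event)
    asset = assets.get(event.get("dst_ip"))
    if asset:
        for key, value in asset.items():
            # setdefault keeps any value the event already carries.
            enriched.setdefault("asset_" + key, value)
    return enriched
```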
Security data comparison
Compares the cleaned or associated data with other data to identify the desired data set. Supports feature comparison, range comparison, and regular-expression comparison.
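The three comparison types could look roughly like this. The sketch reads "feature comparison" as membership in a known-bad set, "range comparison" as an address-range check, and "regular comparison" as regular-expression matching; these readings, along with the sample hash, network, and pattern, are assumptions.

```python
import ipaddress
import re

# Placeholder values for illustration only.
KNOWN_BAD_HASHES = {"abc123"}
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
SUSPICIOUS_URL = re.compile(r"\.(php|asp)\?cmd=", re.IGNORECASE)


def compare(record):
    """Report which of the three comparisons the record satisfies."""
    return {
        # Feature comparison: exact match against a set of known features.
        "feature": record.get("file_hash") in KNOWN_BAD_HASHES,
        # Range comparison: is the source address inside the given network?
        "range": ipaddress.ip_address(record["src_ip"]) in INTERNAL_NET,
        # Regular-expression comparison on the requested URL.
        "regex": SUSPICIOUS_URL.search(record.get("url", "")) is not None,
    }
```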
Security data identification
Attaches corresponding labels to the data processing result sets. Supports basic attribute tags, behavior tags, alarm tags, and custom tags.
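One way to picture the four tag types is a function that appends prefixed strings to a record. The tag names, the threshold of five failed logins, and the custom rule interface are all hypothetical.

```python
def label(record, custom_rules=()):
    """Return a copy of the record with a 'tags' list attached.

    custom_rules is an iterable of (tag_name, predicate) pairs, where
    predicate is a function that takes the record and returns a bool.
    """
    tags = []
    # Basic attribute tag: derived directly from a field of the record.
    if record.get("device_type"):
        tags.append("attr:" + record["device_type"])
    # Behavior tag: derived from observed activity (threshold is assumed).
    if record.get("login_failures", 0) >= 5:
        tags.append("behavior:repeated_login_failure")
    # Alarm tag: derived from the alarm severity.
    if record.get("severity") == "high":
        tags.append("alarm:high")
    # Custom tags: supplied by the caller.
    for name, predicate in custom_rules:
        if predicate(record):
            tags.append("custom:" + name)
    return {**record, "tags": tags}
```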
Security data distribution
After pre-processing by extraction, cleaning, association, comparison, and identification, the raw security big data forms a standard security big data set. This data is output to various data stores and calculation engines according to predetermined rules and interfaces, based on the requirements of each customer's security scenario, to achieve data distribution and sharing. Supports relational databases, distributed file storage systems, distributed message buses, distributed full-text search, and distributed column-oriented key-value databases.
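Rule-based distribution can be sketched as a small router that sends each record to every sink whose rule matches. In this sketch the sinks are in-memory lists standing in for real targets such as a relational database or a message bus; the class and sink names are hypothetical.

```python
from collections import defaultdict


class Distributor:
    """Route records to named sinks according to predetermined rules."""

    def __init__(self):
        self.routes = []                # list of (predicate, sink_name)
        self.sinks = defaultdict(list)  # in-memory stand-ins for real stores

    def add_route(self, predicate, sink_name):
        """Register a rule: records matching predicate go to sink_name."""
        self.routes.append((predicate, sink_name))

    def distribute(self, record):
        """Deliver the record to every matching sink; return their names."""
        delivered = []
        for predicate, sink_name in self.routes:
            if predicate(record):
                self.sinks[sink_name].append(record)
                delivered.append(sink_name)
        return delivered


# Usage: alarms go to a search index, everything goes to file storage.
distributor = Distributor()
distributor.add_route(lambda r: "alarm:high" in r.get("tags", []), "full_text_search")
distributor.add_route(lambda r: True, "file_storage")
```

A record may match several rules, so the same data can be shared by more than one store or engine.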
The raw base is used to store raw data sets. It contains boundary security-related data, network security-related data, terminal security-related data, application security-related data, data security-related data, and security infrastructure-related data.
The resource base contains data sets of security elements, mainly including the personnel information base, organizational structure information base, equipment information base, business system information base, and so on.
The theme base contains data sets organized by analysis object, mainly the various kinds of profile data maintained for personnel, devices, business systems, and so on.
The business base contains the data sets supporting security business, mainly including the vulnerability association base, risk management base, disposition response base, and threat event management base.
The knowledge base contains the knowledge data sets shared across security domains, mainly including the threat intelligence base, IP geographic information base, malicious code base, etc.
Advantages of the Scheme
Log contents vary greatly across different types of security devices. Even among similar products, data format and content differ widely from one manufacturer to another. Venustech is itself a full-line product provider. Drawing on its knowledge and experience in the security industry, it understands the semantics of many security devices in depth. Combined with an understanding of the target customer's network environment, this allows security data governance to restore the original semantics of machine data such as streams and logs more accurately and quickly.
Adopts big data computing and storage technology based on distributed clusters, making capacity easy to expand.
Supports microservices architecture, heterogeneous system integration, and scale-out.
Provides containerization of system components. Supports DevOps.
Provides a plug-in installation mechanism and supports massive data collection and data processing. Extension capabilities such as data adaptation, algorithm insertion and scheduling, and application instrumentation make integration and expansion easy.
The data processing function runs stably, as verified in the production systems of very large customers.
Massive heterogeneous data access
Supports collection and access of DIKI data, based on big data computing and storage technology. DIKI stands for Data (network stream data, device logs, Web and application server logs), Information (enterprise association information such as asset data and vulnerability scan data), Knowledge (security knowledge), and Intelligence (threat intelligence). Supports processing for security analysis, such as data normalization, filtering and cleaning, enrichment, and labeling. Provides automatic semantic understanding and recognition for some security device alarm data, making the data "clean and available" so that data quality is assured.
Typical enterprise applications of security big data include a security big data platform that serves as the aggregation point for enterprise security data resources, and massive-data scenarios such as a security big data platform for the Industrial Internet.
The typical applications in government and other institutions include the government security big data center, smart city security big data center, and so on.